Iglu JSON caching

alexopoulos7 · February 2, 2018, 11:39am

Hello,
First-time Snowplow/Iglu user here.

My main goal is to set up Snowplow for production use inside our company. While playing around, I was thinking of using S3 as a static repo for the Iglu Repository. I am very interested in building a robust, highly available and cheap(!) solution.

I am wondering whether every incoming event is validated by the Iglu Schema Validator. Does this mean that the load on the static server will be proportional to the incoming data? Do we do any kind of caching?

I am not very familiar with Scala, but I have searched the Enricher code and the Iglu client repository to see whether any caching is done. I haven’t found anything.

Do you think that this affects performance? Should we use something like a load balancer, or is that overkill?

Please share your experience and insights on Iglu load requirements.

Thank you.

anton · February 2, 2018, 12:11pm

Hey @alexopoulos7,

Sure thing, the Scala Iglu client uses a cache! Its size can be configured with the cacheSize setting in the resolver configuration, and its TTL with cacheTtl. Under the hood this is an LRU cache, which I believe is the most efficient approach here.
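To make the two settings concrete, here is a minimal sketch of an LRU cache with a per-entry TTL: `max_size` plays the role of cacheSize and `ttl_seconds` the role of cacheTtl. This is an illustration of the general technique, not the Iglu client's actual Scala implementation; the class and method names are hypothetical.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Hypothetical LRU cache bounded by size, with a time-to-live per entry.

    max_size ~ cacheSize (entries kept), ttl_seconds ~ cacheTtl (seconds).
    """

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # key -> (stored_at, value); ordered from least to most recently used
        self._entries = OrderedDict()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss: caller would fetch from the registry
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            # Expired: drop it so the next lookup re-fetches the schema.
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

The injectable `clock` is only there so expiry can be exercised without waiting in real time.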

Most likely, your registry will receive roughly as many HTTP requests as there are schemas in your dataset (plus a few auxiliary schemas), multiplied by the number of nodes. That is usually a very small number, so I don’t think this can be a real performance concern.
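A rough way to see why the load does not grow with event volume: the hypothetical model below gives each node its own cache and counts a registry request only on a cache miss. It assumes the cache is large enough to hold every schema and ignores TTL expiry and the auxiliary schemas mentioned above.

```python
def count_registry_fetches(events, nodes):
    """Count registry requests when each node keeps an independent schema cache.

    Hypothetical model: every event references one schema URI, and a node only
    hits the registry the first time it sees that URI (no TTL, no eviction).
    """
    fetches = 0
    for _ in range(nodes):
        seen = set()  # this node's cache, empty at startup
        for schema_uri in events:
            if schema_uri not in seen:
                fetches += 1  # cache miss -> one HTTP request to the registry
                seen.add(schema_uri)
    return fetches


events = [
    "iglu:com.acme/click/jsonschema/1-0-0",
    "iglu:com.acme/view/jsonschema/1-0-0",
    "iglu:com.acme/purchase/jsonschema/1-0-0",
] * 100_000  # 300,000 events using 3 distinct schemas

print(count_registry_fetches(events, nodes=2))  # 3 schemas x 2 nodes
```

With cacheTtl set, each node would repeat roughly this handful of requests once per TTL window, which is still negligible next to the event volume.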

mike · February 2, 2018, 12:18pm

@anton beat me to it!

If you’re interested in the logic of the LRU cache, it lives here.

alperakgun · October 18, 2019, 8:11am

Hi everyone, and hi @mike. I’ve been checking the source code and have a couple of questions:

Is the cacheTtl value in seconds? For example, for one hour, would it be “cacheTtl”: 3600?
Is cacheSize just the number of schemas, e.g. “cacheSize”: 1000 for 1000 schemas, or something else?
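For context, both settings sit in the `data` object of the resolver configuration. The sketch below builds such a configuration as a Python dict with the values from the question and prints it as JSON. Assumptions: the resolver-config schema version 1-0-2 (I believe cacheTtl is not accepted by older versions, so check what your client supports), and the repository name, vendor prefix and S3 website URL are all hypothetical placeholders.

```python
import json

# Hypothetical resolver configuration using the values from the question.
resolver_config = {
    # Assumed schema version; older resolver-config versions may lack cacheTtl.
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2",
    "data": {
        "cacheSize": 1000,  # max number of schemas held in the LRU cache
        "cacheTtl": 3600,   # seconds before a cached schema is re-fetched (1 hour)
        "repositories": [
            {
                "name": "Company static repo on S3",  # hypothetical name
                "priority": 0,
                "vendorPrefixes": ["com.acme"],  # hypothetical vendor
                "connection": {
                    # hypothetical S3 static website endpoint
                    "http": {"uri": "http://acme-iglu.s3-website-eu-west-1.amazonaws.com"}
                },
            }
        ],
    },
}

print(json.dumps(resolver_config, indent=2))
```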

Could anyone point me to the source code where this is used?

mike · October 18, 2019, 12:00pm

Yes, this is in seconds.

Yes, the LRU cache stores entries under a key composed of vendor, name, format and full Iglu schema version. The code for parts of the resolver can be found here.
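As an illustration of that key, here is a minimal sketch that splits an Iglu URI (`iglu:vendor/name/format/version`) into its four parts. The `SchemaKey` and `parse_iglu_uri` names are hypothetical, not the client's API. In this model, two versions of the same schema (e.g. 1-0-0 and 1-0-1) are distinct keys, so each would occupy its own slot in the cache.

```python
from typing import NamedTuple


class SchemaKey(NamedTuple):
    """Hypothetical cache key: vendor, name, format and full SchemaVer."""
    vendor: str
    name: str
    format: str
    version: str  # full MODEL-REVISION-ADDITION, e.g. "1-0-0"


def parse_iglu_uri(uri: str) -> SchemaKey:
    """Split an Iglu URI such as iglu:com.acme/click/jsonschema/1-0-0."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an Iglu URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 4:
        raise ValueError(f"expected vendor/name/format/version: {uri!r}")
    vendor, name, fmt, version = parts
    pieces = version.split("-")
    if len(pieces) != 3 or not all(p.isdigit() for p in pieces):
        raise ValueError(f"expected SchemaVer MODEL-REVISION-ADDITION: {version!r}")
    return SchemaKey(vendor, name, fmt, version)
```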

alperakgun · October 18, 2019, 12:15pm

Awesome. Thanks @mike

cc @mpeychet
