Set up an Iglu repo in GitHub/GitLab

mpeychet · October 4, 2019, 2:39pm

Hi everyone,

We want to start using custom contexts and unstructured events. For that, we want to set up an Iglu repo. We are thinking of putting it in a public GitLab project.

We have been reading this page: GitHub - snowplow/iglu: Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow

And we were wondering: do we really need to follow the guideline that I found on the page above?

To host your static schema registry, follow the AWS guide, Host a Static Website on Amazon Web Services.

Couldn’t we use the raw file functionality of GitLab? Each schema would be accessible at a specific URL, which could look like this one:

https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master/schemas/com.example_company/example_event/jsonschema/1-0-0

ihor · October 4, 2019, 4:12pm

@mpeychet, when it comes to a static Iglu registry, the main point is to have the JSON schemas publicly available over HTTP. It shouldn’t matter where they are hosted as long as they are accessible from the infrastructure you run your enrichment/shredding/loading process on. From your example, you would need to specify https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master as the uri property for your custom Iglu server in the Iglu resolver configuration file.
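
To make the uri point concrete, here is a minimal sketch in Python of how a static registry URL is derived from the base uri, plus a resolver entry pointing at it. The repository name, priority, vendorPrefixes and cacheSize values are assumptions for illustration, as is the resolver-config version; check them against the resolver configuration your pipeline version expects.

```python
import json

# Base URI taken from the example in this thread.
BASE_URI = "https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master"


def schema_url(base_uri: str, iglu_uri: str) -> str:
    """Map an Iglu URI (iglu:vendor/name/format/version) to its static-registry URL."""
    prefix = "iglu:"
    if not iglu_uri.startswith(prefix):
        raise ValueError(f"not an Iglu URI: {iglu_uri}")
    vendor, name, fmt, version = iglu_uri[len(prefix):].split("/")
    # Static registries serve schemas under <uri>/schemas/<vendor>/<name>/<format>/<version>.
    return f"{base_uri.rstrip('/')}/schemas/{vendor}/{name}/{fmt}/{version}"


# Resolver configuration as a Python dict; all values below are illustrative assumptions.
resolver_config = {
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
    "data": {
        "cacheSize": 500,  # assumed value
        "repositories": [
            {
                "name": "Example raw Git registry",  # hypothetical name
                "priority": 0,
                "vendorPrefixes": ["com.example_company"],
                "connection": {"http": {"uri": BASE_URI}},
            }
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps(resolver_config, indent=2))
    print(schema_url(BASE_URI, "iglu:com.example_company/example_event/jsonschema/1-0-0"))
```

The second print should give the same URL as the one in the original question, which is a quick way to check that the raw file host exposes the schemas/ folder layout the resolver expects.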

mpeychet · October 4, 2019, 4:34pm

Yes! Cool, that is exactly what I was thinking! Thanks for confirming this, @ihor.

Colm · October 4, 2019, 4:37pm

Just to add - the main consideration to take into account is that the endpoint needs to be able to handle a high volume of requests for the pipeline to scale reliably. I’d hazard a guess that this is why we specify S3 in the docs.

I don’t know much about the gitlab functionality you’re referring to so not sure if it’s relevant here, but flagging just in case.

mpeychet · October 7, 2019, 1:25pm

Do you know if the schema is loaded for each event? Or is it cached somehow?

josh · October 7, 2019, 1:31pm

Hey @mpeychet - it is cached according to the Iglu Resolver cache rules that you define - namely:

"cacheSize": XXX,
"cacheTtl": XXX,

If you remove the TTL altogether, the server will technically cache the schema forever.
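
As a rough illustration of what these two settings mean, here is a toy cache in Python. It is not the Iglu resolver's implementation: the SchemaCache class, the least-recently-used eviction, and treating the TTL as seconds are all assumptions made for the sketch.

```python
import time
from collections import OrderedDict


class SchemaCache:
    """Toy model of a size-bounded schema cache with an optional TTL (hypothetical)."""

    def __init__(self, fetch, cache_size, cache_ttl=None, clock=time.monotonic):
        self.fetch = fetch            # function that loads a schema over HTTP
        self.cache_size = cache_size  # maximum number of cached schemas
        self.cache_ttl = cache_ttl    # None means entries never expire
        self.clock = clock
        self.entries = OrderedDict()  # key -> (schema, time stored)
        self.fetch_count = 0

    def lookup(self, key):
        now = self.clock()
        if key in self.entries:
            schema, stored_at = self.entries[key]
            if self.cache_ttl is None or now - stored_at < self.cache_ttl:
                self.entries.move_to_end(key)  # mark as recently used
                return schema
            del self.entries[key]  # entry expired, fetch again below
        schema = self.fetch(key)
        self.fetch_count += 1
        self.entries[key] = (schema, now)
        if len(self.entries) > self.cache_size:
            self.entries.popitem(last=False)  # evict least recently used entry
        return schema
```

In this model, repeated events using the same schema trigger one fetch per TTL window rather than one fetch per event, and with cache_ttl=None the schema is fetched once and kept until it is evicted for space.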
