Set up an Iglu repo in GitHub/GitLab

Hi everyone,

We want to start using custom contexts and unstructured events. For that, we want to set up an Iglu repo. We are thinking of putting it in a public GitLab project.

We have been reading this page: GitHub - snowplow/iglu: Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow

And we were wondering: do we really need to follow this guideline, which I found on the page above?

To host your static schema registry, follow the AWS guide, Host a Static Website on Amazon Web Services.

Couldn’t we use the raw file functionality of GitLab? Each schema would be accessible at a specific URL, which could look like this one:

https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master/schemas/com.example_company/example_event/jsonschema/1-0-0
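For illustration, the URL above breaks down into a registry base URI followed by /schemas/ and the vendor, name, format and version of the schema. The sketch below shows that mapping in Python. The function name schema_url is hypothetical, and the "iglu:vendor/name/format/version" key layout is assumed from the example path rather than taken from this thread.

```python
def schema_url(registry_uri: str, iglu_key: str) -> str:
    """Map an Iglu schema key to the URL it would have in a static registry.

    Assumes the static layout <registry_uri>/schemas/<vendor>/<name>/<format>/<version>.
    """
    prefix = "iglu:"
    if not iglu_key.startswith(prefix):
        raise ValueError(f"not an Iglu schema key: {iglu_key!r}")

    parts = iglu_key[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected vendor/name/format/version, got: {iglu_key!r}")

    vendor, name, fmt, version = parts
    # Strip a trailing slash so the base URI can be written either way.
    return f"{registry_uri.rstrip('/')}/schemas/{vendor}/{name}/{fmt}/{version}"


# Base URI taken from the example URL in this post.
REGISTRY_URI = "https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master"

url = schema_url(
    REGISTRY_URI,
    "iglu:com.example_company/example_event/jsonschema/1-0-0",
)
print(url)
```

A GitLab raw URL would follow the same idea with a different base URI, as long as everything before /schemas/ stays constant for the whole repository.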

@mpeychet, when it comes to a static Iglu registry, the main point is to have the JSON schemas publicly available over HTTP. It shouldn’t matter where they are hosted as long as they are accessible from the infrastructure you run your enrichment/shredding/loading processes on. From your example, you would need to specify https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master as the uri property for your custom Iglu server in the Iglu resolver configuration file.
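As a rough sketch of what that could look like, the Python snippet below builds a resolver configuration and prints it as JSON. Treat the details as assumptions to check against the resolver documentation for your pipeline version: the resolver-config schema version (1-0-2 here), the repository names, the priorities, the vendor prefix com.example_company, and the cacheSize and cacheTtl values are all placeholders.

```python
import json

# Base URI of the static registry, without the trailing /schemas/... part.
REGISTRY_URI = "https://raw.githubusercontent.com/snowplow/iglu-example-schema-registry/master"

resolver_config = {
    # Assumed schema version; use the one your pipeline components support.
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2",
    "data": {
        "cacheSize": 500,  # placeholder value
        "cacheTtl": 600,   # placeholder value, in seconds
        "repositories": [
            {
                "name": "Iglu Central",
                "priority": 0,
                "vendorPrefixes": ["com.snowplowanalytics"],
                "connection": {"http": {"uri": "http://iglucentral.com"}},
            },
            {
                # Hypothetical entry for the custom static registry.
                "name": "Example Company registry",
                "priority": 1,
                "vendorPrefixes": ["com.example_company"],
                "connection": {"http": {"uri": REGISTRY_URI}},
            },
        ],
    },
}

resolver_json = json.dumps(resolver_config, indent=2)
print(resolver_json)
```

The printed JSON is what would go into the resolver configuration file. Only the uri of the second repository needs to change if the schemas move to another host.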

Yes! Cool, that is exactly what I was thinking. Thanks for confirming this, @ihor.

Just to add - the main consideration to take into account is that the endpoint needs to be able to handle a high volume of requests for the pipeline to scale reliably. I’d hazard a guess that this is why we specify S3 in the docs.

I don’t know much about the GitLab functionality you’re referring to, so I’m not sure if it’s relevant here, but flagging just in case. 🙂

Do you know if the schema is loaded for each event? Or is it cached somehow?

Hey @mpeychet - it is cached according to the Iglu Resolver cache rules that you define - namely:

"cacheSize": XXX,
"cacheTtl": XXX,

If you remove the TTL altogether, the server will technically cache the schema forever.
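To make the effect of those two settings concrete, here is a minimal Python sketch of a size-bounded cache with an optional TTL. It is not the Iglu client's actual implementation: the class name SchemaCache, the least-recently-used eviction policy, and the fake fetch function are all assumptions chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SchemaCache:
    """Size-bounded cache with an optional time-to-live, in seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        cache_size: int,
        cache_ttl: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._size = cache_size
        self._ttl = cache_ttl  # None means entries never expire
        self._clock = clock
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            schema, stored_at = entry
            if self._ttl is None or now - stored_at < self._ttl:
                self._entries.move_to_end(key)  # mark as recently used
                return schema
            del self._entries[key]  # expired, fall through to refetch

        schema = self._fetch(key)
        self._entries[key] = (schema, now)
        # Evict least recently used entries beyond the size limit.
        while len(self._entries) > self._size:
            self._entries.popitem(last=False)
        return schema


# Demo with a fake fetcher and a controllable clock instead of real HTTP.
calls = []

def fake_fetch(key: str) -> dict:
    calls.append(key)
    return {"self": key}

now = [0.0]
cache = SchemaCache(fake_fetch, cache_size=10, cache_ttl=600, clock=lambda: now[0])
key = "iglu:com.example_company/example_event/jsonschema/1-0-0"

cache.get(key)  # first lookup: fetched from the registry
cache.get(key)  # second lookup: served from the cache
now[0] = 601.0
cache.get(key)  # TTL has passed: fetched again
print(len(calls))
```

With cache_ttl=None the third lookup would also be served from the cache, which matches the "cached forever" behaviour described above. In that case an updated schema is only picked up after a restart or an eviction.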