Iglu JVM Embedded repo on runtime?

jonas · May 4, 2022, 6:56am

Hey Team,

I’m currently trying to use the JVM embedded repo to manage my JSON schemas. As I understood the documentation,
I should be able to mount the repository in my application (at runtime) under /snowplow-enricher/src/main/resources/repo/schemas/... but no matter how I configure my resolver config, the Iglu Embedded Client does not find my repo. Currently, it looks like:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Private",
        "priority": 0,
        "vendorPrefixes": ["com.some.company"],
        "connection": {
          "http": {
            "uri": "/repo"
          }
        }
      }
    ]
  }
}

Is it possible to mount a repo that way, or is it only possible to bake the repository into the JVM before even building a Docker container?

Regards
Jonas

BenB · May 4, 2022, 8:02am

Hi @jonas,

Welcome to the Snowplow community!

I don’t think that this is possible. From the documentation you linked:

As an embedded repo, there is no mechanism for updating the schemas stored in the repository following the release of the host application.

May I ask which application you’re trying to do that for?

If you want to test tracking, Snowplow Micro makes it possible to use schemas at runtime (thanks to this line).

istreeter · May 4, 2022, 8:35am

Actually I think this might be possible, although the documentation is currently wrong.

The trick is to put it in a path where the JVM class loader can see it. If you use the standard docker image for enrich then classes are loaded from: /home/snowplow/lib. So try putting it under:

/home/snowplow/lib/iglu-client-embedded/schemas/....

Disclaimer: I haven’t tried this yet! If it works though then we should add it to the documentation.
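
For reference, a resolver entry for an embedded repository uses an "embedded" connection with a "path", rather than "http" with a "uri". Below is a minimal, untested sketch in Python that generates such a config. It assumes that /iglu-client-embedded is the right path relative to the classpath root for the directory above and that com.company.test is the vendor prefix; the function name and the repository name are made up for illustration.

```python
import json


def embedded_resolver_config(path, vendor_prefixes, name="Iglu Embedded", cache_size=500):
    """Build a resolver config with one repository that uses an embedded connection.

    `path` is looked up on the JVM classpath, so it is relative to a classpath
    root (assumed here to be /home/snowplow/lib), not to the filesystem root.
    """
    return {
        "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
        "data": {
            "cacheSize": cache_size,
            "repositories": [
                {
                    "name": name,  # hypothetical display name
                    "priority": 0,
                    "vendorPrefixes": vendor_prefixes,
                    "connection": {"embedded": {"path": path}},
                }
            ],
        },
    }


if __name__ == "__main__":
    config = embedded_resolver_config("/iglu-client-embedded", ["com.company.test"])
    print(json.dumps(config, indent=2))
```

Whether the class loader actually picks up a directory mounted this way is exactly the part that still needs testing.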

jonas · May 4, 2022, 9:25am

Yeah, I use snowplow/snowplow-enrich-kinesis:3.1.2
I just tried to mount the repo that way, i.e.

/home/snowplow/lib/iglu-client-embedded/schemas/com.company.test/test_event/jsonschema/...

I also tried:

/home/snowplow/lib/iglu-client-embedded/repo/schemas/com.company.test/test_event/jsonschema/...

and I tried posting the event with:

curl 'https://adress.../com.snowplowanalytics.iglu/v1' \
  -H 'Content-Type: application/json; charset=UTF-8' \
  --data-raw '{"schema": "iglu:com.company.test/test_event/jsonschema/1-0-0", "data": {"example_value": "test_value"}}'

But sadly, the events end up in the bad-event bucket with a resolution error, so I think it’s probably not possible this way, unless I missed something.

jonas · May 4, 2022, 9:43am

Thanks for the kind welcome!
I am trying to build a server-side tracking solution with Snowplow. We want to run Snowplow in Kubernetes and keep the JSON schemas in a single Git repository.

My idea was to sync the Git repository and mount the schemas directly into the pod. That way, I could just add a new schema via a merge request, and it would automatically be pulled into the enricher. I wouldn’t need to build an extra static repository, and the enricher would not need to communicate with anything other than the Kinesis stream.

BenB · May 4, 2022, 10:00am

It should be possible to use GitHub directly as a static HTTP server holding the schemas.

For instance, let’s say that your schemas are there; then you can use GitHub’s raw file URLs and set https://raw.githubusercontent.com/snowplow/iglu-central/master/ as the Iglu URI in your resolver.

master can be replaced with any branch, by the way.
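
To see why this works: a static Iglu repository is looked up at <uri>/schemas/<vendor>/<name>/<format>/<version>. Here is a small sketch in Python of that mapping. The helper name and the regular expression are my own simplification, not part of any Iglu client, and the test_event schema from earlier in this thread is only used to show the shape of the URL (it does not exist in Iglu Central).

```python
import re

# Simplified pattern for an Iglu schema URI: iglu:vendor/name/format/version
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_-]+)/"
    r"(?P<format>[A-Za-z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


def schema_url(base_uri, iglu_uri):
    """Return the URL at which a static repo rooted at base_uri would serve the schema."""
    match = IGLU_URI.match(iglu_uri)
    if match is None:
        raise ValueError("not a valid Iglu schema URI: " + iglu_uri)
    # Drop a trailing slash on the base so the joined URL has no double slash.
    return "{}/schemas/{vendor}/{name}/{format}/{version}".format(
        base_uri.rstrip("/"), **match.groupdict()
    )


if __name__ == "__main__":
    print(schema_url(
        "https://raw.githubusercontent.com/snowplow/iglu-central/master/",
        "iglu:com.company.test/test_event/jsonschema/1-0-0",
    ))
```

So the repository only needs a schemas/ folder at the root of the branch, laid out by vendor, name, format and version.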

jonas · May 4, 2022, 10:09am

Would that also be possible with a private repository? I forgot to mention that keeping the schema repository private was one of the reasons why I came up with the idea of syncing it directly into the Kubernetes cluster.

BenB · May 4, 2022, 10:28am

I’m afraid that it’s not, as downloading would require providing an authorization token in the HTTP headers, and that’s not possible.

In case it helps, your schemas could just be served by a static HTTP server, e.g. with python -m SimpleHTTPServer (python3 -m http.server on Python 3) run in the folder that contains schemas/.
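
If you would rather start the server from a script, here is a minimal Python 3 sketch using only the standard library. The function names, the default port 8000 and the directory argument are placeholders of mine; it is meant for use inside the cluster, not as a hardened production server.

```python
import functools
import http.server


def make_server(directory, host="127.0.0.1", port=8000):
    """Create an HTTP server that serves files below `directory`.

    `directory` should be the folder that contains schemas/, so that a schema
    is reachable at /schemas/<vendor>/<name>/<format>/<version>.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve(directory, host="0.0.0.0", port=8000):
    """Serve `directory` until the process is stopped."""
    server = make_server(directory, host=host, port=port)
    try:
        server.serve_forever()
    finally:
        server.server_close()
```

Calling serve("/path/to/repo") blocks and serves the folder; the resolver would then use an http connection whose uri points at that host and port.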

jonas · May 4, 2022, 11:02am

That’s a great idea! That way I should be able to quickly sync new schemas while keeping things private inside the cluster.
Thanks for taking the time.
