My own Iglu is just not resolving

Hi Guys,

I set up my own static Iglu repo and added the page_unload schema. I mirrored what I see on Iglu Central: same folder structure, same index.html (albeit pared back to just a test schema and the page_unload schema), http protocol is allowed, etc.

I then updated the Iglu resolver within the enricher application:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Iglu Central - GCP Mirror",
        "priority": 1,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://mirror01.iglucentral.com"
          }
        }
      },
      {
        "name": "My Iglu Central",
        "priority": 2,
        "vendorPrefixes": [ "com.mycompany.myiglu" ],
        "connection": {
          "http": {
            "uri": "http://myiglu.mycompany.com"
          }
        }
      }    
    ]
  }
}

On the client side I used the activity callback for ping aggregation on unload. That all works perfectly: I can see it posting as a self-describing event with the min/max of X/Y, active seconds, etc., so all good here.

The issue I’m getting is specifically that the unload self-describing event is not making it into the good stream. Everything else passes to good and makes it to Snowflake, but not this specific event, which uses my own Iglu.

Within the bad S3 bucket I’m getting schema violations:

{
  "schema": "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0",
  "data": {
    "processor": {"artifact": "stream-enrich", "version": "1.1.0"},
    "failure": {
      "timestamp": "2020-11-02T14:55:28.555302Z",
      "messages": [
        {
          "schemaKey": "iglu:com.mycompany.myiglu/page_unload/jsonschema/1-0-0",
          "error": {
            "error": "ResolutionError",
            "lookupHistory": [
              {"repository": "My Iglu Central", "errors": [{"error": "NotFound"}], "attempts": 20, "lastAttempt": "2020-11-02T14:55:28.295Z"},
              {"repository": "Iglu Client Embedded", "errors": [{"error": "NotFound"}], "attempts": 1, "lastAttempt": "2020-10-21T10:52:43.044Z"},
              {"repository": "Iglu Central", "errors": [{"error": "NotFound"}], "attempts": 20, "lastAttempt": "2020-11-02T14:55:28.471Z"},
              {"repository": "Iglu Central - GCP Mirror", "errors": [{"error": "NotFound"}], "attempts": 20, "lastAttempt": "2020-11-02T14:55:28.554Z"}
            ]
          }
        }
      ]
    },
    "payload": {"enriched": {"app_id": etc...(The post data)

My domain has the form //company.myiglu.com; I doubt that makes any difference?
By this I mean ://companyiglu.com versus the aforementioned domain structure.

Maybe I missed something?

Any assistance is appreciated; I’ve been staring at this one for a while now.
Kyle

So the error you see means that either the Iglu repo itself is unreachable, or it hasn’t been able to find the schema.
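To see at a glance which registries were tried, the lookupHistory in the bad row can be summarised with a few lines of Python. This is only a sketch: the JSON below is a trimmed copy of the failure quoted above (timestamps and payload left out), and summarize_lookups is a made-up helper name, not part of any Snowplow tooling.

```python
import json

# Trimmed copy of the "failure" part of the bad row quoted in this thread.
# The lastAttempt timestamps and the payload are omitted for brevity.
BAD_ROW_FAILURE = json.loads("""
{
  "messages": [{
    "schemaKey": "iglu:com.mycompany.myiglu/page_unload/jsonschema/1-0-0",
    "error": {
      "error": "ResolutionError",
      "lookupHistory": [
        {"repository": "My Iglu Central", "errors": [{"error": "NotFound"}], "attempts": 20},
        {"repository": "Iglu Client Embedded", "errors": [{"error": "NotFound"}], "attempts": 1},
        {"repository": "Iglu Central", "errors": [{"error": "NotFound"}], "attempts": 20},
        {"repository": "Iglu Central - GCP Mirror", "errors": [{"error": "NotFound"}], "attempts": 20}
      ]
    }
  }]
}
""")


def summarize_lookups(failure):
    """Return one (schema key, repository, error names, attempts) tuple per lookup."""
    rows = []
    for message in failure["messages"]:
        history = message.get("error", {}).get("lookupHistory", [])
        for lookup in history:
            errors = [e["error"] for e in lookup["errors"]]
            rows.append((message["schemaKey"], lookup["repository"], errors, lookup["attempts"]))
    return rows


if __name__ == "__main__":
    for schema_key, repo, errors, attempts in summarize_lookups(BAD_ROW_FAILURE):
        print(f"{repo}: {errors} after {attempts} attempt(s)")
```

In this bad row every registry, including "My Iglu Central", reports NotFound, so the custom registry was consulted but did not return the schema.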

So the first thing to check is whether the URL is available, which you can do with a curl request. You mention you’re using a static registry, which I assume means it’s an S3 bucket; the settings on this bucket must allow public access. If the problem seems to be here then that’s what to look for first.
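If curl isn't to hand, roughly the same availability check can be sketched in Python with the standard library. check_url is a hypothetical helper name, and the opener parameter exists only so the function can be exercised without a network; the URL in the usage comment is assembled from the resolver config and schema key posted in this thread, so adjust it to your own registry.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch `url` and return a short description of what happened."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"reachable, HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status. On a static
        # registry a 403 often points at access settings and a 404 at
        # the path, though that is a rule of thumb, not a guarantee.
        return f"server answered with HTTP {err.code}"
    except OSError as err:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return f"unreachable: {err}"


# Usage (performs a real request, so it is left as a comment here):
# print(check_url(
#     "http://myiglu.mycompany.com/schemas/"
#     "com.mycompany.myiglu/page_unload/jsonschema/1-0-0"
# ))
```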

Next is to check the path of the schema. The convention is that the self portion of the schema must match the path of the schema, so schemas/{vendor}/{name}/{format}/{version}. Check out the /schemas directory of the iglu central repo for lots of examples.
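As a rough illustration of that convention, the sketch below turns an Iglu schema key into the URL a static registry would need to serve, and checks a schema's self block against a path. The function names and the regular expression are illustrative assumptions, not the Iglu client's own code; the registry URI and schema key are the ones from this thread.

```python
import re

# Loose pattern for an Iglu schema key: iglu:{vendor}/{name}/{format}/{version}
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9_.-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


def static_schema_url(repo_uri, iglu_uri):
    """Build {repo_uri}/schemas/{vendor}/{name}/{format}/{version}."""
    match = IGLU_URI.match(iglu_uri)
    if match is None:
        raise ValueError(f"not a valid Iglu schema key: {iglu_uri!r}")
    return "{}/schemas/{vendor}/{name}/{format}/{version}".format(
        repo_uri.rstrip("/"), **match.groupdict()
    )


def self_matches_path(schema, path):
    """Check that the schema's `self` block agrees with where it is stored."""
    expected = "schemas/{vendor}/{name}/{format}/{version}".format(**schema["self"])
    return path.strip("/") == expected


if __name__ == "__main__":
    print(static_schema_url(
        "http://myiglu.mycompany.com",
        "iglu:com.mycompany.myiglu/page_unload/jsonschema/1-0-0",
    ))
```

If the printed URL does not return the schema when requested, the resolver will not find it either.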

Since you mention an index.html file, I think you may have looked at what appears in the browser for iglucentral.com for guidance - which isn’t a bad idea, but actually this site was created as an afterthought. The workflow that might be easier to manage is to store schemas in a GitHub repo (same structure as the one I linked above), and use igluctl to perform uploads to the repo.

A final note - while a static S3-based repo is probably a quicker setup in the short term, a full Iglu Server is the best option, because the latest loaders require it (and the older loaders require a lot of manual steps in the workflow).

As always I very much appreciate the detailed reply @Colm

Ah! I think I see where I’ve gone wrong: it’s definitely the self path you outlined.

Quick question, solely out of curiosity and to complete the picture in my head: take the folder /schemas/ on iglucentral as an example.

As in:

http://iglucentral.com/schemas/

What or where dictates /schemas/? Within the resolver.json in the enricher app, the domain URI http://iglucentral.com is set, but what about the subfolder /schemas/? Is this something innate to the .jar?

Happy to help!

What or where dictates /schemas/? Within the resolver.json in the enricher app, the domain URI http://iglucentral.com is set, but what about the subfolder /schemas/? Is this something innate to the .jar?

Actually I’m not sure. Instinct tells me that it must be in the Iglu Scala client somewhere.

I don’t believe it’s something that applies to Iglu Server - which stores schemas in a Postgres instance. So the concept of a directory structure doesn’t quite apply. (Although my gut says that it’s replaced by a table structure. However the need to think about it is removed, hence my ignorance on this one :smiley: )

When you think of /schemas/ as a folder path it feels a bit arbitrary. A better way to think of an Iglu schema registry is as a somewhat-RESTful API, where /schemas/ identifies a specific resource type. This namespacing becomes important as Iglu schema registries get other resource types, e.g. mappings between different schema types.

Thanks @alex, that makes sense in my mind’s eye.

@Colm Thanks again, page_unload with ping aggregations works like a charm now after I matched the path correctly.

One thing I spotted while setting this all up is that the blog post outlined here needs correcting. Within the enableActivityTrackingCallback snippet, maxXOffset is shown in the example as maxYOffset, which leads to two instances of maxYOffset.

Good spot. I’ll get that sorted!