Iglu Central now supports the Snowplow RDB loader and Postgres loader

On 1st November 2021, Iglu Central added support for the schema list endpoints, which were previously only supported by a full Iglu Server.

Example 1: List all known schemas:

curl http://iglucentral.com/schemas

Example 2: List all versions of model 1 of the ad_click schema:

curl http://iglucentral.com/schemas/com.snowplowanalytics.snowplow/ad_click/jsonschema/1
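The same two calls can be made from a script. Here is a minimal Python sketch, assuming only the two URL shapes shown above; the helper names list_url and fetch_list are made up for this illustration, and the response is simply decoded as JSON without assuming anything about its exact shape.

```python
import json
from typing import Any, Optional
from urllib.request import urlopen

BASE_URI = "http://iglucentral.com"


def list_url(vendor: Optional[str] = None,
             name: Optional[str] = None,
             model: Optional[int] = None,
             base_uri: str = BASE_URI) -> str:
    """Build one of the two list URLs shown in the examples above.

    Hypothetical helper: it only covers "all schemas" and
    "all versions of one model of one schema".
    """
    if vendor is None and name is None and model is None:
        return f"{base_uri}/schemas"
    if vendor is None or name is None or model is None:
        raise ValueError("vendor, name and model must be given together")
    return f"{base_uri}/schemas/{vendor}/{name}/jsonschema/{model}"


def fetch_list(url: str, timeout: float = 10.0) -> Any:
    """Call a list endpoint and decode the body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_list(list_url("com.snowplowanalytics.snowplow", "ad_click", 1)) issues the same request as the second curl command.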

Why is this a good thing?

Snowplow’s RDB loader and Postgres loader use the schema list endpoints in order to discover all available schema patches and revisions, and therefore to create table columns with the correct types.
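To illustrate why a complete list matters, here is a rough Python sketch of putting discovered versions into order. It is not the loaders' actual code. It assumes the versions arrive as Iglu URIs ending in a SchemaVer version (MODEL-REVISION-ADDITION), and the function names and the sample URIs in the usage note are invented for this example.

```python
import re
from typing import List, Tuple

# Assumed shape: iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
IGLU_URI = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+)-(\d+)-(\d+)$")


def schema_version(uri: str) -> Tuple[int, int, int]:
    """Extract (model, revision, addition) as integers from an Iglu URI."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid Iglu URI: {uri}")
    return (int(match.group(4)), int(match.group(5)), int(match.group(6)))


def sort_versions(uris: List[str]) -> List[str]:
    """Order URIs from oldest to newest version.

    Comparing integer tuples avoids the string-sort trap where
    "1-0-10" would be placed before "1-0-2".
    """
    return sorted(uris, key=schema_version)
```

Given the made-up versions 1-0-10, 1-1-0 and 1-0-2 of a com.acme/example schema, sort_versions returns them in the order 1-0-2, 1-0-10, 1-1-0. If a version were missing from the list, a consumer walking it in order would never see the changes that version introduced.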

Until now, those loaders required the user to run an Iglu Server, because that was the only style of Iglu repository that supported the list endpoints. Iglu Central, on the other hand, is a static Iglu repository, which means it is just a collection of JSON files stored on S3.

In order to use Iglu Central schemas, the user had to manually upload those schemas to their own privately-run Iglu Server. With this new change, users can run the RDB loader and Postgres loader with an Iglu resolver that simply uses the http://iglucentral.com repository instead of self-hosting those schemas.

Example configuration

Here is a typical Iglu resolver that you might use to configure your Snowplow pipeline applications.

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": [ ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Private Iglu Server",
        "priority": 100,
        "vendorPrefixes": [ "com.acme" ],
        "connection": {
          "http": {
            "uri": "http://myiglu.example.com/api",
            "apikey": "00000000-0000-0000-0000-000000000000"
          }
        }
      }
    ]
  }
}

In this resolver, Iglu Central has the highest priority (1) and contains all the standard schemas. The private Iglu Server has a lower priority (100) and can host just the pipeline owner’s own custom schemas. The Iglu client queries Iglu Central first when looking for a schema, and falls back to the private Iglu Server only if the schema is not found there. Our docs site has a more detailed description of the resolution algorithm.
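The lookup order described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Iglu client's implementation: it orders repositories by the priority number alone and ignores vendorPrefixes, which the resolver configuration also carries. The repositories are in-memory stand-ins, and the lookup function and com.acme schema key are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (name, priority, schemas hosted); in-memory stand-ins for real repositories.
Repository = Tuple[str, int, Dict[str, dict]]


def lookup(repositories: List[Repository],
           schema_key: str) -> Optional[Tuple[str, dict]]:
    """Try repositories in ascending priority number; the first hit wins."""
    for name, _priority, schemas in sorted(repositories, key=lambda r: r[1]):
        if schema_key in schemas:
            return name, schemas[schema_key]
    return None  # not found in any repository


repositories: List[Repository] = [
    ("Private Iglu Server", 100,
     {"iglu:com.acme/checkout/jsonschema/1-0-0": {"type": "object"}}),
    ("Iglu Central", 1,
     {"iglu:com.snowplowanalytics.snowplow/ad_click/jsonschema/1-0-0": {"type": "object"}}),
]
```

With these stand-ins, a lookup for the ad_click key is answered by Iglu Central, while the com.acme key is missing there and falls through to the private server.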
