Important: Changes to Iglu Central's API for schema lists

istreeter · October 6, 2021, 2:25pm

This is a good news announcement, but we strongly recommend you read the section on how it could affect your Snowplow pipeline.

Starting from 1st November 2021, Iglu Central will support the schema list endpoints that were previously only supported by a full Iglu Server.

Example 1: List all known schemas (currently returns a 404):

curl http://iglucentral.com/api/schemas

Example 2: List all versions of the ad_click schema (currently returns a 404):

curl http://iglucentral.com/api/schemas/com.acme/ad_click/jsonschema/1
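If you want to check these endpoints from a script rather than by hand, the sketch below builds the two URL shapes shown above. It is only an illustration: the helper name `schema_list_url` is made up for this post, and the path layout is inferred from the two examples, not from a full description of the Iglu API.

```python
def schema_list_url(base_uri, vendor=None, name=None, model=None, fmt="jsonschema"):
    """Build a schema list URL in one of the two shapes shown in the examples.

    Hypothetical helper: with no vendor it returns the "all schemas" URL,
    otherwise the URL listing the versions of one schema model.
    """
    base = base_uri.rstrip("/") + "/api/schemas"
    if vendor is None:
        return base
    if name is None or model is None:
        raise ValueError("name and model are required when a vendor is given")
    return f"{base}/{vendor}/{name}/{fmt}/{model}"


# Example 1: all known schemas
all_schemas = schema_list_url("http://iglucentral.com")
# Example 2: all versions of model 1 of com.acme/ad_click
ad_click_versions = schema_list_url("http://iglucentral.com", "com.acme", "ad_click", 1)
```

You can pass either URL to curl or to any HTTP client to see whether the repository answers with a 200 or a 404.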

Why is this a good thing?

Snowplow’s RDB loader and Postgres loader use the schema list endpoints to discover all available schema patches and revisions, and therefore to create table columns with the correct types.

Until now, those loaders required the user to run an Iglu Server, because that was the only style of Iglu repository that supported the list endpoints. Iglu Central, on the other hand, is a static Iglu repository, which means it is just a bunch of JSON files stored on S3.

To use Iglu Central schemas, the user had to manually upload those schemas to their own privately-run Iglu Server. With this new change, users can run the RDB loader and Postgres loader with an Iglu resolver that simply uses the http://iglucentral.com repository instead of self-hosting those schemas.

How is this change possible?

Igluctl version 0.8.0 added a feature to upload schema lists as static JSON files to S3. We will start using this feature during our iglu-central deployments.

How will this affect my Snowplow pipeline?

Important: Your pipeline could start resolving schemas against Iglu Central where previously it was resolving against your private Iglu server.

For example, if your Iglu resolver lists Iglu Central with a higher priority than your private Iglu server:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Custom Iglu Server",
        "priority": 100,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://myiglu.example.com"
          }
        }
      }
    ]
  }
}

Previously, your RDB loader would have tried Iglu Central first for the schema list and received a 404 Not Found response. It would then have tried your custom Iglu Server and received a 200 OK response.

Starting from 1st November 2021, Iglu Central will respond with a 200 OK response for an Iglu Central schema, and the request will never reach your custom Iglu Server.
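To make the before and after behaviour concrete, here is a small simulation of the lookup described above. It is a deliberately simplified model, not the actual resolver code: it assumes repositories are tried in ascending order of their priority number and that the first 200 response wins, and it ignores vendorPrefixes and caching. The function names and the fake status functions are made up for illustration.

```python
def resolve_schema_list(repositories, fetch_status):
    """Return the name of the first repository that answers 200.

    Simplified model: repositories are tried in ascending order of their
    priority number, and fetch_status(uri) returns an HTTP status code.
    """
    for repo in sorted(repositories, key=lambda r: r["priority"]):
        if fetch_status(repo["uri"]) == 200:
            return repo["name"]
    return None


# The two repositories from the resolver config above.
REPOS = [
    {"name": "Iglu Central", "priority": 1, "uri": "http://iglucentral.com"},
    {"name": "Custom Iglu Server", "priority": 100, "uri": "http://myiglu.example.com"},
]


def before_change(uri):
    # Iglu Central has no list endpoint yet, so it answers 404.
    return 404 if uri == "http://iglucentral.com" else 200


def after_change(uri):
    # Both repositories now answer the list request.
    return 200


answered_before = resolve_schema_list(REPOS, before_change)
answered_after = resolve_schema_list(REPOS, after_change)
```

With this model, `answered_before` is the custom Iglu Server and `answered_after` is Iglu Central, which is the switch this announcement is warning about.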

This is good news for most pipelines. However, if you have in any way modified, adapted or extended the Iglu Central schemas in your private repo, then it could affect the visibility of those changes to your loader. Such schema modifications are highly discouraged and generally not necessary.

If you are concerned that you might have modified the Iglu Central schemas in your private repo, we suggest setting the priorities in your resolver config so that your private repo takes precedence over Iglu Central.
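One way to make that change is to swap the two priority values, so the private repo ends up with the value Iglu Central had. The sketch below does this on a resolver config loaded as a Python dict. It assumes, as in the example config earlier in this post, that the repository with the smaller priority number is consulted first. The helper name `prefer_private_repo` is hypothetical, and the connection and vendorPrefixes fields are left out of the sample only to keep it short.

```python
import copy


def prefer_private_repo(resolver, private_name):
    """Return a copy of the resolver config where the private repo comes first.

    Assumption: a smaller priority number means the repository is tried
    earlier. The private repo swaps priority values with whichever other
    repository currently has the smallest number.
    """
    updated = copy.deepcopy(resolver)
    repos = updated["data"]["repositories"]
    private = next(r for r in repos if r["name"] == private_name)
    others = [r for r in repos if r is not private]
    if not others:
        return updated
    first = min(others, key=lambda r: r["priority"])
    if first["priority"] < private["priority"]:
        private["priority"], first["priority"] = first["priority"], private["priority"]
    return updated


# Trimmed version of the resolver config from this post.
resolver = {
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
    "data": {
        "cacheSize": 500,
        "repositories": [
            {"name": "Iglu Central", "priority": 1},
            {"name": "Custom Iglu Server", "priority": 100},
        ],
    },
}

updated = prefer_private_repo(resolver, "Custom Iglu Server")
priorities = {r["name"]: r["priority"] for r in updated["data"]["repositories"]}
```

After the swap, the custom Iglu Server has priority 1 and Iglu Central has priority 100. The original `resolver` dict is left unchanged.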
