Trouble about Non-breaking schema change with Postgres as a Data Warehouse

Hi Snowplow team,

I am currently deploying Snowplow with a Postgres loader using self-describing events. I encountered a problem when trying to push a non-breaking schema update (adding optional fields to the schema). The schema appears to be pushed to the schema repository, but the payload with the new schema is not ingested into Postgres.

Here are the steps I’ve taken so far:

  1. I have schema version 1-0-0 with the structure below:
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a content context",
  "self": {
    "vendor": "com.mycompany",
    "name": "myschema",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "content_type": {
      "description": "A content type",
      "type": "string",
      "minLength": 0,
      "maxLength": 4096
    },
    "item_id": {
      "description": "An item id",
      "type": "string",
      "minLength": 0,
      "maxLength": 4096
    },
    "item_name": {
      "description": "An item name",
      "type": ["string", "null"],
      "minLength": 0,
      "maxLength": 4096
    }
  },
  "required": ["content_type", "item_id"],
  "additionalProperties": false
}
  1. I tested sending a payload with the above schema, and it worked correctly with the data loaded into Postgres and schema_version value in db shown 1-0-0.
  2. Then I changed the schema to version 1-0-1 (adding the optional field “status”) with the structure below:
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a content context",
  "self": {
    "vendor": "com.mycompany",
    "name": "myschema",
    "format": "jsonschema",
    "version": "1-0-1"
  },
  "type": "object",
  "properties": {
    "content_type": {
      "description": "A content type",
      "type": "string",
      "minLength": 0,
      "maxLength": 4096
    },
    "item_id": {
      "description": "An item id",
      "type": "string",
      "minLength": 0,
      "maxLength": 4096
    },
    "item_name": {
      "description": "An item name",
      "type": ["string", "null"],
      "minLength": 0,
      "maxLength": 4096
    },
    "status": {
        "description": "A status of item",
        "type": ["string", "null"],
        "minLength": 0,
        "maxLength": 4096
    }
  },
  "required": ["content_type", "item_id"],
  "additionalProperties": false
}
  1. Sending a payload with schema version 1-0-1 included data with every field from #3, but the data is not loaded into Postgres and is not sent to the bad stream either.

Note: I tested by sending a payload that referred to schema version 1-0-1 but used the data structure of version 1-0-0. It seems to work correctly, with the data loaded into Postgres and schema_version value in db shown 1-0-1.

Do you have an example of the payload you are sending against the 1-0-1 schema?

Can you see the event going into enriched after you send it in? (as a prerequisite to getting loaded to Postgres).

@mike

Here is my test payload using python tracker.

test_payload = SelfDescribing(
    SelfDescribingJson(
        "iglu:com.mycompany/myschema/jsonschema/1-0-1",
        {
            "content_type": "test-content-type",
            "item_id": "test-item-id",
            "item_name": "test-item-name",
            "status": "test-status"
        },
    )
)
tracker.track(test_payload)

And this is data from

SELECT * FROM com.mycompany_myschema_1

Payload using version 1-0-0 and 1-0-1 without sending status

I tried to monitor whether the event in the Python code above is being enriched or not. Any idea where I should start looking?

For this you’ll need to check the enriched stream (this may be Kinesis or something similar depending on what you are running for your stream) which will serve as the input to the loader.

What is the comment on this table? Does it have the status column?

Thank you @mike, I will check the enriched stream.

A comment on the table is

comment on table com.mycompany_myschema_1 is ‘iglu:com.mycompany/myschema/jsonschema/1-0-1’;

However, no status column has been created.

This suggests that the table has been evolved but was likely evolved while 1-0-1 did not have a status column against that schema. I’d test by adding a new column to 1-0-2 and making sure that the column is evolved and then starts receiving that data correctly.