Stream enrich is trying to validate SelfDescribingJson instead of the specified one

I have recently been trying to add some extra validation to my pipeline using Iglu and unstruct_events through to the stream enricher.

It seems that the Iglu client within the stream enricher is trying to regex-validate against the self-describing JSON schema instead of the one I have specified. Yet when I send the event through my tracker without wrapping it in SelfDescribingJson(), it fails the check that it is a SelfDescribingJson. What should I do?


  • enrich error
  • tracking code
  • intended schema
  • resolver.json

<-------------- ENRICH ERROR EXAMPLE --------------->
{
  "line": "{lots of base 64 line}",
  "errors": [
    {
      "level": "error",
      "message": "error: ECMA 262 regex \"^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+\" does not match input string \"\"\n level: \"error\"\n schema: {\"loadingURI\":\"#\",\"pointer\":\"/properties/schema\"}\n instance: {\"pointer\":\"/schema\"}\n domain: \"validation\"\n keyword: \"pattern\"\n regex: \"^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+\"\n string: \"\"\n"
    },
    {
      "level": "error",
      "message": "Unstructured event couldn't be extracted"
    }
  ],
  "failure_tstamp": "2016-11-15T12:53:07.633Z"
}
<-------------- END ENRICH ERROR EXAMPLE --------------->
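
The error says that the "schema" field of some self-describing JSON was an empty string, which cannot match the Iglu URI pattern. A minimal sketch of that check, assuming Python's re module treats this pattern the same way as the ECMA 262 engine named in the error (the function name looks_like_iglu_uri is hypothetical):

```python
import re

# Pattern copied from the "regex:" field of the enrich error.
IGLU_URI_PATTERN = re.compile(
    r"^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+"
)


def looks_like_iglu_uri(value):
    """Return True if value starts with a well-formed Iglu schema URI."""
    return IGLU_URI_PATTERN.match(value) is not None


# The schema URI from the tracker code passes the pattern check...
print(looks_like_iglu_uri("iglu:com.busuu/standard_event/jsonschema/1-0-1"))  # True
# ...while the empty string reported in the error does not.
print(looks_like_iglu_uri(""))  # False
```

So the failing value is not the com.busuu URI itself but an empty "schema" string somewhere in the event.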

This is my Python code to send the event:

<-------------- TRACKER CODE --------------->
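from snowplow_tracker import Subject, SelfDescribingJson

s = Subject()

# SelfDescribingJson takes the schema URI and the data dict as two arguments.
# The values below (event_name, uid, ...) are placeholders for the real fields.
event = SelfDescribingJson(
    "iglu:com.busuu/standard_event/jsonschema/1-0-1",
    {
        "event": event_name,
        "uid": uid,
        "language_learnt": language_learnt,
        "interface_language": interface_language,
        "params": custom_context,
        "platform": platform,
        "app_id": app_id,
        "version": version,
        "environment": environment,
        "user_agent": user_agent,
    },
)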


<-------------- END TRACKER CODE --------------->
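
For context, the event ends up as two nested self-describing JSONs: an outer unstruct_event envelope and the inner com.busuu event. The sketch below only illustrates that shape, which is taken from the decoded collector payload later in this thread. It is not the Python tracker's actual code; the helper names self_describing and wrap_unstruct_event and the sample field values are hypothetical.

```python
import json

# Outer envelope schema, as seen in the decoded collector payload.
UNSTRUCT_EVENT_SCHEMA = (
    "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"
)


def self_describing(schema, data):
    """Build a plain dict with the two keys every self-describing JSON carries."""
    return {"schema": schema, "data": data}


def wrap_unstruct_event(inner):
    """Wrap an inner self-describing JSON in the unstruct_event envelope."""
    return self_describing(UNSTRUCT_EVENT_SCHEMA, inner)


# Sample values only; the real event carries the fields listed in the tracker code.
inner = self_describing(
    "iglu:com.busuu/standard_event/jsonschema/1-0-1",
    {"event": "example_event", "platform": "mob"},
)
envelope = wrap_unstruct_event(inner)
print(json.dumps(envelope, indent=2))
```

Both layers carry their own "schema" and "data" keys, so an empty "schema" at either level would be enough to produce the pattern error.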

This is the schema that I am trying to validate against:

<-------------- SCHEMA VALIDATOR CODE --------------->
{
    "$schema": "",
    "description": "Schema for the busuu ",
    "self": {
        "vendor": "com.busuu",
        "name": "standard_event",
        "format": "jsonschema",
        "version": "1-0-0"
    },

    "type": "object",
    "properties": {
            "event": {
                    "type": "string",
                    "maxLength": 255
            },
            "uid": {
                    "type": "string",
                    "maxLength": 255
            },
            "ts": {
                    "type": "string",
                    "maxLength": 255
            },
            "language_learnt": {
                    "type": "string",
                    "maxLength": 255
            },
            "interface_language": {
                    "type": "string",
                    "maxLength": 255
            },
            "params": {
                    "type": "string",
                    "maxLength": 500
            },
            "platform": {
                    "type": "string",
                    "maxLength": 255
            },
            "app_id": {
                    "type": "string",
                    "maxLength": 255
            },
            "version": {
                    "type": "string",
                    "maxLength": 255
            },
            "environment": {
                    "type": "string",
                    "maxLength": 255
            },
            "user_agent": {
                    "type": "string",
                    "maxLength": 255
            }
    },
    "additionalProperties": false
}

<-------------- END SCHEMA VALIDATOR CODE --------------->

And finally, my resolver.json:

<-------------- RESOLVER CODE --------------->

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": ""
          }
        }
      },
      {
        "name": "busuu Iglu Repo",
        "priority": 5,
        "vendorPrefixes": [ "com.busuu" ],
        "connection": {
          "http": {
            "uri": "{ip of my resolver}"
          }
        }
      }
    ]
  }
}
<-------------- END RESOLVER CODE --------------->

Hi @brucey31 - this is very odd:

I feel like somewhere in your code you must have self-describing JSONs with:

  "schema": "",
  "data": {

However I fully concede that this problem isn’t present in the code you shared.

Thanks for such a quick reply Alex,

You are definitely right that the iglu central schema is being used unnecessarily.
When I decode the raw data line of the event straight out of the collector and before it hits the enricher I get:

{"data": {"data": {"language_learnt": "{language_learnt}", "platform": "mob", "version": "{version}", "params": "{'source': '{source}', 'term': '{term}', 'group': '{group}', 'email': '{email}', 'campaign': '{campaign}'}", "uid": "{uid}", "user_agent": "{user_agent}", "environment": "{environment}", "interface_language": "{interface_language}", "event": "{event}", "app_id": "{app_id}"}, "schema": "iglu:com.busuu/standard_event/jsonschema/1-0-1"}, "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"}

This looks pretty good to me. The only place that I mention "" in my code is at the top of my custom schema (see SCHEMA VALIDATOR CODE).

Is there something wrong in my resolver.json that points the validator to the wrong place?
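
A rough way to double-check a decoded payload like the one above is to walk it and list the "schema" value at every nesting level, flagging any that are empty. This is only a sketch: the function names find_schema_fields and empty_schemas are hypothetical, and the sample payload is an abbreviated copy of the decoded line with the redacted placeholders left in.

```python
def find_schema_fields(node, path="$"):
    """Yield (path, schema) for every dict that has both "schema" and "data" keys."""
    if isinstance(node, dict):
        if "schema" in node and "data" in node:
            yield path, node["schema"]
        for key, value in node.items():
            yield from find_schema_fields(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_schema_fields(item, f"{path}[{index}]")


def empty_schemas(node):
    """Return the paths of all self-describing JSONs whose schema is empty."""
    return [path for path, schema in find_schema_fields(node) if not schema]


# Abbreviated copy of the decoded collector payload (placeholders kept as-is).
payload = {
    "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
    "data": {
        "schema": "iglu:com.busuu/standard_event/jsonschema/1-0-1",
        "data": {"event": "{event}", "platform": "mob", "uid": "{uid}"},
    },
}

for path, schema in find_schema_fields(payload):
    print(path, "->", schema)
print("empty schema fields:", empty_schemas(payload))
```

For this payload both levels carry a non-empty URI, which is consistent with the event looking fine at the collector stage.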

Hey @brucey31:

I am using whatever the Python tracker uses.

I changed the Iglu Central schemas and it worked!

Thanks for your help!

Very odd!