Keep getting Data validity : Could not fetch schema

Hello all, we have the Scala collector and enrichment running. We set up a static S3 repo for our unstructured events at http://schemas.tripshock.com.s3-website-us-east-1.amazonaws.com
Here’s our resolver config that is set up on the enrichment instance:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 1000,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Tripshock Schemas",
        "priority": 5,
        "vendorPrefixes": [ "com.tripshock" ],
        "connection": {
          "http": {
            "uri": "http://schemas.tripshock.com.s3-website-us-east-1.amazonaws.com"
          }
        }
      }
    ]
  }
}

When we run our test script, we keep getting:

Schema :

iglu:com.tripshock/create_case/jsonschema/1-0-0

Data validity : Could not fetch schema

caseDescription : This is test case creation description

create_case is our custom event that we are trying to capture. What might be the problem?

Hi @alexb,

Your resolver seems to be valid as I can access your schema at expected URI: http://schemas.tripshock.com.s3-website-us-east-1.amazonaws.com/schemas/com.tripshock/create_case/jsonschema/1-0-0
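
For reference, a static Iglu repository serves each schema under `/schemas/<vendor>/<name>/<format>/<version>` relative to the repository root, which is how the URI above is derived from the schema key. A minimal Python sketch of that mapping (the function name `iglu_key_to_url` is made up for illustration; this is not the Iglu client's own code):

```python
def iglu_key_to_url(repo_uri: str, schema_key: str) -> str:
    """Map an Iglu schema key to the URL a static repo is expected to serve it at."""
    prefix = "iglu:"
    if not schema_key.startswith(prefix):
        raise ValueError(f"not an Iglu schema key: {schema_key!r}")
    parts = schema_key[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected vendor/name/format/version in {schema_key!r}")
    vendor, name, fmt, version = parts
    # Strip a trailing slash so the joined URL never contains "//schemas".
    return f"{repo_uri.rstrip('/')}/schemas/{vendor}/{name}/{fmt}/{version}"


if __name__ == "__main__":
    print(iglu_key_to_url(
        "http://schemas.tripshock.com.s3-website-us-east-1.amazonaws.com",
        "iglu:com.tripshock/create_case/jsonschema/1-0-0",
    ))
```

Opening the printed URL in a browser (or with curl) is a quick way to confirm the repo side is fine before looking at the pipeline.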

I’d also recommend changing the vendor in the schema’s self metadata to com.tripshock, but I don’t think that can cause any problems. Where are you getting the message “Data validity : Could not fetch schema” from? I don’t remember any Snowplow component producing it.

Another potential problem is that your schema was added too late and the Iglu Resolver cached its absence. To rule out this scenario, I’d recommend restarting your enrichment and/or loader.

Here’s the test script I’m using: http://schemas.tripshock.com/index-test.html
I’m then examining the events using the Snowplow Analytics Debugger in Chrome. Here’s the screenshot. I did restart both the collector and the enricher, but it didn’t make a difference. Is making a static repo and then pointing the resolver to it all that’s needed? Is there anything else that needs to be configured? My understanding is that the Iglu client is part of the collector. Is that correct?

Sorry, I have never used Snowplow Analytics Debugger, and I suspect it doesn’t support custom schemas. I think it basically knows only about Iglu Central schemas, so everything else won’t be found. And no, the collector knows nothing about schemas; the first schema-aware component is Enrich.

I think Snowplow Inspector supports custom schemas though (can you confirm @mike?)

I meant that the Iglu client is part of the enricher, not the collector. What’s the best way to make sure unstructured events are being properly recognized then?

Since 1.0.0, Enrich periodically re-fetches schemas that previously returned a “not found” response, so if you’re using the latest assets you should be good.

Before 1.0.0, however, the only two options were to make sure the first event with the schema arrives after you have uploaded the schema, or to reload Enrich if you deployed tracking before uploading the schema.
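
To illustrate the difference, here is a toy Python sketch of a lookup cache in which “not found” results expire after a TTL, so a schema uploaded late is eventually picked up without a restart. The class name, the TTL value and the `fetch` callable are all assumptions made for the example; this is not how the Iglu client is actually implemented.

```python
import time


class SchemaCache:
    """Toy cache: found schemas are kept, 'not found' results expire after a TTL."""

    def __init__(self, fetch, not_found_ttl=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable: schema key -> schema dict, or None if missing
        self._ttl = not_found_ttl
        self._clock = clock      # injectable so the behaviour can be tested without sleeping
        self._found = {}
        self._missing = {}       # schema key -> time the miss was recorded

    def lookup(self, key):
        if key in self._found:
            return self._found[key]
        missed_at = self._missing.get(key)
        if missed_at is not None and self._clock() - missed_at < self._ttl:
            return None          # still inside the TTL: do not hit the repo again
        schema = self._fetch(key)
        if schema is None:
            self._missing[key] = self._clock()
        else:
            self._missing.pop(key, None)
            self._found[key] = schema
        return schema
```

With no expiry on misses, the only way to clear a cached “not found” is to restart the process, which matches the pre-1.0.0 behaviour described above.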

I believe your issue is with Snowplow Analytics Debugger, not Enrich.

There are two ways to do this:

  1. Use the Snowplow Inspector that @anton has mentioned which does support custom schemas (note that this is an open source project rather than a Snowplow one though)
  2. Test sending your events to an instance of Snowplow Mini which will use stream-enrich to validate and you can visualise this data in Kibana.

Thanks Mike. We’ve done a good share of testing with Mini and used Kibana to analyze the events. Now we are setting up the real Snowplow pipeline, and that’s the only thing that didn’t seem to work.

So, I have created Kinesis Firehose delivery streams for good and bad events and am now seeing the create_case event in our bad events stream. This means that the repo that serves our custom schema is not being recognized by our enricher. Is that correct? Here’s what I’m seeing:

{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0","data":{"processor":{"artifact":"stream-enrich","version":"1.1.3"},"failure":{"timestamp":"2020-10-22T22:52:39.386840Z","messages":[{"schemaKey":"iglu:com.tripshock/create_case/jsonschema/1-0-0","error":{"error":"ValidationError","schemaIssues":[{"path":"$","message":"Unknown Metaschema: http://schemas.tripshock.com/schemas/com.tripshock/create_case/jsonschema/1-0-0"}]}}]},"payload":{"enriched":{"app_id":"sample-app","platform":"web","etl_tstamp":"2020-10-22 22:52:39.365","collector_tstamp":"2020-10-22 22:52:37.601","dvce_created_tstamp":"2020-10-22 22:52:36.192","event":"unstruct","event_id":"7915ac8a-0079-411b-b54d-2b39c2ba83c0","txn_id":null,"name_tracker":"sp","v_tracker":"js-2.10.2","v_collector":"ssc-0.17.0-kinesis","v_etl":"stream-enrich-1.1.3-common-1.1.3"
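
The part of a bad row like this that explains the failure is `data.failure.messages`. A small Python sketch for pulling it out, assuming the field layout visible in the row above (the helper name `summarize_schema_violations` is hypothetical, and the sample is trimmed to the failure section because the pasted row is cut off):

```python
import json


def summarize_schema_violations(bad_row: dict) -> list:
    """Return (schemaKey, error type, message) tuples from a schema_violations bad row."""
    summary = []
    for msg in bad_row["data"]["failure"]["messages"]:
        error = msg.get("error", {})
        # Some errors may carry no schemaIssues; still report the key and error type.
        issues = error.get("schemaIssues") or [{}]
        for issue in issues:
            summary.append((msg.get("schemaKey"), error.get("error"), issue.get("message")))
    return summary


if __name__ == "__main__":
    # Trimmed sample: only the failure section of the bad row is reproduced here.
    sample = json.loads("""
    {"data": {"failure": {"messages": [
      {"schemaKey": "iglu:com.tripshock/create_case/jsonschema/1-0-0",
       "error": {"error": "ValidationError",
                 "schemaIssues": [{"path": "$",
                   "message": "Unknown Metaschema: http://schemas.tripshock.com/schemas/com.tripshock/create_case/jsonschema/1-0-0"}]}}
    ]}}}
    """)
    for key, kind, message in summarize_schema_violations(sample):
        print(key, kind, message, sep=" | ")
```

Note that the error here is a ValidationError about the metaschema rather than a failed lookup, so the schema was fetched from the repo and the problem lies in the schema document itself.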

Can you share your schema and how you are sending the create_case event? It looks like an HTTP URI, whereas typically you would expect it to be prefixed with iglu: as the protocol.

Here it is:
{
  "$schema": "http://schemas.tripshock.com/schemas/com.tripshock/create_case/jsonschema/1-0-0",
  "description": "Schema for Create Case event",
  "self": {
    "vendor": "com.company",
    "name": "create_case",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "caseDescription": {
      "type": "string"
    }
  },
  "required": ["caseDescription"],
  "additionalProperties": false
}

@alexb, your metaschema is wrong. You have "$schema": "http://schemas.tripshock.com/schemas/com.tripshock/product/jsonschema/1-0-0" but should have "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#" instead.
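
A check like this can be run before uploading a schema. The Python sketch below is only an illustration (the function name and the wording of the messages are made up): it compares `$schema` against the self-describing metaschema URI quoted above and compares the `self` block against the key the schema is published under. The vendor mismatch is reported too, although, as noted earlier in the thread, it is probably not what breaks validation.

```python
SELF_DESC_METASCHEMA = (
    "http://iglucentral.com/schemas/"
    "com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#"
)


def check_self_describing_schema(schema: dict, expected_key: str) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    declared = schema.get("$schema")
    if declared != SELF_DESC_METASCHEMA:
        problems.append(f"$schema is {declared!r}, expected {SELF_DESC_METASCHEMA!r}")
    self_block = schema.get("self", {})
    # Rebuild the Iglu key from the self block; "?" marks a missing field.
    fields = {k: self_block.get(k, "?") for k in ("vendor", "name", "format", "version")}
    actual_key = "iglu:{vendor}/{name}/{format}/{version}".format(**fields)
    if actual_key != expected_key:
        problems.append(f"self block describes {actual_key}, expected {expected_key}")
    return problems


if __name__ == "__main__":
    posted = {
        "$schema": "http://schemas.tripshock.com/schemas/com.tripshock/create_case/jsonschema/1-0-0",
        "self": {"vendor": "com.company", "name": "create_case",
                 "format": "jsonschema", "version": "1-0-0"},
    }
    for problem in check_self_describing_schema(
        posted, "iglu:com.tripshock/create_case/jsonschema/1-0-0"
    ):
        print(problem)
```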

@ihor I changed "$schema" from "http://schemas.tripshock.com/schemas/com.tripshock/create_case/jsonschema/1-0-0" to "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#" as you advised, and I’m now seeing some stuff in good events! {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.tripshock/create_case/jsonschema/1-0-0","data":{"caseDescription":"This is test case creation description"}}} I guess it’s working now?

Looks like it :)

Thank you much Ihor!!