Hello everyone
I am doing a POC with Snowplow. On a single EC2 instance I have deployed the collector, enricher, and S3 loader.
Initially I was using the Iglu Central repo in the enricher config. Now I am trying to switch the repo to S3; for this I made the bucket publicly accessible and enabled static website hosting on it.
The collector and loader applications are working fine, but when I try to run the enricher application the following error occurs:
Error
[pool-1-thread-2] INFO com.snowplowanalytics.snowplow.enrich.common.fs2.config.ParsedConfigs - Parsed Iglu Client with following registries: AWS S3 Schema Repository
[pool-1-thread-2] ERROR com.snowplowanalytics.snowplow.enrich.common.fs2.Run - CLI arguments valid but some of the configuration is not correct. Error: Cannot decode enrichments {"error":"ResolutionError","lookupHistory":[{"repository":"AWS S3 Schema Repository","errors":[{"error":"NotFound"}],"attempts":1,"lastAttempt":"2022-08-04T07:14:40.627Z"},{"repository":"Iglu Client Embedded","errors":[{"error":"NotFound"}],"attempts":1,"lastAttempt":"2022-08-04T07:14:40.665Z"}]}
Collector config (config.kinesis.hocon)
collector {
  interface = "ec2-private-ip"
  port = 8080
  paths {
    "/com.acme/track" = "/com.snowplowanalytics.snowplow/tp2"
  }
  doNotTrackCookie {
    enabled = false
    #enabled = ${?COLLECTOR_DO_NOT_TRACK_COOKIE_ENABLED}
    # name = {{doNotTrackCookieName}}
    name = collector-do-not-track-cookie
    # value = {{doNotTrackCookieValue}}
    value = collector-do-not-track-cookie-value
  }
  streams {
    good = "kinesis-poc"
    bad = "kinesis-poc"
    sink {
      enabled = "kinesis"
      threadPoolSize = 10
      region = "us-east-1"
      aws {
        accessKey = "iam"
        secretKey = "iam"
      }
      backoffPolicy {
        minBackoff = 3000
        maxBackoff = 600000
      }
    }
    buffer {
      byteLimit = 3145728
      recordLimit = 500
      timeLimit = 5000
    }
  }
}
akka {
  loglevel = WARNING
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  http.server {
    remote-address-header = on
    raw-request-uri-header = on
    parsing {
      max-uri-length = 32768
      uri-parsing-mode = relaxed
      illegal-header-warnings = off
    }
    max-connections = 2048
  }
  coordinated-shutdown {
    run-by-jvm-shutdown-hook = off
  }
}
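For reference, this is roughly how I have been sending test events while checking the collector. It is only a rough Python sketch: the payload_data schema version and the event fields are my own assumptions, and ec2-private-ip is a placeholder for the actual host.
send_test_event.py (sketch)
import json
import urllib.request

COLLECTOR = "http://ec2-private-ip:8080"  # same interface/port as the collector config above

# tp2 envelope; the payload_data schema version and event fields are my assumptions
payload = {
    "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
    "data": [{"e": "pv", "p": "web", "tv": "poc-0.1", "url": "http://example.com"}],
}

req = urllib.request.Request(
    COLLECTOR + "/com.snowplowanalytics.snowplow/tp2",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    # expect 200 here and the raw event to land on the kinesis-poc stream
    print(resp.status)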
resolver.json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 60,
    "repositories": [
      {
        "name": "AWS S3 Schema Repository",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://cs-poc.s3.amazonaws.com"
          }
        }
      }
    ]
  }
}
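My understanding (please correct me if I am wrong) is that a static registry like this has to serve schemas at <uri>/schemas/<vendor>/<name>/jsonschema/<version>, so I have been sanity-checking the bucket with a small Python sketch like the one below; the anon_ip key is only an example, substitute whatever schemas your enrichments reference.
check_registry.py (sketch)
import urllib.error
import urllib.request

REGISTRY_URI = "http://cs-poc.s3.amazonaws.com"  # same uri as in resolver.json

def lookup(schema_key: str) -> None:
    # static registry layout (my assumption): <uri>/schemas/<vendor>/<name>/jsonschema/<version>
    url = f"{REGISTRY_URI}/schemas/{schema_key}"
    try:
        with urllib.request.urlopen(url) as resp:
            print(url, "->", resp.status)
    except urllib.error.HTTPError as err:
        # a 404/403 here would line up with the "NotFound" in the enricher error
        print(url, "->", err.code)

# example key only
lookup("com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-1")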
enricher.config.hocon
{
  "input": {
    "streamName": "kinesis-poc"
    "region": "us-east-1"
  }
  "output": {
    "good": {
      "streamName": "cs-poc-enriched-events-stream"
    }
    "bad": {
      "streamName": "cs-poc-enriched-events-stream"
    }
  }
}
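The enrichment configurations themselves sit in the directory I pass to the enricher, and as far as I understand each one's schema field has to resolve from the registry above. A quick sketch I use to list exactly which Iglu URIs the resolver will need (the directory path is a placeholder):
list_enrichment_schemas.py (sketch)
import json
from pathlib import Path

# placeholder path for the enrichments directory passed to the enricher
ENRICHMENTS_DIR = Path("/snowplow/enrichments")

for conf in sorted(ENRICHMENTS_DIR.glob("*.json")):
    doc = json.loads(conf.read_text())
    # each of these Iglu URIs must be resolvable from a registry in resolver.json
    print(conf.name, "->", doc.get("schema"))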
S3 loader config
{
  # Optional, but recommended
  "region": "us-east-1",
  # Options are: RAW, ENRICHED_EVENTS, JSON
  # RAW simply sinks data 1:1
  # ENRICHED_EVENTS work with monitoring.statsd to report metrics (identical to RAW otherwise)
  # SELF_DESCRIBING partitions self-describing data (such as JSON) by its schema
  "purpose": "ENRICHED_EVENTS",
  # Input Stream config
  "input": {
    # Kinesis Client Lib app name (corresponds to DynamoDB table name)
    "appName": "acme-s3-loader",
    # Kinesis stream name
    "streamName": "cs-poc-enriched-events-stream",
    # Options are: LATEST, TRIM_HORIZON, AT_TIMESTAMP
    "position": "LATEST",
    # Max batch size to pull from Kinesis
    "maxRecords": 10
  },
  "output": {
    "s3": {
      # Full path to output data
      "path": "s3://cs_poc_out/cs_demo_output/",
      # Partitioning format; Optional
      # Valid substitutions are {vendor}, {schema}, {format}, {model} for self-describing jsons
      # and {yy}, {mm}, {dd}, {hh} for year, month, day, hour
      #partitionFormat: "date={yy}-{mm}-{dd}"
      # Prefix for all file names; Optional
      "filenamePrefix": "raw_data",
      # Maximum Timeout that the application is allowed to fail for, e.g. in case of S3 outage
      "maxTimeout": 2000,
      # Output format; Options: GZIP, LZO
      "compression": "GZIP"
    },
    # Kinesis Stream to output failures
    "bad": {
      "streamName": "cs-poc-enriched-events-stream"
    }
  },
  # Flush control. A first limit the KCL worker hits will trigger flushing
  "buffer": {
    # Maximum bytes to read before flushing
    "byteLimit": 2048,
    # Maximum records to read before flushing
    "recordLimit": 10,
    # Maximum time between flushes
    "timeLimit": 5000
  }
}
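And a small sketch I use to confirm the loader is actually writing objects under the output prefix (bucket and prefix taken from the path above; boto3 picks up the instance's IAM credentials):
check_loader_output.py (sketch)
import boto3

# bucket and prefix taken from output.s3.path above
s3 = boto3.client("s3", region_name="us-east-1")
resp = s3.list_objects_v2(Bucket="cs_poc_out", Prefix="cs_demo_output/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])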
Note: for the sake of the POC, I have used the same stream for both the good and bad streams.