Data passing Enrichment (and RDBLoader) despite ResolutionError (Schemas Repo not found)

Hi,

we recently updated the CloudFront distributions that serve our schema repo on S3. Since then, it seems that enrichment cannot find the repo anymore.
The new CloudFront URLs do not have access to the bucket. We recently switched from OAI to OAC, and that configuration is probably not 100% correct yet.

However, we are wondering why the data still passes both enrichment and the RDB Loader and gets loaded into Redshift when the resolver cannot resolve the schema URIs. Any ideas?

Error from enrichment:
{
    "error": "ResolutionError",
    "lookupHistory": [
        {
            "repository": "Iglu Client Embedded",
            "errors": [
                {
                    "error": "NotFound"
                }
            ],
            "attempts": 1,
            "lastAttempt": "2023-04-05T08:02:50.227Z"
        },
        {
            "repository": "S3-schemas-registry",
            "errors": [
                {
                    "error": "NotFound"
                }
            ],
            "attempts": 1,
            "lastAttempt": "2023-04-05T08:02:51.438Z"
        }
    ]
}
[pool-1-thread-2] ERROR com.snowplowanalytics.snowplow.enrich.common.fs2.Run - CLI arguments 

The bucket policy of our schemas repo currently looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity [OLD CLOUDFRONT ORIGIN]"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::snowplow-schemas-repo/*"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::snowplow-schemas-repo/*",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "aws:SourceArn": "[NEW CLOUDFRONT ORIGIN]"
                }
            }
        }
    ]
}

Using the full ARN fixed the access issue, but we are still very confused about why the data passed the enrichment step while the resolver could not reach our schema repo.

Well, at least some of the data didn’t pass validation, otherwise you wouldn’t have a resolution error to post.

A few things to explain that might help you wrap your head around this though:

  1. There is caching involved. So even if a schema’s not available, it won’t fail immediately. I don’t think that’s the explanation here but worth noting.
  2. Standard events (page views etc) get validated against schemas that are hosted publicly, separately from your schema registry. So only your custom data would fail in this scenario.

I don’t know much about the cloudfront side of things but potentially there’s something on that side involved too.
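
One way to check the CloudFront side independently of the pipeline is to request a schema directly from the repo URL. Below is a minimal Python sketch, assuming the repo is a static Iglu registry where schemas sit under `/schemas/<vendor>/<name>/<format>/<version>`. The function names, the CloudFront domain and the `com.acme` schema are placeholders, not values from this thread.

```python
import re
import urllib.error
import urllib.request

# Shape of an Iglu schema URI: iglu:vendor/name/format/MODEL-REVISION-ADDITION
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_-]+)/"
    r"(?P<format>[A-Za-z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


def schema_url(repo_base: str, iglu_uri: str) -> str:
    """Build the HTTP URL a static registry would serve this schema from."""
    match = IGLU_URI.match(iglu_uri)
    if match is None:
        raise ValueError(f"not a valid Iglu schema URI: {iglu_uri!r}")
    return "{}/schemas/{}/{}/{}/{}".format(
        repo_base.rstrip("/"),
        match["vendor"],
        match["name"],
        match["format"],
        match["version"],
    )


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET on url (4xx/5xx included)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Placeholder values: substitute your own distribution and schema.
    url = schema_url(
        "https://dxxxxxxxxxxxx.cloudfront.net",
        "iglu:com.acme/checkout/jsonschema/1-0-0",
    )
    print(url, fetch_status(url))
```

A 200 means the distribution can read the bucket. A 403 would point at the bucket policy or the OAC setup rather than at the pipeline.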

Hi @Colm ,

thanks a lot for your reply.

Re 1: that is unlikely, because we had been using the wrong CloudFront URL for several days.

Re 2: data based on custom schemas also passed through our pipeline all the way to Redshift.

We fixed the CloudFront URL, but the mystery remains. Maybe it was using an old CloudFront URL in an outdated image…

We will let you know once we have figured out the reason.

Best,

M.