403 Error in static Iglu registry

My iglu_resolver.json:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Iglu Cartful",
        "priority": 5,
        "vendorPrefixes": [ "com.cartful" ],
        "connection": {
          "http": {
            "uri": "http://iglu.cartfulsolutions.com"
          }
        }
      }
    ]
  }
}

My call to the unstructured event (base64 intentionally disabled) as recorded by my browser:

ue_pr:{
  "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
  "data": {
    "schema": "iglu:com.cartful/oa_event/jsonschema/1-0-0",
    "data": {
      "oae": "plugin:page:step:0",
      "oad": {}
    }
  }
}

Here is my static iglu repo hosted at http://iglu.cartfulsolutions.com :

<html><head><title>Iglu Cartful</title></head>
<body>
 <h2>/</h2>
 <ul>
  <li><a href="/schemas/com.cartful/oa_event/jsonschema/1-0-0">schemas/com.cartful/oa_event/jsonschema/1-0-0</a></li>
 </ul>
</body></html> 

Here are the contents of my schema located at http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0 :

{
	"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
	"description": "Schema for a legacy OA style analytics event",
	"self": {
		"vendor": "com.cartful",
		"name": "oa_event",
		"format": "jsonschema",
		"version": "1-0-0"
	},

	"type": "object",
	"properties": {
		"oae": {
			"type": "string"
		},
		"oad": {
			"type": "string"
		}
	},
	"required": ["oae"],
	"additionalProperties": false
}

Unfortunately, no matter what I try, I can’t get my custom event to work. The logs seem to complain of a 403 (Forbidden), but everything in my bucket is publicly accessible and can be accessed via unauthenticated curl. Here are the logs:

level: \"error\"\n repositories: [\"Iglu Central [HTTP]\",\"Iglu Client Embedded [embedded]\",\"Iglu Cartful [HTTP]\"]\n"},{"level":"error","message":"error: Unexpected exception fetching iglu:com.cartful/oa_event/jsonschema/1-0-0 in HTTP Iglu repository Iglu Cartful: java.io.IOException: Server returned HTTP response code: 403 for URL: http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0\n level: \"error\"\n"},{"level":"error","message":"error: Unexpected exception fetching iglu:com.cartful/oa_event/jsonschema/1-0-0 in HTTP Iglu repository Iglu Cartful: java.io.IOException: Server returned HTTP response code: 403 for URL: http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0\n level: \"error\"\n"},{"level":"error","message":"error: Unexpected exception fetching iglu:com.cartful/oa_event/jsonschema/1-0-0 in HTTP Iglu repository Iglu Cartful: java.io.IOException: Server returned HTTP response code: 403 for URL: http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0\n level: \"error\"\n"}]

I have no idea why this wouldn’t work. Any thoughts? I also tried to embed the schema locally (using jvm-embedded as a guide) but that also didn’t work.

Hello @darren,

That is an extremely weird issue. Your Iglu registry denies HTTP requests without a User-Agent header, and the Iglu client doesn’t set one. Did you set up the registry using S3 static hosting, or are you using a different kind of web server? (That would help us decide whether we should set a UA in the client.)

I created a corresponding ticket: https://github.com/snowplow/iglu-scala-client/issues/58

Hi @anton,

Thank you so much for the quick response to my problem.

Ultimately I am backed by S3, but I think I might know the issue: CloudFlare. I will disable their WAF functionality and try it again (should resolve directly to the S3 CNAME).

I will try again and get back to you!

Hey again @anton - so I ran a few fresh tests again, and I’m still not sure why I’m getting 403. Here is a curl without a User-Agent:

curl -H "User-Agent:" -v http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0
*   Trying 54.231.49.225...
* Connected to iglu.cartfulsolutions.com (54.231.49.225) port 80 (#0)
> GET /schemas/com.cartful/oa_event/jsonschema/1-0-0 HTTP/1.1
> Host: iglu.cartfulsolutions.com
> Accept: */*
>
< HTTP/1.1 200 OK
< x-amz-id-2: GN1hsr0RFtl68VpcwagVgxjFvVaLgxIovpZRx0oFf5GLS7rN6yDVS2/bxx9qivRbi/PDLXms7hY=
< x-amz-request-id: F4F749EFC0BDB3FB
< Date: Thu, 17 Nov 2016 01:56:07 GMT
< Last-Modified: Wed, 16 Nov 2016 16:46:16 GMT
< ETag: "62bca1b27ea03f6e58dca2b2843cd088"
< Content-Type: application/octet-stream
< Content-Length: 447
< Server: AmazonS3
<
{
	"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
	"description": "Schema for a legacy OA style analytics event",
	"self": {
		"vendor": "com.cartful",
		"name": "oa_event",
		"format": "jsonschema",
		"version": "1-0-0"
	},

	"type": "object",
	"properties": {
		"oae": {
			"type": "string"
		},
		"oad": {
			"type": "string"
		}
	},
	"required": ["oae"],
	"additionalProperties": false
}
* Connection #0 to host iglu.cartfulsolutions.com left intact

This points directly to the S3 host (no longer being routed through Cloudflare):

# dig iglu.cartfulsolutions.com +short
iglu.cartfulsolutions.com.s3-website-us-east-1.amazonaws.com.
s3-website-us-east-1.amazonaws.com.
54.231.82.225

I’m willing to give the embedded repo a shot but also happy to help troubleshoot this issue. I’m still kind of perplexed.

Hey @darren,

I’m not sure, but I think your curl example still sends a UA header (it is empty, but it’s there). And the Iglu client still has the same issue. You can reproduce it in Scala this way:

import java.net.URL

val url = new URL("http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0")
val connection = url.openConnection
// Uncommenting the next line sets a User-Agent, which fixes the issue:
// connection.setRequestProperty("User-Agent", "Iglu-client")
connection.getInputStream  // throws java.io.IOException on a 403 response

UPD: No, I’m wrong. curl doesn’t set UA in your example. I’ll try to figure out what’s happening.
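
The reproduction above can be turned into a small side-by-side check. This is only a minimal sketch and not part of the Iglu client: `UaCheck` and `statusFor` are hypothetical names. It reads the status with `getResponseCode` rather than `getInputStream`, so a 403 comes back as a number instead of an exception. One caveat: unless overridden, the JVM's built-in HTTP client normally sends its own default User-Agent (of the form `Java/<version>`, controlled by the `http.agent` system property), so the "no override" case here means "JVM default header" rather than "no header at all".

```scala
import java.net.{HttpURLConnection, URL}

object UaCheck {
  // Hypothetical helper: issues a GET and returns the HTTP status code,
  // optionally overriding the User-Agent header.
  def statusFor(uri: String, userAgent: Option[String] = None): Int = {
    val connection = new URL(uri).openConnection().asInstanceOf[HttpURLConnection]
    // Fail fast rather than hang if the host is unreachable.
    connection.setConnectTimeout(5000)
    connection.setReadTimeout(5000)
    // Only override the header when a value is supplied.
    userAgent.foreach(ua => connection.setRequestProperty("User-Agent", ua))
    try connection.getResponseCode
    finally connection.disconnect()
  }
}
```

Comparing `UaCheck.statusFor(uri)` with `UaCheck.statusFor(uri, Some("Iglu-client"))` for the registry URL would show whether the header alone changes the response.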

@anton - If there’s anything I can do to assist in testing, let me know. I’m trying the embedded route in the meantime.

Thanks for your help!

Hey @darren, I think something went wrong in the S3 setup, and I’d really like to find out what exactly.

Especially taking into account that it works now! Did you change something in the setup recently? Maybe some part of the WAF configuration was cached?

@anton - Well, I can’t say that it works yet… My EMR process still says 403. Let me run it one more time (remove embedded) and say for sure. Sorry about that.

My S3 setup is pretty standard… I set the ACL to public-read on all the objects recursively.

That’s strange, because I cannot reproduce it anymore using the same code.

sigh

"errors":[{"level":"error","message":"error: Could not find schema with key iglu:com.cartful/oa_event/jsonschema/1-0-0 in any repository, tried:\n level: \"error\"\n repositories: [\"Iglu Central [HTTP]\",\"Iglu Client Embedded [embedded]\",\"Iglu Cartful [HTTP]\"]\n"},{"level":"error","message":"error: Unexpected exception fetching iglu:com.cartful/oa_event/jsonschema/1-0-0 in HTTP Iglu repository Iglu Cartful: java.io.IOException: Server returned HTTP response code: 403 for URL: http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0\n level: \"error\"\n"},{"level":"error","message":"error: Unexpected exception fetching iglu:com.cartful/oa_event/jsonschema/1-0-0 in HTTP Iglu repository Iglu Cartful: java.io.IOException: Server returned HTTP response code: 403 for URL: http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0\n level: \"error\"\n"},{"level":"error","message":"error: Unexpected exception fetching iglu:com.cartful/oa_event/jsonschema/1-0-0 in HTTP Iglu repository Iglu Cartful: java.io.IOException: Server returned HTTP response code: 403 for URL: http://iglu.cartfulsolutions.com/schemas/com.cartful/oa_event/jsonschema/1-0-0\n level: \"error\"\n"}]

I’ll just try changing the URI to S3’s DNS name at this point.

@anton I think I’m all set here! Now I’m getting validation errors on the schema!

Thanks once again for helping me out here!
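
As for the validation errors: judging only from the snippets posted earlier in the thread, one possible cause is that the event sends `"oad": {}` (an object) while the schema declares `oad` as a string. The sketch below is a hand-rolled illustration of that mismatch, not the Iglu validator; `OadTypeCheck`, `declared`, `jsonType`, and `mismatches` are hypothetical names, and the event data is modelled as a plain Scala `Map`.

```scala
object OadTypeCheck {
  // Property types as declared in the schema posted in the thread.
  val declared: Map[String, String] = Map("oae" -> "string", "oad" -> "string")

  // JSON type name for the few value shapes that occur in this example.
  def jsonType(value: Any): String = value match {
    case _: String    => "string"
    case _: Map[_, _] => "object"
    case _            => "other"
  }

  // Lists every property whose actual type differs from the declared one.
  def mismatches(data: Map[String, Any]): List[String] =
    data.toList.collect {
      case (key, value) if declared.get(key).exists(_ != jsonType(value)) =>
        s"$key: expected ${declared(key)}, found ${jsonType(value)}"
    }
}
```

Calling `OadTypeCheck.mismatches(Map("oae" -> "plugin:page:step:0", "oad" -> Map.empty[String, Any]))` reports a single mismatch for `oad`. If that is indeed the cause, either sending `oad` as a string or declaring it as an object in the schema would be the natural fixes.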

Glad to hear it, @darren (about the issue disappearing, not about the validation errors). If you have any idea what exactly went wrong, please post it.

@anton - I believe it was CloudFlare being overzealous in protecting against attacks that appeared to be automated (no UA set). I was never able to replicate the 403 myself but that’s my best theory.