How do I update Iglu Resolver to reference personal Iglu Server? (needed for RDB shredder)

Hi everyone! I’m having a little trouble updating my iglu_resolver.json file after deploying my own iglu server.

For background I use Redshift in my pipeline, so I moved away from Iglu Central in favor of my own Iglu Server to enable the RDB Shredder to function properly. Previously, the RDB Shredder would complete its run, but all events would be placed in a folder called “output=bad”, and in a subfolder called “name=loader_iglu_error.” I researched this and came to understand (per this documentation) that the RDB Shredder has required an Iglu Server since R32.

I set up a database for this purpose via RDS and successfully deployed iglu-server-0.6.0.jar on its own EC2 instance. I then used the following snippet (pulled from the same documentation linked above) to mirror Iglu Central to my server:

$ git clone https://github.com/snowplow/iglu-central.git
$ igluctl static push iglu-central/schemas $YOUR_SERVER_URL $YOUR_API_KEY
$ igluctl static push com.acme-iglu-registry/schemas $YOUR_SERVER_URL $YOUR_API_KEY

This appears to have worked successfully. Next, I updated the standard iglu_resolver.json template file to now point to my server:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "{{my ec2 public DNS}}:8080",
            "apikey": "{{my super API key}}"
          }
        }
      }
    ]
  }
}

But when I re-run the RDB shredder using the new resolver file, I’m met with this error:

21/05/20 18:26:16 ERROR Client: Application diagnostics message: User class threw exception: java.lang.RuntimeException: RDB Shredder could not fetch iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0 schema at initialization. Schema cannot be resolved in following repositories:
* Iglu Central due [NotFound] after 1 attempt
* Iglu Client Embedded due [NotFound] after 1 attempt

It seems there’s either something wrong with my resolver file, my Iglu server implementation, or both. Any thoughts would be a huge help! Thanks in advance.

So the issue you’re having appears to be down to networking - at least that’s what my gut says.

I notice that you specify port 8080 in your resolver, which isn’t something I typically see - I’m not sure what the effect of that would be but it’s worth trying without it. Also do check that the uri points to the correct api endpoint for the Iglu server.

It may also be down to your network permissions policy for either the Iglu Server or the RDB shred job - one way to test this is to ssh into the instance you’re running the shred job from, and see if you can reach the Iglu server. It could be IAM permissions or network rules if that appears to be the problem.
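One way to run that reachability check from the shred instance is a plain TCP probe. The sketch below is only illustrative: `can_connect` is a hypothetical helper, and the host and port in the usage comment are placeholders. A refused connection tends to point at the port or the network rules, whereas a successful connection followed by a NotFound from the resolver suggests the URI path is worth a closer look.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests the network path
    and does not talk to the Iglu Server API itself.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Usage (placeholder host and port, replace with your own):
#   can_connect("my-iglu-host.example.com", 8080)
```

Being able to SSH into the server only shows that port 22 is open, so it is worth probing the port the Iglu Server actually listens on.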

Aside from that specific problem, I would like to point out that you can list more than one Iglu repository in your resolver file. So typically you wouldn’t need to copy all the schemas from Iglu Central to your own instance of Iglu. Instead, you’d host your custom schemas in your own instance, while the standard ones live in Iglu Central. This way, any new versions of our standard schemas are automatically available to you, whereas the other way you would need to copy them across before using new features which leverage them.

Sometimes the use case does call for hosting the standard schemas yourself - in that case you can still have both but set the priority field to prefer your server over Iglu Central - so new schemas are still available, but the resolver will look in your Iglu instance, and will only check Iglu Central when it can’t find the schema.
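A resolver with both repositories could be generated along these lines. This is a rough sketch rather than an official template: `build_resolver` is a hypothetical helper, the `com.acme` vendor prefix and the repository name "My Iglu Server" are made up, and it assumes that a lower `priority` number is consulted first, which is worth confirming against the resolver documentation.

```python
import json


def build_resolver(own_uri: str, api_key: str) -> dict:
    """Build a resolver config listing a private Iglu Server and Iglu Central.

    Assumption: repositories with a lower "priority" value are tried first.
    """
    return {
        "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
        "data": {
            "cacheSize": 500,
            "repositories": [
                {
                    # Private server: preferred for both standard and custom schemas.
                    "name": "My Iglu Server",
                    "priority": 0,
                    "vendorPrefixes": ["com.snowplowanalytics", "com.acme"],
                    "connection": {"http": {"uri": own_uri, "apikey": api_key}},
                },
                {
                    # Public Iglu Central as a fallback; it needs no API key.
                    "name": "Iglu Central",
                    "priority": 1,
                    "vendorPrefixes": ["com.snowplowanalytics"],
                    "connection": {"http": {"uri": "http://iglucentral.com"}},
                },
            ],
        },
    }


# Usage (placeholder values):
#   print(json.dumps(build_resolver("http://my-iglu-host:8080/api", "my-key"), indent=2))
```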


I appreciate the reply, @Colm.

I specified port 8080 since that’s the port I identify in the repo-server section of the Iglu Server application.conf file:

repo-server {
  interface = "0.0.0.0"
  port = 8080
  pool = "cached"
}

Is that not needed? If not, what’s the expected method for hosting the Iglu Server? Right now the uri just points to port 8080 of the EC2 instance’s public DNS address. How would I change that to point to the correct API endpoint as you recommend? Would I structure my uri like so HOST/api/schemas/ ?

I was able to SSH into the EC2 instance of the Iglu Server from the instance I’m using to run RDB Shredder, so I don’t think it’s a permissions issue. To facilitate ease of setup I have all IAM roles, security groups, etc. set wide open at the moment.

Can you expand on which use cases would require a user to self-host the standard schemas? Per this thread the other day, I thought my use case would require a self-hosted Iglu Server. Thanks again for the help!

Ah, I see, thanks for linking that thread - I hadn’t thought of the fact that RDB loader requires an Iglu Server…

I think this is a wrinkle in the process for Open Source setup that we’ll look to fix - it has been raised internally. For now, you’re right, you do need the custom Iglu server, and to push the Iglu central schemas.

Is that not needed? If not, what’s the expected method for hosting the Iglu Server? Right now the uri just points to port 8080 of the EC2 instance’s public DNS address. How would I change that to point to the correct API endpoint as you recommend? Would I structure my uri like so HOST/api/schemas/ ?

So I think I used the wrong language here: when I said ‘api endpoint’, what I really meant was just ‘endpoint’. Apologies if that threw you.

I’ve also realised that it’s not actually as easy as I assumed to be sure that you have the right endpoint - looks like this is something our documentation doesn’t make very clear, and actually I’m digging for a conclusive answer.

I can tell you though that all of the instances of a resolver that I have found don’t specify the port, just the http address. So I think that the Iglu client figures out what to ping to get the schemas – but take this with a slight pinch of salt as I’ve not validated that assumption.

It is worth trying without the port (and rebooting enrich after the update, to make sure it doesn’t return a cached error). In the meantime, I’ll see if I can find a conclusive answer.

Thanks for looking in to this @Colm, I look forward to hearing what you find.

In the meantime, I did remove the port from the URI, and that appears to have had an effect. Now when I attempt to run the shredder, I’m met with this error:

21/05/21 17:11:23 ERROR Client: Application diagnostics message: User class threw exception: java.lang.RuntimeException: RDB Shredder could not fetch iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0 schema at initialization. Schema cannot be resolved in following repositories:
* Iglu Central due [Iglu Repository Failure. Connection refused (Connection refused)] after 1 attempt
* Iglu Client Embedded due [NotFound] after 1 attempt

Previously the “Iglu Central” line also said [NotFound]. But since removing the port from the URI, it now says [Iglu Repository Failure. Connection refused (Connection refused)]. That said, I’m a little confused about which aspects of my configuration are causing the connection to be refused. The instance my Iglu Server is running on is in a public subnet, and its security group is set as follows:

The same subnets and security group are used for the RDS database where the Iglu Central schemas are mirrored. And that database has been configured such that Public accessibility = Yes.

Are there any other aspects of the configuration that I might be overlooking, which could cause the connection to the Iglu Server to be refused? Thanks again for all your help!

@Colm just an update, I was able to resolve this issue. The solution was to input the URI as follows:

"uri": "{{public DNS address where Iglu Server is running}}:{{port}}/api/",

Thanks again for your assistance!
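For anyone wondering why the `/api/` suffix matters, here is a small sketch of how the lookup URL seems to be formed. It assumes the Iglu client appends `schemas/` plus the vendor/name/format/version path to the configured `uri`; `schema_lookup_url` is a hypothetical helper and the host name in the usage comment is a placeholder.

```python
def schema_lookup_url(registry_uri: str, iglu_uri: str) -> str:
    """Return the URL an HTTP registry lookup would hit for an Iglu URI.

    Assumption: the client requests <registry_uri>/schemas/<vendor>/<name>/<format>/<version>.
    """
    prefix = "iglu:"
    if not iglu_uri.startswith(prefix):
        raise ValueError(f"not an Iglu URI: {iglu_uri!r}")
    schema_path = iglu_uri[len(prefix):]
    # Strip any trailing slash so the result has no double slashes.
    return registry_uri.rstrip("/") + "/schemas/" + schema_path


# Usage (placeholder host):
#   schema_lookup_url(
#       "http://my-iglu-host:8080/api/",
#       "iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0",
#   )
```

Under that assumption, a `uri` without `/api` asks for a path the server does not serve, which would be consistent with the earlier [NotFound], and a `uri` without the port goes to the default HTTP port where nothing is listening, which would be consistent with the later connection refusal.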

Good to hear you resolved it! I must admit I think my knowledge is out of date here, my apologies if my input confused the issue. (I confused myself at least!)

Sorry for the trouble @samurijv2 .

We’ve added a reminder in the docs:

[Screenshot: the reminder added to the documentation]

Thank you for pointing that out.
