Currently, we are deploying Shredder and RDB Loader as two separate Fargate tasks. The enriched data gets shredded, a “shredding_complete.json” appears in the root of each run folder, and SQS delivers a message to the RDB Loader; however, we get the following error in the RDB Loader Fargate task logs:
Now, we are getting the next error, which says “java.sql.SQLException: Amazon Invalid operation: schema “atomic” does not exist;”. I believe this comes from the config.hocon file:
We are using self-describing JSON schemas. Does that mean the atomic schema type has to be changed?
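For reference, the storage section of our config.hocon currently looks roughly like this (values redacted; the field names are our best reading of the setup docs, so treat this as a sketch):

```hocon
"storage": {
  "type": "redshift",
  "host": "sp-redshift-cluster-***.eu-west-1.redshift.amazonaws.com",
  "database": "sp_redshift_database",
  "port": 5439,
  "username": "sp_user",
  "password": "***",
  # The database schema RDB Loader loads into - this is what the error refers to
  "schema": "atomic",
  "roleArn": "arn:aws:iam::***:role/sp-redshift-role"
}
```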
The atomic schema and the relevant tables need to be created before you can load into them.
In the relevant setup guide, there are some pre-defined scripts to do this. The atomic schema is created here. It might be helpful to double-check that the other steps in the guide have been done too.
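If it helps, the schema-creation part boils down to something like the following (a sketch only; atomic-def.sql from the setup guide is the authoritative version and also defines the columns of the atomic.events table). The user name here is just the one from your psql command:

```sql
-- Create the database schema RDB Loader expects to load into
CREATE SCHEMA IF NOT EXISTS atomic;

-- Grant the loader's Redshift user access to it
GRANT USAGE ON SCHEMA atomic TO sp_user;
GRANT ALL ON ALL TABLES IN SCHEMA atomic TO sp_user;
```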
Oh, and to explain this part - there are two different uses of the word ‘schema’ involved. The schema in your error refers to a Redshift database schema.
JSONSchema is a standard for defining the structure of events; the two aren’t related. The pipeline uses JSONSchemas to determine the validity of data and also to create table definitions for the database (sometimes called a ‘table schema’, just to make sure it’s as confusing as possible).
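For example, a minimal self-describing JSONSchema (the event-structure kind, not the database kind) looks like this - the vendor and event name below are purely illustrative:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self": {
    "vendor": "com.acme",
    "name": "minimal_tracking_event",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "label": { "type": "string" }
  },
  "additionalProperties": false
}
```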
I tried running “psql -h sp-redshift-cluster-***.cziwymixltdg.eu-west-1.redshift.amazonaws.com -U sp_user -d sp_redshift_database -p 5439 -W -f atomic-def.sql” from my local machine, but psql is unable to establish a connection to the Redshift cluster, failing with the following error:
psql: error: could not connect to server: Operation timed out
Is the server running on host "sp-redshift-cluster-gitc-dev.cziwymixltdg.eu-west-1.redshift.amazonaws.com" (11.222.59.12) and accepting
TCP/IP connections on port 5439?
I tried to apply the instructions here, but unfortunately the documentation is somewhat outdated and I could not find my way through the new AWS console. I made the Redshift security group fully permissive, and I added a second inbound rule limited to my machine’s IP, but it still did not work. Am I missing something?
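A quick way to tell whether a timeout like this is a network problem (security group, public accessibility) rather than a credentials problem is a plain TCP check, independent of psql. A small sketch (the host name is just a placeholder):

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical host: a False result (timeout) here means the cluster is not
# reachable from this network at all - e.g. it is not publicly accessible, or
# the security group does not allow inbound traffic on port 5439.
# port_open("sp-redshift-cluster-example.eu-west-1.redshift.amazonaws.com", 5439)
```

If this times out, no amount of fixing users or passwords will help; the connectivity has to be sorted out first.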
Once again, thanks a lot @mike! Our Redshift cluster was not publicly accessible; now we are able to create the tables. However, we ran into another issue with the RDB Loader:
INFO 2021-03-05 09:28:35.807: RDB Loader [***] has started. Listening sp-sqs-queue-***.fifo
INFO 2021-03-05 10:08:59.504: Received new message. Total 1 messages received, 0 loaded, 0 attempts has been made to load current folder
INFO 2021-03-05 10:09:00.105: New data discovery at run=2021-03-05-09-40-05 with following shredded types:
* iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-*-* TSV
* iglu:com.***/generic_tracking_event/jsonschema/1-*-* (s3://snowplow-schemas-***/jsonpaths/com.***/generic_tracking_event_1.json)
* iglu:com.***/minimal_tracking_event/jsonschema/1-*-* (s3://snowplow-schemas-***/jsonpaths/com.***/minimal_tracking_event_1.json)
RDB Loader unknown error in executeUpdate
java.lang.NullPointerException
at com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.$anonfun$executeUpdate$1(JDBC.scala:142)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.executeUpdate(JDBC.scala:144)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at liftF @ com.snowplowanalytics.iglu.client.resolver.Resolver$.$anonfun$parse$3(Resolver.scala:240)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.loading.Load$.load(Load.scala:68)
at *> @ com.snowplowanalytics.snowplow.rdbloader.Main$.$anonfun$run$1(Main.scala:43)
at use @ com.snowplowanalytics.snowplow.rdbloader.Main$.$anonfun$process$1(Main.scala:75)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$.getConnection(JDBC.scala:104)
at flatMap @ retry.package$RetryingOnSomeErrorsPartiallyApplied.$anonfun$apply$3(package.scala:97)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at make @ com.snowplowanalytics.snowplow.rdbloader.dsl.Environment.makeBusy(Environment.scala:43)
RDB Loader unknown error in executeUpdate
java.lang.NullPointerException
at com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.$anonfun$executeUpdate$1(JDBC.scala:142)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.executeUpdate(JDBC.scala:144)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at map @ com.snowplowanalytics.iglu.client.Client$.parseDefault(Client.scala:56)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at map @ com.snowplowanalytics.iglu.client.Client$.parseDefault(Client.scala:56)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at *> @ com.snowplowanalytics.snowplow.rdbloader.dsl.Logging$$anon$1.$anonfun$info$2(Logging.scala:86)
at void @ fs2.concurrent.PubSub$Subscriber.complete(PubSub.scala:104)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.executeUpdate(JDBC.scala:144)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at liftF @ com.snowplowanalytics.iglu.client.resolver.Resolver$.$anonfun$parse$3(Resolver.scala:240)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
java.lang.NullPointerException
at com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$.$anonfun$interpreter$4(JDBC.scala:84)
at *> @ com.snowplowanalytics.snowplow.rdbloader.dsl.Logging$$anon$1.$anonfun$info$2(Logging.scala:86)
at void @ fs2.concurrent.PubSub$Subscriber.complete(PubSub.scala:104)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.executeUpdate(JDBC.scala:144)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at map @ com.snowplowanalytics.iglu.client.Client$.parseDefault(Client.scala:56)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at map @ com.snowplowanalytics.iglu.client.Client$.parseDefault(Client.scala:56)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at *> @ com.snowplowanalytics.snowplow.rdbloader.dsl.Logging$$anon$1.$anonfun$info$2(Logging.scala:86)
at void @ fs2.concurrent.PubSub$Subscriber.complete(PubSub.scala:104)
at flatMap @ com.snowplowanalytics.snowplow.rdbloader.dsl.JDBC$$anon$1.executeUpdate(JDBC.scala:144)
at flatMap @ com.snowplowanalytics.iglu.client.resolver.Resolver$.parse(Resolver.scala:237)
at liftF @ com.snowplowanalytics.iglu.client.resolver.Resolver$.$anonfun$parse$3(Resolver.scala:240)
ERROR 2021-03-05 10:09:01.702: Fatal failure during message processing (base s3://sp-shredded-***/good/run=2021-03-05-09-40-05/), message hasn't been ack'ed. Data loading error java.lang.NullPointerException
Can you double-check your host? Specifically, that it doesn’t contain something like jdbc: or :5439. The only thing that can return null there, as far as I can see (though it’s hard to reverse-engineer, since the Redshift driver doesn’t come with sources), is the connect method when the URI is not supported.
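A quick sanity check along those lines - flagging anything that shouldn’t be in the host field (a standalone sketch, not part of RDB Loader itself):

```python
def check_host(host: str) -> list[str]:
    """Return a list of problems with a host value destined for config.hocon."""
    problems = []
    # The host must be a bare hostname: no jdbc: / https:// style scheme prefix
    if "://" in host or host.startswith("jdbc:"):
        problems.append("host contains a scheme prefix such as jdbc://")
    # ...and no trailing :5439 - the port belongs in its own config field
    if ":" in host.split("://")[-1]:
        problems.append("host contains a port; move it to the port field")
    return problems

# check_host("example.redshift.amazonaws.com")  -> no problems
# check_host("jdbc:redshift://example:5439")    -> two problems
```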
Thanks @anton. The issue disappeared after we updated the Redshift and RDB Loader roles and policies, so it was likely caused by missing permissions.