Hi everyone! I’m nearly finished setting up my AWS real-time Snowplow pipeline, but I’ve hit a snag with RDB Loader. It reads from the SQS FIFO queue fine, but for some reason it fails to load the S3 folder where all my shredded events are stored (see the last lines of the output below):
```
INFO 2021-05-25 17:38:22.190: RDB Loader 1.0.1 [SP-Redshift] has started. Listening {{my sqs fifo queue name}}
INFO 2021-05-25 17:51:45.970: Received new message. Total 1 messages received, 0 loaded, 0 attempts has been made to load current folder
INFO 2021-05-25 17:51:45.976: New data discovery at run=2021-05-25-17-40-53 with following shredded types:
* iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-*-* TSV
* iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-*-* TSV
* iglu:org.w3/PerformanceTiming/jsonschema/1-*-* TSV
* iglu:org.ietf/http_client_hints/jsonschema/1-*-* TSV
* iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-*-* TSV
* iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-*-* TSV
* iglu:nl.basjes/yauaa_context/jsonschema/1-*-* TSV
* iglu:com.google.analytics/cookies/jsonschema/1-*-* TSV
* iglu:org.ietf/http_cookie/jsonschema/1-*-* TSV
INFO 2021-05-25 17:51:46.613: Creating atomic.com_snowplowanalytics_snowplow_web_page_1 table for iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0
INFO 2021-05-25 17:51:46.665: Table created
INFO 2021-05-25 17:51:46.808: Creating atomic.org_w3_performance_timing_1 table for iglu:org.w3/PerformanceTiming/jsonschema/1-0-0
INFO 2021-05-25 17:51:46.848: Table created
INFO 2021-05-25 17:51:46.895: Creating atomic.org_ietf_http_client_hints_1 table for iglu:org.ietf/http_client_hints/jsonschema/1-0-0
INFO 2021-05-25 17:51:46.921: Table created
INFO 2021-05-25 17:51:46.982: Creating atomic.com_snowplowanalytics_snowplow_link_click_1 table for iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1
INFO 2021-05-25 17:51:47.006: Table created
INFO 2021-05-25 17:51:47.039: Creating atomic.com_snowplowanalytics_snowplow_ua_parser_context_1 table for iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0
INFO 2021-05-25 17:51:47.063: Table created
INFO 2021-05-25 17:51:47.232: Creating atomic.nl_basjes_yauaa_context_1 table for iglu:nl.basjes/yauaa_context/jsonschema/1-0-2
INFO 2021-05-25 17:51:47.262: Table created
INFO 2021-05-25 17:51:47.291: Creating atomic.com_google_analytics_cookies_1 table for iglu:com.google.analytics/cookies/jsonschema/1-0-0
INFO 2021-05-25 17:51:47.312: Table created
INFO 2021-05-25 17:51:47.337: Creating atomic.org_ietf_http_cookie_1 table for iglu:org.ietf/http_cookie/jsonschema/1-0-0
INFO 2021-05-25 17:51:47.359: Table created
INFO 2021-05-25 17:51:47.360: Loading s3://{{my-s3-bucket}}/archive/shredded/run=2021-05-25-17-40-53/
INFO 2021-05-25 17:51:47.371: COPY atomic.events
INFO 2021-05-25 17:51:47.638: Could not load a folder (base s3://{{my-s3-bucket}}/archive/shredded/run=2021-05-25-17-40-53/), trying to ack the SQS command
ERROR 2021-05-25 17:51:47.672: Loading is shutting down with failure. null
```
This is confusing to me. The loader connects to Redshift and creates all the shredded-type tables, then fails on the first COPY (into atomic.events), even though the shredder clearly has no trouble accessing that S3 folder when I run the Dataflow Runner job. The roleArn I reference in my HOCON config grants full access to S3 and Redshift, but I wonder whether there are other permissions I’m missing.
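One thing separate from the S3/Redshift access policies that I haven’t ruled out yet: as far as I understand, the COPY is executed by Redshift itself using the roleArn role, so that role’s trust policy has to allow the redshift.amazonaws.com service principal to call sts:AssumeRole, and the role also has to be associated with the Redshift cluster. A role set up only for the EMR shredder (trusting, say, ec2.amazonaws.com) would not cover that. Here is a minimal stdlib-only sketch for checking an exported trust policy JSON for that principal. The function name trusts_redshift is hypothetical, and it does only exact, case-sensitive matching, so it is a quick sanity check rather than a real IAM policy evaluator.

```python
import json

# Assumption: Redshift must be trusted as a service principal for COPY to assume the role.
REDSHIFT_PRINCIPAL = "redshift.amazonaws.com"
ASSUME_ACTIONS = {"sts:AssumeRole", "sts:*", "*"}


def _as_list(value):
    """IAM policy fields may hold a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def trusts_redshift(trust_policy_json: str) -> bool:
    """Hypothetical helper: True if some Allow statement lets Redshift assume the role.

    Only exact string matches are checked; wildcards in principals, conditions,
    and NotPrincipal/NotAction are deliberately ignored in this sketch.
    """
    policy = json.loads(trust_policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be written as an object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        actions = _as_list(stmt.get("Action"))
        if REDSHIFT_PRINCIPAL in services and ASSUME_ACTIONS.intersection(actions):
            return True
    return False


if __name__ == "__main__":
    # Example trust policy that lets Redshift assume the role.
    example = """{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "redshift.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }]
    }"""
    print(trusts_redshift(example))  # True
```

If this returns False for the role in the loader config, adding Redshift as a trusted entity and attaching the role to the cluster would be the next thing I’d try.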
Any thoughts on what could be causing this issue would be greatly appreciated! Thanks a lot.