Spark missing in Dataflow-runner

Hi @anton,

I have updated my playbook.json file to include the new first step.
I changed --src to point to the location where my s3-loader sinks data, and --dest to s3://my-stageUrl/enriched/archive/run={{nowWithFormat "2006-01-02-15-04-05"}}/.
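For reference, the nowWithFormat layout appears to use Go's reference-time notation ("2006-01-02-15-04-05"), so each run should land in its own timestamped folder. Below is a minimal Python sketch of the folder name that layout produces. `archive_run_path` is a hypothetical helper, and the sketch assumes a UTC timestamp. I haven't checked which clock dataflow-runner actually uses.

```python
from datetime import datetime, timezone

# The Go layout "2006-01-02-15-04-05" corresponds to this strftime pattern.
GO_LAYOUT_AS_STRFTIME = "%Y-%m-%d-%H-%M-%S"


def archive_run_path(stage_url: str, now: datetime) -> str:
    """Hypothetical helper: build the --dest folder for a given run time.

    Assumes the timestamp is already in the timezone dataflow-runner would use.
    """
    run_id = now.strftime(GO_LAYOUT_AS_STRFTIME)
    return f"{stage_url.rstrip('/')}/enriched/archive/run={run_id}/"


run_time = datetime(2020, 12, 10, 1, 30, 0, tzinfo=timezone.utc)
print(archive_run_path("s3://my-stageUrl", run_time))
# s3://my-stageUrl/enriched/archive/run=2020-12-10-01-30-00/
```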

On the first run, the new first step failed with the following error (via stderr):
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-2'

This one took me a minute. I know that my resources are in us-west-2, but I did not see us-east-1 specified explicitly in the new step.

It turns out that us-east-1 is the default region when you specify --s3Endpoint as s3.amazonaws.com.

The transformer finally succeeded after I changed my --s3Endpoint to s3-us-west-2.amazonaws.com. By "succeeded", I mean that it moved events from the --src folder to the --dest folder.
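In other words, the endpoint has to match the bucket's region. Here is a minimal sketch of that rule. `s3_endpoint_for` is a hypothetical helper, not part of any Snowplow tool. It assumes the region still supports the legacy dash-style hostname (as us-west-2 did here). Newer AWS regions only offer the dot-style `s3.<region>.amazonaws.com` form.

```python
def s3_endpoint_for(region: str) -> str:
    """Hypothetical helper: choose a region-specific S3 endpoint host.

    The global endpoint s3.amazonaws.com defaults to us-east-1, which is why
    a us-west-2 bucket was rejected with "the region 'us-east-1' is wrong".
    """
    if region == "us-east-1":
        return "s3.amazonaws.com"
    # Legacy dash-style endpoint, the form that worked here for us-west-2.
    # Assumption: only valid for regions that still support this style.
    return f"s3-{region}.amazonaws.com"


print(s3_endpoint_for("us-west-2"))  # s3-us-west-2.amazonaws.com
```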


According to the stdout log, the loader step (#3) also appeared to succeed. However, when I checked atomic.events in Snowflake, there were still no rows.

In the log, I saw this message:

2020-12-10T01:36:08.675Z: Launching Snowflake Loader. Fetching state from DynamoDB
2020-12-10T01:36:09.618Z: State fetched, acquiring DB connection
2020-12-10T01:36:11.619Z: DB connection acquired. Loading...
2020-12-10T01:36:12.434Z: Existing column [event_id VARCHAR(36) NOT NULL] doesn't match expected definition [event_id CHAR(36) NOT NULL UNIQUE] at position 7
2020-12-10T01:36:12.434Z: Existing column [domain_sessionidx INTEGER] doesn't match expected definition [domain_sessionidx SMALLINT] at position 17
2020-12-10T01:36:12.434Z: Existing column [geo_country VARCHAR(2)] doesn't match expected definition [geo_country CHAR(2)] at position 19
2020-12-10T01:36:12.434Z: Existing column [geo_region VARCHAR(3)] doesn't match expected definition [geo_region CHAR(3)] at position 20
2020-12-10T01:36:12.434Z: Existing column [tr_currency VARCHAR(3)] doesn't match expected definition [tr_currency CHAR(3)] at position 107
2020-12-10T01:36:12.434Z: Existing column [ti_currency VARCHAR(3)] doesn't match expected definition [ti_currency CHAR(3)] at position 111
2020-12-10T01:36:12.434Z: Existing column [base_currency VARCHAR(3)] doesn't match expected definition [base_currency CHAR(3)] at position 113
2020-12-10T01:36:12.435Z: Existing column [domain_sessionid VARCHAR(128)] doesn't match expected definition [domain_sessionid CHAR(128)] at position 121
2020-12-10T01:36:12.779Z: Warehouse snowplow_wh resumed
2020-12-10T01:36:12.790Z: Success. Exiting...
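To compare the mismatch messages against the table definition, it helps to pull them out of the log. The sketch below does that, assuming the log text is available as a string. `column_mismatches` and `loader_reported_success` are hypothetical helpers, and the regex is based only on the message format shown above.

```python
import re

# Matches the loader's schema-mismatch messages, e.g.
# "Existing column [geo_country VARCHAR(2)] doesn't match expected
#  definition [geo_country CHAR(2)] at position 19"
MISMATCH = re.compile(
    r"Existing column \[(?P<existing>[^\]]+)\] doesn't match expected "
    r"definition \[(?P<expected>[^\]]+)\] at position (?P<pos>\d+)"
)


def column_mismatches(log_text: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (position, existing, expected) per mismatch."""
    return [
        (int(m["pos"]), m["existing"], m["expected"])
        for m in MISMATCH.finditer(log_text)
    ]


def loader_reported_success(log_text: str) -> bool:
    """Hypothetical helper: check for the loader's final success message."""
    return "Success. Exiting..." in log_text
```

On the log above, this yields eight (position, existing, expected) tuples. The run still ends with "Success. Exiting...", so the loader itself reported no failure, even though no rows arrived.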

I vaguely remember reading on this forum that these messages are not necessarily errors, but I can't recall the details.

Regardless, what might be causing the Snowflake Loader step to fail somewhat silently?


Thanks again for your update – I had been trying to use the s3-loader.hocon config file to enforce the directory structure in s3, to no avail. This helped me a lot.

-Joseph
