EmrEtlRunner Config / environment variables not recognized

dadasami · January 28, 2021, 2:32pm

If I understood you correctly: since we only need to deploy the shredder, out of (EmrEtlRunner + Shredder + Loader) we can ignore EmrEtlRunner and the Loader, and instead create a playbook and a cluster config file to submit our shredder Spark job to AWS EMR via dataflow-runner. For this we will need these 4 files:

playbook.json
emr-config.json
iglu-resolver.json {base64}
config.hocon {base64}
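
As far as I understand, the {base64} marker means the last two files are passed to the job as base64-encoded strings rather than as file paths. A minimal sketch of how I would encode them, assuming both files sit in the current directory (the function name `encode_b64` and the variable names are my own, not something from the Snowplow docs):

```shell
#!/bin/sh
# Print a file as a single-line base64 string.
# Piping through tr strips the line wraps that some base64
# implementations insert, so this works with GNU and BSD base64.
encode_b64() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical usage, assuming these local file names:
#   RESOLVER_B64=$(encode_b64 ./iglu-resolver.json)
#   CONFIG_B64=$(encode_b64 ./config.hocon)
```

The resulting strings would then go wherever the playbook expects the resolver and the config; how exactly they are referenced there is part of what I am trying to confirm.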

In this case, where can we find a sample config.hocon for the shredder (“s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-0.19.0”)?

Based on this dataflow-runner tutorial, I assume this is how we would submit our shredder Spark job:

$ ./path/to/dataflow-runner run-transient --emr-config ./path/to/emr-config.json --emr-playbook ./path/to/playbook.json

Please correct me if I am wrong.
