and my guess is it has something to do with regions?
My config.yaml for buckets is as follows (mind the disclosure formatting and the removal of private info). The buckets are otherwise working: EmrEtlRunner runs correctly and fills them with data.
```yaml
s3:
  region: ap-southeast-2
  buckets:
    assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
    jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
    log: s3://aws-logs-121486008730-ap-southeast-2
    raw:
      in: # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
        - s3://logs-snowplow.xxxxxx # e.g. s3://my-old-collector-bucket
        # - s3://raw-in-new-snowplow.xxxx # e.g. s3://my-new-collector-bucket
      processing: s3://raw-processing-snowplow.xxxxx
      archive: s3://raw-snowplow.xxxxxx/archive # e.g. s3://my-archive-bucket/raw
    enriched:
      good: s3://enriched-snowplow.xxxxx/good # e.g. s3://my-out-bucket/enriched/good
      bad: s3://enriched-snowplow.xxxxxx/bad # e.g. s3://my-out-bucket/enriched/bad
      errors: s3://enriched-snowplow.xxxxxxx/errors # Leave blank unless :continue_on_unexpected_error: set to true below
      archive: s3://enriched-snowplow.xxxxxx/archived # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
    shredded:
      good: s3://shredded-snowplow.xxxxxx/good # e.g. s3://my-out-bucket/shredded/good
      bad: s3://shredded-snowplow.xxxxxxx/bad # e.g. s3://my-out-bucket/shredded/bad
      errors: s3://shredded-snowplow.xxxxxx/errors # Leave blank unless :continue_on_unexpected_error: set to true below
      archive: s3://shredded-snowplow.xxxxxxxx/archive # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
```
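In case the nesting is hard to read after the forum flattened it, the shape StorageLoader expects can be sanity-checked with plain Ruby and stdlib YAML (a sketch; the bucket names below are placeholders, not my real config):

```ruby
require 'yaml'

# Placeholder config mirroring the nesting of the s3 section above.
config = YAML.safe_load(<<~YAML)
  s3:
    region: ap-southeast-2
    buckets:
      assets: s3://snowplow-hosted-assets
      log: s3://my-log-bucket
      raw:
        in:
          - s3://my-collector-bucket
        processing: s3://my-processing-bucket
        archive: s3://my-archive-bucket/raw
YAML

# The two structural points that trip people up: region must be set,
# and raw.in must be a YAML array (hyphen-prefixed entries).
raise 'missing region' unless config['s3']['region']
raise 'raw.in must be an array' unless config['s3']['buckets']['raw']['in'].is_a?(Array)
puts 'config shape OK'
```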
Actually @alex, even 88_angkor_rc1 doesn't work. Angkor requires the download_required option, and once it is set to true it triggers the same error (as it calls the same function that causes the error in the previous release):
```ruby
# Download files if required
unless config[:skip].include?('download')
  if config[:storage][:download_required]
    loader::S3Tasks.download_events(config)
  end
end
```
It all looks like the culprit is the s3.host assignment, which is probably not possible in fog. The Postgres load is the only place where it happens in both EmrEtlRunner and StorageLoader. I'll try to reproduce it and will let you know about my findings.
> It all looks like the culprit is the s3.host assignment, which is probably not possible in fog
It seems like that, as I've found that no `host` method exists on the Fog class (granted, I'm not a Ruby dev, so I'm not sure exactly how the API works). Also, yes, it only happens when the storage download is added for Postgres.
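To illustrate the suspected failure mode with a stdlib-only sketch (this is not fog itself; `FakeS3Connection` is made up here): assigning to a setter a class never defined raises `NoMethodError`, which would match a missing `host=` on the Fog connection:

```ruby
# Stand-in for a connection object that exposes no host= setter,
# analogous to the Fog S3 connection in this theory.
class FakeS3Connection
  attr_reader :region
  def initialize(region)
    @region = region
  end
end

conn = FakeS3Connection.new('ap-southeast-2')
error = nil
begin
  # Mirrors the failing pattern: s3.host = region_to_safe_host(...)
  conn.host = 's3-ap-southeast-2.amazonaws.com'
rescue NoMethodError => e
  error = e
end
puts error.class  # NoMethodError
```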
However, I downloaded the source, removed the `s3.host = region_to_safe_host(config[:aws][:s3][:region])` line, did a new build, and it worked fine (files were downloaded to the assigned folder and the loader performed its tasks).
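For context, my guess is that `region_to_safe_host` simply maps the configured region onto a legacy region-specific S3 endpoint hostname, roughly like this (an assumption for illustration, not the actual Snowplow source):

```ruby
# Guessed sketch of a region_to_safe_host helper (not the real
# Snowplow implementation): us-east-1 uses the global S3 endpoint,
# every other region a region-suffixed one.
def region_to_safe_host(region)
  region == 'us-east-1' ? 's3.amazonaws.com' : "s3-#{region}.amazonaws.com"
end

puts region_to_safe_host('ap-southeast-2')  # s3-ap-southeast-2.amazonaws.com
```

If that is really all it does, and the Fog connection already receives the region when it is constructed, then overriding the host by hand would be redundant, which would explain why removing the line works.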
FYI, I should also add that I went back several releases and StorageLoader had the same error: it just archives events but does not download them to the target.
Initial response from running StorageLoader:
```
Archiving Snowplow events...
moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
(t1) MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00001 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00001
(t3) MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00002 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00002
(t2) MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00003 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00003
(t0) MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00000 -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00000
  +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00001
  +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00002
  +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00003
  +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/part-00000
    x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00000
    x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00002
    x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00003
    x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/part-00001
moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
(t0) MOVE xxx-bucket/enriched/good/run=2017-04-03-16-37-34/_SUCCESS -> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/_SUCCESS
  +-> xxx-bucket/enriched/archive/run=2017-04-03-16-37-34/_SUCCESS
    x xxx-bucket/enriched/good/run=2017-04-03-16-37-34/_SUCCESS
moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
Completed successfully
```
Response on later runs:
```
Archiving Snowplow events...
moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
moving files from s3://xxx-bucket/enriched/good/ to s3://xxx-bucket/enriched/archive/
moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
moving files from s3://xxx-bucket/shredded/good/ to s3://xxx-bucket/shredded/archive/
Completed successfully
```
R-88 is a release candidate: Snowplow RCs are designed for internal testing and often have significant changes relative to previous final releases. The config files in the R-88 release have changed significantly, so I tweaked the master branch, recompiled, and ran it successfully.
Do the same, or else wait until they finalise the R-88 release candidate.