Because of latency issues, we needed to try a secondary collector in a different region.
The primary runs in eu-west-1 and the secondary runs in ap-southeast-1.
I’ve gathered the logs from both regions into an intermediate bucket in the same region as the EMR cluster.
During that process I had to rename the files (see below) so they don’t get overwritten, because I get log files with exactly the same names from both the ap-southeast-1 and eu-west-1 regions.
Can you share the in buckets you have configured, and their structure, as mentioned in your last message?
It seems to me like you have only one bucket?
Also, starting from R91, EmrEtlRunner doesn’t do any renaming of the Clojure log files, so as long as you have different filenames initially you should be fine. I’m assuming you’re using an earlier version, since it seems that the timestamps in the filenames have changed format between raw and processing. I’d advise updating to R92 directly.
Until now, I’ve only used the eu-west-1 bucket and everything worked fine.
But since we need a few more collectors, I then tried to use IN buckets from multiple regions, but as @alex said, that won’t work. So I’ve created an intermediate bucket where I put all the logs for processing (tracking-snapshots/events/snowplow-raw), gathered from the locations mentioned above.
I’ve renamed the log files because the same file name, _var_log_tomcat8_rotated_localhost_access_log.txt1506423661.gz, exists in both locations, ap-southeast-1 and eu-west-1.
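For anyone doing a similar rename, here is a minimal sketch of one way to make the keys unique when merging logs from several regions into one bucket. The function name `region_prefixed_key` and the `<region>-<filename>` scheme are hypothetical and are not the naming actually used in this thread; whether EmrEtlRunner accepts renamed files may depend on the release you are running.

```python
import posixpath


def region_prefixed_key(key: str, region: str) -> str:
    """Return `key` with `region` prepended to its file name.

    Hypothetical naming scheme: <directory>/<region>-<original file name>.
    Only the last path component changes, so the directory layout is kept.
    """
    directory, filename = posixpath.split(key)
    return posixpath.join(directory, f"{region}-{filename}")


# The same log file name arrives from both regions.
name = "_var_log_tomcat8_rotated_localhost_access_log.txt1506423661.gz"
eu_key = region_prefixed_key(name, "eu-west-1")
ap_key = region_prefixed_key(name, "ap-southeast-1")

# After prefixing, the two keys no longer collide in the shared bucket.
print(eu_key)
print(ap_key)
```

The sketch only builds the new key; the actual copy into the intermediate bucket would be done with whatever S3 tooling you already use.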
If you did upgrade you could just copy the e-vbm84x2rwb ap-southeast-1 directory to s3n://elasticbeanstalk-eu-west-1-602232737466/resources/environments/logs/publish/ and have: