In the configuration for snowplow-emr-etl-runner, version R95, should the value of collectors.format be “tsv/com.amazon.aws.cloudfront/wd_access_log”, or just “cloudfront”?
The former is what I find in the docs (https://github.com/snowplow/snowplow/blob/master/3-enrich/emr-etl-runner/config/config.yml.sample); the latter is what seems to work.
If I use “tsv/com.amazon.aws.cloudfront/wd_access_log”, then atomic.events.app_id is always null.
Context: I am working on replacing a two-year-old deployment of Snowplow (I don’t know its version number). To test the R95 installation, I copied the S3 buckets (specified in config.yml under aws.s3.buckets.raw.in) to new buckets. In each CloudFront log file entry, the query string is the 12th field, and the app_id value is in the parameter “aid”.
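
For anyone who wants to confirm that the raw logs carry an app_id before they reach the enrichment step, the check can be sketched roughly as follows. This is a minimal sketch, not part of Snowplow: the function name extract_app_id and the sample log line are made up for illustration, and it assumes the log entries are tab-separated with header lines starting with "#".

```python
from urllib.parse import parse_qs


def extract_app_id(log_line):
    """Return the 'aid' query parameter from one log line, or None if absent."""
    # Header lines such as "#Version" and "#Fields" carry no request data.
    if log_line.startswith("#"):
        return None
    # Assumption: fields are separated by tabs.
    fields = log_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None
    query = fields[11]  # the 12th field, at index 11 when counting from 0
    if query == "-":    # "-" is assumed to mean an empty query string
        return None
    values = parse_qs(query).get("aid")
    return values[0] if values else None


# Hypothetical log entry, built for illustration only.
sample = "\t".join([
    "2017-11-20", "12:00:00", "LHR5", "480", "192.0.2.10", "GET",
    "d111111abcdef8.cloudfront.net", "/i", "200", "-", "Mozilla/5.0",
    "e=pv&aid=my-app&p=web",
])
print(extract_app_id(sample))
```

If this returns a value for real entries from the copied buckets, the logs themselves contain the app_id, and the null in atomic.events.app_id would more likely come from how the collector format is being interpreted.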