Solved: Enrich Bad error - Access log TSV line contained X fields, expected Y


It took us nearly two days to figure this out, so I thought I'd better share it in case it helps others.

When we were running the Enrich process, all of the events/logs were ending up in the bad bucket with the following error:

“Access log TSV line contained 33 fields, expected 12, 15, 18, 19, 23, 24 or 26”
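
In case it helps anyone debugging the same message, here is a rough Python sketch of the kind of check the error describes: split a line on tabs and compare the field count against the allowed counts. The set of counts is copied from the error message above, and the function name `check_tsv_line` is hypothetical. This is not Snowplow's actual loader code, just a way to inspect a rejected line by hand.

```python
# Field counts copied from the error message; an assumption, not read from Snowplow's source.
EXPECTED_FIELD_COUNTS = {12, 15, 18, 19, 23, 24, 26}


def check_tsv_line(line):
    """Return (field_count, is_expected) for one tab-separated log line."""
    # Drop the trailing newline so it does not end up inside the last field.
    fields = line.rstrip("\r\n").split("\t")
    count = len(fields)
    return count, count in EXPECTED_FIELD_COUNTS


if __name__ == "__main__":
    # A made-up line with 33 fields, the count reported in our bad rows.
    sample = "\t".join(["-"] * 33)
    count, ok = check_tsv_line(sample)
    print(f"{count} fields, accepted: {ok}")
```

Running this on a line taken from the bad bucket shows whether its field count is one of the counts the access-log format accepts.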

We managed to figure out what was causing this.

In our config.yml file we were using:

  format: tsv/ # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events

Changing “tsv/” to just “cloudfront” fixed the issue. Now nearly all of our events end up in the /archive bucket.

We are using JS 2.12 and EmrEtlRunner R119.

The config.yml.sample comments should be updated to reflect this.


@Ryan_Newsome, the ...cloudfront/wd_access_log format is meant for exactly what it states: processing Cloudfront access logs, not the events you track in your application.

Do bear in mind that the Cloudfront collector is deprecated. You might wish to migrate to the Scala Stream Collector instead.

Thanks @ihor, that makes more sense. I must have just got too excited when I saw the word ‘cloudfront’ in the script comment :laughing:

We will also look to migrate to the Scala Stream Collector once we get all our Cloudfront events into Snowflake.