November 29, 2016, 5:12am
We have set up the regular Snowplow batch pipeline with a collector on Elastic Beanstalk.
The business has a need to identify certain metrics from our webserver access logs (not the collector). Example: How many requests came from GoogleBot?
Does Snowplow support a way to process existing Apache or NGINX log files that didn’t come through the collector?
November 29, 2016, 11:35am
@estahn - it’s a nice idea but it’s not currently supported.
We support processing CloudFront access logs, but not Apache or Nginx logs.
November 29, 2016, 11:48am
I imagine it’s possible to convert Nginx/Apache logs to a pseudo-CloudFront format, with placeholder values for the fields that Nginx/Apache doesn’t have, and parse that?
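
To make the idea concrete, here is a rough, untested-against-Snowplow sketch of such a conversion. It assumes the input is in the Apache/Nginx "combined" log format and that the target is the 12-field, tab-separated CloudFront web access log layout (date, time, x-edge-location, sc-bytes, c-ip, cs-method, cs(Host), cs-uri-stem, sc-status, cs(Referer), cs(User-Agent), cs-uri-query). The function name `to_cloudfront_line` and the `host` parameter are made up for illustration, and whether the Snowplow CloudFront parser accepts the result is an assumption you would need to verify.

```python
import re
from datetime import datetime, timezone
from urllib.parse import quote

# Apache/Nginx "combined" format:
# %h %l %u [%t] "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def to_cloudfront_line(line, host="-"):
    """Convert one combined-format line to a CloudFront-like TSV line.

    Hypothetical helper: returns None when the line does not match.
    Fields the web server cannot provide are filled with "-".
    """
    m = COMBINED.match(line)
    if m is None:
        return None

    # CloudFront logs use UTC, so normalise the local timestamp.
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    ts = ts.astimezone(timezone.utc)

    # Split the request target into path and query string.
    path, _, query = m["target"].partition("?")

    fields = [
        ts.strftime("%Y-%m-%d"),                    # date
        ts.strftime("%H:%M:%S"),                    # time
        "-",                                        # x-edge-location (placeholder)
        "0" if m["bytes"] == "-" else m["bytes"],   # sc-bytes
        m["ip"],                                    # c-ip
        m["method"],                                # cs-method
        host,                                       # cs(Host)
        path,                                       # cs-uri-stem
        m["status"],                                # sc-status
        m["referer"] or "-",                        # cs(Referer)
        quote(m["agent"]) or "-",                   # cs(User-Agent), percent-encoded
        query or "-",                               # cs-uri-query
    ]
    return "\t".join(fields)
```

Real CloudFront log files also begin with `#Version` and `#Fields` header lines, which this sketch does not emit; the percent-encoding of the user agent only approximates what CloudFront does.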
November 29, 2016, 12:11pm
Apache or Nginx was just an example of a custom log format. In fact, we collect CloudFront logs from all our domains. My understanding was that those have to come through a special pixel. Is this not the case? Can I process arbitrary CloudFront logs with Snowplow?
November 29, 2016, 11:14pm
@estahn - yes, it’s a bit hard to find in the documentation but you can process CloudFront access logs in the Snowplow batch pipeline by setting this input format in the