Hi all,
I am using the Clojure Collector with EMR ETL + Redshift. When I look at the logs in the Beanstalk S3 log bucket, I see what I expect: pageviews from the site we tagged up.
Yet when I run the EMR ETL + Storage Loader, not everything is getting into Redshift, despite the whole process running without error. Only data from a dev site we tagged is coming in.
So a few questions:
In my config.yml, under monitoring:snowplow:app_id, I have “snowplow”, but I have not set an app_id in the JavaScript tracker - could this be why?
This is how I initialised the tracker (I left the options blank):
;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
};p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","//d1fc8wv8zag5ca.cloudfront.net/2.5.1/sp.js","snowplow"));
window.snowplow('newTracker', 'mycljcoll', 'sp.a.appliancesonline.com.au', {
  // Initialise a tracker
  // I left this blank with no options
});
window.snowplow('trackPageView');
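If a missing app_id is the problem, my guess (based on the JavaScript tracker docs - this is an assumption, not something I have tested) is that I would set it via the appId option when initialising the tracker, matching the value in config.yml:

```javascript
// Hypothetical fix: pass appId in the tracker options.
// I am assuming 'snowplow' here needs to match monitoring:snowplow:app_id
// in config.yml - please correct me if that is wrong.
window.snowplow('newTracker', 'mycljcoll', 'sp.a.appliancesonline.com.au', {
  appId: 'snowplow'
});
window.snowplow('trackPageView');
```

Is that the right way to set it, and does it actually need to match the config.yml value?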
I also had some general questions about the Clojure Collector + EMR ETL.
- How does it keep track of when it was last run?
- Do logs going all the way back to when the collector started get stored in the Beanstalk S3 bucket? Or does the EMR ETL tool wipe them after it runs?
- The documentation says to run it from a daily crontab, but I noticed in the Beanstalk S3 bucket that logs get rotated hourly. The EMR ETL + Storage Loader process takes under 30 minutes - could I run it hourly instead?
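To make the last question concrete, this is the kind of change I have in mind. The paths and flags below are placeholders from my setup, not verified syntax:

```
# Current: daily run, as the documentation suggests
0 4 * * * /opt/snowplow/snowplow-emr-etl-runner --config /opt/snowplow/config.yml

# Proposed: hourly run, a few minutes after each log rotation
5 * * * * /opt/snowplow/snowplow-emr-etl-runner --config /opt/snowplow/config.yml
```

Is there any problem with the runner overlapping itself or missing a log rotation on a schedule like this?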
Thanks,
Tim