No errors and exitCode 0 in the controller; here's the syslog:
2017-02-27 16:16:16,756 INFO cascading.flow.hadoop.util.HadoopUtil (main): resolving application jar from found main method on: com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$
2017-02-27 16:16:16,757 INFO cascading.flow.hadoop.planner.HadoopPlanner (main): using application jar: /mnt/var/lib/hadoop/steps/s-24FWFV1LAAFIK/snowplow-hadoop-enrich-1.8.0.jar
2017-02-27 16:16:16,765 INFO cascading.property.AppProps (main): using app.id: 427E8338DE404CB8BC20B216FF29B12A
2017-02-27 16:16:16,830 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2017-02-27 16:16:16,968 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2017-02-27 16:16:17,575 INFO com.amazon.ws.emr.hadoop.fs.EmrFileSystem (main): Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2017-02-27 16:16:17,711 INFO amazon.emr.metrics.MetricsSaver (main): MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1488211973639
2017-02-27 16:16:17,711 INFO amazon.emr.metrics.MetricsSaver (main): Created MetricsSaver j-FFWC39O27Z3S:i-04fa8471dbd4d00a3:RunJar:08029 period:60 /mnt/var/em/raw/i-04fa8471dbd4d00a3_20170227_RunJar_08029_raw.bin
2017-02-27 16:16:18,164 INFO cascading.util.Version (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Concurrent, Inc - Cascading 2.6.0
2017-02-27 16:16:18,166 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] starting
2017-02-27 16:16:18,166 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] source: Hfs["LzoByteArrayScheme[[UNKNOWN]->[ALL]]"]["hdfs:/local/snowplow/raw-events"]
2017-02-27 16:16:18,166 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] sink: Hfs["TextDelimited[['json']]"]["s3://wiw-snowplow-output2/enriched/bad/run=2017-02-27-16-06-43"]
2017-02-27 16:16:18,166 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] sink: Hfs["TextDelimited[['app_id', 'platform', 'etl_tstamp', 'collector_tstamp', 'dvce_created_tstamp', 'event', 'event_id', 'txn_id', 'name_tracker', 'v_tracker', 'v_collector', 'v_etl', 'user_id', 'user_ipaddress', 'user_fingerprint', 'domain_userid', 'domain_sessionidx', 'network_userid', 'geo_country', 'geo_region', 'geo_city', 'geo_zipcode', 'geo_latitude', 'geo_longitude', 'geo_region_name', 'ip_isp', 'ip_organization', 'ip_domain', 'ip_netspeed', 'page_url', 'page_title', 'page_referrer', 'page_urlscheme', 'page_urlhost', 'page_urlport', 'page_urlpath', 'page_urlquery', 'page_urlfragment', 'refr_urlscheme', 'refr_urlhost', 'refr_urlport', 'refr_urlpath', 'refr_urlquery', 'refr_urlfragment', 'refr_medium', 'refr_source', 'refr_term', 'mkt_medium', 'mkt_source', 'mkt_term', 'mkt_content', 'mkt_campaign', 'contexts', 'se_category', 'se_action', 'se_label', 'se_property', 'se_value', 'unstruct_event', 'tr_orderid', 'tr_affiliation', 'tr_total', 'tr_tax', 'tr_shipping', 'tr_city', 'tr_state', 'tr_country', 'ti_orderid', 'ti_sku', 'ti_name', 'ti_category', 'ti_price', 'ti_quantity', 'pp_xoffset_min', 'pp_xoffset_max', 'pp_yoffset_min', 'pp_yoffset_max', 'useragent', 'br_name', 'br_family', 'br_version', 'br_type', 'br_renderengine', 'br_lang', 'br_features_pdf', 'br_features_flash', 'br_features_java', 'br_features_director', 'br_features_quicktime', 'br_features_realplayer', 'br_features_windowsmedia', 'br_features_gears', 'br_features_silverlight', 'br_cookies', 'br_colordepth', 'br_viewwidth', 'br_viewheight', 'os_name', 'os_family', 'os_manufacturer', 'os_timezone', 'dvce_type', 'dvce_ismobile', 'dvce_screenwidth', 'dvce_screenheight', 'doc_charset', 'doc_width', 'doc_height', 'tr_currency', 'tr_total_base', 'tr_tax_base', 'tr_shipping_base', 'ti_currency', 'ti_price_base', 'base_currency', 'geo_timezone', 'mkt_clickid', 'mkt_network', 'etl_tags', 'dvce_sent_tstamp', 'refr_domain_userid', 'refr_dvce_tstamp', 'derived_contexts', 'domain_sessionid', 'derived_tstamp', 'event_vendor', 'event_name', 'event_format', 'event_version', 'event_fingerprint', 'true_tstamp']]"]["hdfs:/local/snowplow/enriched-events"]
2017-02-27 16:16:18,167 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] parallel execution is enabled: true
2017-02-27 16:16:18,167 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] starting jobs: 3
2017-02-27 16:16:18,167 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] allocating threads: 3
2017-02-27 16:16:18,167 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] starting step: (1/3)
2017-02-27 16:16:18,233 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-1): Connecting to ResourceManager at ip-10-30-38-249.ec2.internal/10.30.38.249:8032
2017-02-27 16:16:18,365 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-1): Connecting to ResourceManager at ip-10-30-38-249.ec2.internal/10.30.38.249:8032
2017-02-27 16:16:19,036 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat (pool-5-thread-1): Total input paths to process : 104
2017-02-27 16:16:19,036 INFO org.apache.hadoop.conf.Configuration.deprecation (pool-5-thread-1): mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2017-02-27 16:16:19,331 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-1): number of splits:52
2017-02-27 16:16:19,460 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-1): Submitting tokens for job: job_1488211967557_0002
2017-02-27 16:16:19,560 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-1): Submitted application application_1488211967557_0002
2017-02-27 16:16:19,585 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-1): The url to track the job: http://ip-10-30-38-249.ec2.internal:20888/proxy/application_1488211967557_0002/
2017-02-27 16:16:19,586 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] submitted hadoop job: job_1488211967557_0002
2017-02-27 16:16:19,586 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] tracking url: http://ip-10-30-38-249.ec2.internal:20888/proxy/application_1488211967557_0002/
2017-02-27 16:16:48,378 INFO cascading.util.Update (UpdateRequestTimer): newer Cascading release available: 2.6.3
2017-02-27 16:17:19,780 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] starting step: (3/3) …d/run=2017-02-27-16-06-43
2017-02-27 16:17:19,781 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] starting step: (2/3) …/snowplow/enriched-events
2017-02-27 16:17:19,798 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-10-30-38-249.ec2.internal/10.30.38.249:8032
2017-02-27 16:17:19,832 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-10-30-38-249.ec2.internal/10.30.38.249:8032
2017-02-27 16:17:19,857 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-10-30-38-249.ec2.internal/10.30.38.249:8032
2017-02-27 16:17:19,884 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-10-30-38-249.ec2.internal/10.30.38.249:8032
2017-02-27 16:17:20,294 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-2): Total input paths to process : 52
2017-02-27 16:17:20,331 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): number of splits:52
2017-02-27 16:17:20,369 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): Submitting tokens for job: job_1488211967557_0003
2017-02-27 16:17:20,381 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-2): Submitted application application_1488211967557_0003
2017-02-27 16:17:20,383 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-2): The url to track the job: http://ip-10-30-38-249.ec2.internal:20888/proxy/application_1488211967557_0003/
2017-02-27 16:17:20,383 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] submitted hadoop job: job_1488211967557_0003
2017-02-27 16:17:20,383 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] tracking url: http://ip-10-30-38-249.ec2.internal:20888/proxy/application_1488211967557_0003/
2017-02-27 16:17:20,414 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-3): Total input paths to process : 52
2017-02-27 16:17:20,448 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): number of splits:52
2017-02-27 16:17:20,470 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): Submitting tokens for job: job_1488211967557_0004
2017-02-27 16:17:20,481 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-3): Submitted application application_1488211967557_0004
2017-02-27 16:17:20,483 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-3): The url to track the job: http://ip-10-30-38-249.ec2.internal:20888/proxy/application_1488211967557_0004/
2017-02-27 16:17:20,483 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] submitted hadoop job: job_1488211967557_0004
2017-02-27 16:17:20,483 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] tracking url: http://ip-10-30-38-249.ec2.internal:20888/proxy/application_1488211967557_0004/
2017-02-27 16:18:55,675 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path hdfs:/local/snowplow/enriched-events/_temporary
2017-02-27 16:18:55,970 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path s3://wiw-snowplow-output2/enriched/bad/run=2017-02-27-16-06-43/_temporary
It’s puzzling that everything would report back as “Completed” without any error.
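To double-check what actually happened, here's a minimal diagnostic sketch I'd run, assuming the EMR cluster is still up and the AWS CLI is configured (the application IDs, bucket, and paths are copied from the log above):

```sh
# Ask YARN for the final status of the three applications the flow submitted
yarn application -status application_1488211967557_0002
yarn application -status application_1488211967557_0003
yarn application -status application_1488211967557_0004

# Check whether the enriched-events sink on HDFS actually received output
hdfs dfs -ls -R /local/snowplow/enriched-events

# Check whether the bad-rows sink on S3 actually received output
aws s3 ls --recursive s3://wiw-snowplow-output2/enriched/bad/run=2017-02-27-16-06-43/
```

If all three applications show a final state of SUCCEEDED and the sinks contain only empty part files, that would suggest the jobs genuinely completed but had nothing (or only bad rows) to write, pointing upstream at the raw events rather than at the flow itself.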