Hi @alex
I have gone through all of the links but haven't found anything valuable there, and the error logs give me only this info. Could you please take a look at my general logs:
2016-10-03 07:30:10,843 INFO com.amazon.ws.emr.hadoop.fs.EmrFileSystem (main): Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2016-10-03 07:30:11,020 INFO amazon.emr.metrics.MetricsSaver (main): MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: true maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1475479693341
2016-10-03 07:30:11,020 INFO amazon.emr.metrics.MetricsSaver (main): Created MetricsSaver j-38DY68FH7T7AV:i-0bf837d89ff9e1e02:RunJar:06741 period:60 /mnt/var/em/raw/i-0bf837d89ff9e1e02_20161003_RunJar_06741_raw.bin
2016-10-03 07:30:11,951 INFO cascading.flow.hadoop.util.HadoopUtil (main): resolving application jar from found main method on: com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$
2016-10-03 07:30:11,952 INFO cascading.flow.hadoop.planner.HadoopPlanner (main): using application jar: /mnt/var/lib/hadoop/steps/s-3FP6RQ09P7TZE/snowplow-hadoop-enrich-1.8.0.jar
2016-10-03 07:30:11,963 INFO cascading.property.AppProps (main): using app.id: A1B40E15E1D54A26BF6D8D8326B62B60
2016-10-03 07:30:12,575 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2016-10-03 07:30:12,742 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2016-10-03 07:30:12,903 INFO cascading.util.Version (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Concurrent, Inc - Cascading 2.6.0
2016-10-03 07:30:12,905 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] starting
2016-10-03 07:30:12,905 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] source: Hfs["TextLine[['offset', 'line']->[ALL]]"]["s3://udmd-d-storage/udmd-d-etl/processing"]
2016-10-03 07:30:12,905 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] sink: Hfs["TextDelimited[['json']]"]["s3://udmd-d-storage/udmd-d-enriched/enriched/bad/run=2016-10-03-07-25-44"]
2016-10-03 07:30:12,906 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] sink: Hfs["TextDelimited[['app_id', 'platform', 'etl_tstamp', 'collector_tstamp', 'dvce_created_tstamp', 'event', 'event_id', 'txn_id', 'name_tracker', 'v_tracker', 'v_collector', 'v_etl', 'user_id', 'user_ipaddress', 'user_fingerprint', 'domain_userid', 'domain_sessionidx', 'network_userid', 'geo_country', 'geo_region', 'geo_city', 'geo_zipcode', 'geo_latitude', 'geo_longitude', 'geo_region_name', 'ip_isp', 'ip_organization', 'ip_domain', 'ip_netspeed', 'page_url', 'page_title', 'page_referrer', 'page_urlscheme', 'page_urlhost', 'page_urlport', 'page_urlpath', 'page_urlquery', 'page_urlfragment', 'refr_urlscheme', 'refr_urlhost', 'refr_urlport', 'refr_urlpath', 'refr_urlquery', 'refr_urlfragment', 'refr_medium', 'refr_source', 'refr_term', 'mkt_medium', 'mkt_source', 'mkt_term', 'mkt_content', 'mkt_campaign', 'contexts', 'se_category', 'se_action', 'se_label', 'se_property', 'se_value', 'unstruct_event', 'tr_orderid', 'tr_affiliation', 'tr_total', 'tr_tax', 'tr_shipping', 'tr_city', 'tr_state', 'tr_country', 'ti_orderid', 'ti_sku', 'ti_name', 'ti_category', 'ti_price', 'ti_quantity', 'pp_xoffset_min', 'pp_xoffset_max', 'pp_yoffset_min', 'pp_yoffset_max', 'useragent', 'br_name', 'br_family', 'br_version', 'br_type', 'br_renderengine', 'br_lang', 'br_features_pdf', 'br_features_flash', 'br_features_java', 'br_features_director', 'br_features_quicktime', 'br_features_realplayer', 'br_features_windowsmedia', 'br_features_gears', 'br_features_silverlight', 'br_cookies', 'br_colordepth', 'br_viewwidth', 'br_viewheight', 'os_name', 'os_family', 'os_manufacturer', 'os_timezone', 'dvce_type', 'dvce_ismobile', 'dvce_screenwidth', 'dvce_screenheight', 'doc_charset', 'doc_width', 'doc_height', 'tr_currency', 'tr_total_base', 'tr_tax_base', 'tr_shipping_base', 'ti_currency', 'ti_price_base', 'base_currency', 'geo_timezone', 'mkt_clickid', 'mkt_network', 'etl_tags', 'dvce_sent_tstamp', 'refr_domain_userid', 'refr_dvce_tstamp', 'derived_contexts', 'domain_sessionid', 'derived_tstamp', 'event_vendor', 'event_name', 'event_format', 'event_version', 'event_fingerprint', 'true_tstamp']]"]["hdfs:/local/snowplow/enriched-events"]
2016-10-03 07:30:12,906 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] parallel execution is enabled: true
2016-10-03 07:30:12,906 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] starting jobs: 3
2016-10-03 07:30:12,906 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] allocating threads: 3
2016-10-03 07:30:12,906 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics....] starting step: (1/3)
2016-10-03 07:30:12,960 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-1): Connecting to ResourceManager at ip-172-31-3-84.ap-south-1.compute.internal/172.31.3.84:8032
2016-10-03 07:30:13,101 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-1): Connecting to ResourceManager at ip-172-31-3-84.ap-south-1.compute.internal/172.31.3.84:8032
2016-10-03 07:30:14,379 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader (pool-5-thread-1): Loaded native gpl library
2016-10-03 07:30:14,382 INFO com.hadoop.compression.lzo.LzoCodec (pool-5-thread-1): Successfully loaded & initialized native-lzo library [hadoop-lzo rev 426d94a07125cf9447bb0c2b336cf10b4c254375]
2016-10-03 07:30:14,435 INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem (pool-5-thread-1): listStatus s3://udmd-d-storage/udmd-d-etl/processing with recursive false
2016-10-03 07:30:14,467 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-1): Total input paths to process : 7
2016-10-03 07:30:14,644 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-1): number of splits:7
2016-10-03 07:30:14,877 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-1): Submitting tokens for job: job_1475479683156_0001
2016-10-03 07:30:15,317 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-1): Submitted application application_1475479683156_0001
2016-10-03 07:30:15,354 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-1): The url to track the job: http://ip-172-31-3-84.ap-south-1.compute.internal:20888/proxy/application_1475479683156_0001/
2016-10-03 07:30:15,354 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics....] submitted hadoop job: job_1475479683156_0001
2016-10-03 07:30:15,354 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics....] tracking url: http://ip-172-31-3-84.ap-south-1.compute.internal:20888/proxy/application_1475479683156_0001/
2016-10-03 07:30:44,692 INFO cascading.util.Update (UpdateRequestTimer): newer Cascading release available: 2.6.3
2016-10-03 07:34:10,660 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics....] starting step: (2/3) .../snowplow/enriched-events
2016-10-03 07:34:10,660 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] starting step: (3/3) ...d/run=2016-10-03-07-25-44
2016-10-03 07:34:10,680 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-172-31-3-84.ap-south-1.compute.internal/172.31.3.84:8032
2016-10-03 07:34:10,701 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-3): Connecting to ResourceManager at ip-172-31-3-84.ap-south-1.compute.internal/172.31.3.84:8032
2016-10-03 07:34:10,747 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-172-31-3-84.ap-south-1.compute.internal/172.31.3.84:8032
2016-10-03 07:34:10,768 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-2): Connecting to ResourceManager at ip-172-31-3-84.ap-south-1.compute.internal/172.31.3.84:8032
2016-10-03 07:34:11,530 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-3): Total input paths to process : 7
2016-10-03 07:34:11,531 INFO org.apache.hadoop.net.NetworkTopology (pool-5-thread-3): Adding a new node: /default-rack/172.31.14.7:50010
2016-10-03 07:34:11,535 INFO org.apache.hadoop.mapred.FileInputFormat (pool-5-thread-2): Total input paths to process : 7
2016-10-03 07:34:11,536 INFO org.apache.hadoop.net.NetworkTopology (pool-5-thread-2): Adding a new node: /default-rack/172.31.14.7:50010
2016-10-03 07:34:11,590 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): number of splits:10
2016-10-03 07:34:11,608 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): number of splits:10
2016-10-03 07:34:11,665 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-3): Submitting tokens for job: job_1475479683156_0003
2016-10-03 07:34:11,676 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-2): Submitting tokens for job: job_1475479683156_0002
2016-10-03 07:34:11,689 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-3): Submitted application application_1475479683156_0003
2016-10-03 07:34:11,694 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-3): The url to track the job: http://ip-172-31-3-84.ap-south-1.compute.internal:20888/proxy/application_1475479683156_0003/
2016-10-03 07:34:11,694 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] submitted hadoop job: job_1475479683156_0003
2016-10-03 07:34:11,694 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] tracking url: http://ip-172-31-3-84.ap-south-1.compute.internal:20888/proxy/application_1475479683156_0003/
2016-10-03 07:34:11,700 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-2): Submitted application application_1475479683156_0002
2016-10-03 07:34:11,703 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-2): The url to track the job: http://ip-172-31-3-84.ap-south-1.compute.internal:20888/proxy/application_1475479683156_0002/
2016-10-03 07:34:11,703 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics....] submitted hadoop job: job_1475479683156_0002
2016-10-03 07:34:11,703 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics....] tracking url: http://ip-172-31-3-84.ap-south-1.compute.internal:20888/proxy/application_1475479683156_0002/
2016-10-03 07:37:31,874 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] hadoop job job_1475479683156_0003 state at FAILED
2016-10-03 07:37:31,875 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] failure info: Task failed task_1475479683156_0003_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-10-03 07:37:31,895 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] task completion events identify failed tasks
2016-10-03 07:37:31,895 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] task completion events count: 10
2016-10-03 07:37:31,895 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000001_0, Status : SUCCEEDED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000000_0, Status : SUCCEEDED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000003_0, Status : FAILED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000002_0, Status : FAILED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000004_0, Status : SUCCEEDED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000003_1, Status : FAILED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000002_1, Status : FAILED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000003_2, Status : FAILED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000002_2, Status : FAILED
2016-10-03 07:37:31,896 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics....] event = Task Id : attempt_1475479683156_0003_m_000003_3, Status : TIPFAILED
2016-10-03 07:37:31,902 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] stopping all jobs
2016-10-03 07:37:31,902 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] stopping: (3/3) ...d/run=2016-10-03-07-25-44
2016-10-03 07:37:31,903 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] stopping: (2/3) .../snowplow/enriched-events
2016-10-03 07:37:32,905 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-14-6.ap-south-1.compute.internal/172.31.14.6:33515. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-10-03 07:37:33,906 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-14-6.ap-south-1.compute.internal/172.31.14.6:33515. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-10-03 07:37:34,907 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-14-6.ap-south-1.compute.internal/172.31.14.6:33515. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-10-03 07:37:35,013 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-10-03 07:37:35,312 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] stopping: (1/3)
2016-10-03 07:37:36,313 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-14-6.ap-south-1.compute.internal/172.31.14.6:42394. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-10-03 07:37:37,314 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-14-6.ap-south-1.compute.internal/172.31.14.6:42394. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-10-03 07:37:38,314 INFO org.apache.hadoop.ipc.Client (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Retrying connect to server: ip-172-31-14-6.ap-south-1.compute.internal/172.31.14.6:42394. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-10-03 07:37:38,418 INFO org.apache.hadoop.mapred.ClientServiceDelegate (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-10-03 07:37:38,544 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics....] stopped all jobs
2016-10-03 07:37:38,556 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path hdfs:/local/snowplow/enriched-events/_temporary
2016-10-03 07:37:38,684 INFO cascading.tap.hadoop.util.Hadoop18TapUtil (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): deleting temp path s3://udmd-d-storage/udmd-d-enriched/enriched/bad/run=2016-10-03-07-25-44/_temporary
Could this error be due to the file I am trying to parse?
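Since the step only reports "Task failed" with no stack trace, I will also try pulling the container logs for the failed attempts (e.g. attempt_1475479683156_0003_m_000003_0) with `yarn logs -applicationId application_1475479683156_0003` on the master node. And below is the kind of quick sanity check I could run over the raw files in s3://udmd-d-storage/udmd-d-etl/processing (the input path from the logs above) to look for obviously malformed lines. This is only a sketch: it assumes the files are plain text (if they are compressed they would need decompressing first), and the short-line heuristic is just a guess:

```python
# Sketch: scan the raw input files for lines that look malformed
# (undecodable bytes or unexpectedly short lines). Bucket and prefix
# come from the job logs above; the heuristics are only assumptions.
import boto3

BUCKET = "udmd-d-storage"          # from the job logs above
PREFIX = "udmd-d-etl/processing"   # from the job logs above

s3 = boto3.client("s3")

# List the input files the job was reading (the logs show 7 of them).
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
    for lineno, raw in enumerate(body.iter_lines(), start=1):
        try:
            line = raw.decode("utf-8")
        except UnicodeDecodeError:
            print(f"{obj['Key']}:{lineno}: not valid UTF-8")
            continue
        # Assumption: a real collector payload line should be longer
        # than this; the threshold is just a flagging heuristic.
        if 0 < len(line.strip()) < 10:
            print(f"{obj['Key']}:{lineno}: suspiciously short line")
```

If any of the files show undecodable bytes or truncated lines, that would at least point to a bad input file rather than the job itself.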
Waiting for your reply. I appreciate your help.