Hi there,
Please help!
I have an EMR ETL job that is failing at step 2. I run it with the --skip staging option. The job had been running for over a month without issues; then I made the following change:
Changed:
jobflow:
  master_instance_type: r3.xlarge
  core_instance_count: 2
  core_instance_type: r3.xlarge
  task_instance_count: 0 # Increase to use spot instances
  task_instance_type: r3.xlarge
  task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
to:
jobflow:
  master_instance_type: m1.medium
  core_instance_count: 3
  core_instance_type: m3.xlarge
  task_instance_count: 0
  task_instance_type: m1.small
  task_instance_bid: 0.25
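For context (figures from the AWS instance specs, not from the job itself), the new instance types have far less memory than the old ones:

# Approximate instance memory, per AWS specs:
#   r3.xlarge  - 30.5 GiB  (old master / core / task type)
#   m1.medium  - 3.75 GiB  (new master)
#   m3.xlarge  - 15 GiB    (new core nodes)
#   m1.small   - 1.7 GiB   (new task type; task_instance_count is 0, so unused)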
Error: “…running beyond physical memory limits”
Syslog for step 2:
2017-01-09 23:17:46,603 INFO amazon.emr.metrics.MetricsSaver (main): MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1484003463520
2017-01-09 23:17:46,604 INFO amazon.emr.metrics.MetricsSaver (main): Created MetricsSaver j-3K5AFU21S2CDV:i-018b9b79d37f04a23:RunJar:08207 period:60 /mnt/var/em/raw/i-018b9b79d37f04a23_20170109_RunJar_08207_raw.bin
2017-01-09 23:17:48,259 INFO cascading.flow.hadoop.util.HadoopUtil (main): resolving application jar from found main method on: com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$
2017-01-09 23:17:48,261 INFO cascading.flow.hadoop.planner.HadoopPlanner (main): using application jar: /mnt/var/lib/hadoop/steps/s-38789WQITNL6U/snowplow-hadoop-enrich-1.7.0.jar
2017-01-09 23:17:48,275 INFO cascading.property.AppProps (main): using app.id: 999FFF53599A42269AC131EB4FF52F53
2017-01-09 23:17:48,415 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2017-01-09 23:17:49,763 INFO com.amazon.ws.emr.hadoop.fs.EmrFileSystem (main): Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2017-01-09 23:17:50,237 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2017-01-09 23:17:50,778 INFO cascading.util.Version (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): Concurrent, Inc - Cascading 2.6.0
2017-01-09 23:17:50,782 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] starting
2017-01-09 23:17:50,782 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] source: Hfs[“LzoByteArrayScheme[[UNKNOWN]->[ALL]]”][“hdfs:/local/snowplow/raw-events”]
2017-01-09 23:17:50,783 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] sink: Hfs[“TextDelimited[[‘json’]]”][“s3://launchpad-etl-emr/enriched/bad/run=2017-01-09-23-03-16”]
2017-01-09 23:17:50,783 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] sink: Hfs[“TextDelimited[[‘app_id’, ‘platform’, ‘etl_tstamp’, ‘collector_tstamp’, ‘dvce_created_tstamp’, ‘event’, ‘event_id’, ‘txn_id’, ‘name_tracker’, ‘v_tracker’, ‘v_collector’, ‘v_etl’, ‘user_id’, ‘user_ipaddress’, ‘user_fingerprint’, ‘domain_userid’, ‘domain_sessionidx’, ‘network_userid’, ‘geo_country’, ‘geo_region’, ‘geo_city’, ‘geo_zipcode’, ‘geo_latitude’, ‘geo_longitude’, ‘geo_region_name’, ‘ip_isp’, ‘ip_organization’, ‘ip_domain’, ‘ip_netspeed’, ‘page_url’, ‘page_title’, ‘page_referrer’, ‘page_urlscheme’, ‘page_urlhost’, ‘page_urlport’, ‘page_urlpath’, ‘page_urlquery’, ‘page_urlfragment’, ‘refr_urlscheme’, ‘refr_urlhost’, ‘refr_urlport’, ‘refr_urlpath’, ‘refr_urlquery’, ‘refr_urlfragment’, ‘refr_medium’, ‘refr_source’, ‘refr_term’, ‘mkt_medium’, ‘mkt_source’, ‘mkt_term’, ‘mkt_content’, ‘mkt_campaign’, ‘contexts’, ‘se_category’, ‘se_action’, ‘se_label’, ‘se_property’, ‘se_value’, ‘unstruct_event’, ‘tr_orderid’, ‘tr_affiliation’, ‘tr_total’, ‘tr_tax’, ‘tr_shipping’, ‘tr_city’, ‘tr_state’, ‘tr_country’, ‘ti_orderid’, ‘ti_sku’, ‘ti_name’, ‘ti_category’, ‘ti_price’, ‘ti_quantity’, ‘pp_xoffset_min’, ‘pp_xoffset_max’, ‘pp_yoffset_min’, ‘pp_yoffset_max’, ‘useragent’, ‘br_name’, ‘br_family’, ‘br_version’, ‘br_type’, ‘br_renderengine’, ‘br_lang’, ‘br_features_pdf’, ‘br_features_flash’, ‘br_features_java’, ‘br_features_director’, ‘br_features_quicktime’, ‘br_features_realplayer’, ‘br_features_windowsmedia’, ‘br_features_gears’, ‘br_features_silverlight’, ‘br_cookies’, ‘br_colordepth’, ‘br_viewwidth’, ‘br_viewheight’, ‘os_name’, ‘os_family’, ‘os_manufacturer’, ‘os_timezone’, ‘dvce_type’, ‘dvce_ismobile’, ‘dvce_screenwidth’, ‘dvce_screenheight’, ‘doc_charset’, ‘doc_width’, ‘doc_height’, ‘tr_currency’, ‘tr_total_base’, ‘tr_tax_base’, ‘tr_shipping_base’, ‘ti_currency’, ‘ti_price_base’, ‘base_currency’, ‘geo_timezone’, ‘mkt_clickid’, ‘mkt_network’, ‘etl_tags’, ‘dvce_sent_tstamp’, ‘refr_domain_userid’, ‘refr_dvce_tstamp’, ‘derived_contexts’, ‘domain_sessionid’, ‘derived_tstamp’, ‘event_vendor’, ‘event_name’, ‘event_format’, ‘event_version’, ‘event_fingerprint’, ‘true_tstamp’]]”][“hdfs:/local/snowplow/enriched-events”]
2017-01-09 23:17:50,784 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] parallel execution is enabled: true
2017-01-09 23:17:50,784 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] starting jobs: 3
2017-01-09 23:17:50,784 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] allocating threads: 3
2017-01-09 23:17:50,785 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] starting step: (1/3)
2017-01-09 23:17:50,907 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-1): Connecting to ResourceManager at ip-10-129-14-54.ap-southeast-2.compute.internal/10.129.14.54:8032
2017-01-09 23:17:51,186 INFO org.apache.hadoop.yarn.client.RMProxy (pool-5-thread-1): Connecting to ResourceManager at ip-10-129-14-54.ap-southeast-2.compute.internal/10.129.14.54:8032
2017-01-09 23:18:09,617 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat (pool-5-thread-1): Total input paths to process : 30294
2017-01-09 23:18:09,617 INFO org.apache.hadoop.conf.Configuration.deprecation (pool-5-thread-1): mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2017-01-09 23:18:21,403 INFO cascading.util.Update (UpdateRequestTimer): newer Cascading release available: 2.6.3
2017-01-09 23:18:28,537 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-1): number of splits:15146
2017-01-09 23:18:28,805 INFO org.apache.hadoop.mapreduce.JobSubmitter (pool-5-thread-1): Submitting tokens for job: job_1484003452716_0002
2017-01-09 23:18:29,010 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (pool-5-thread-1): Submitted application application_1484003452716_0002
2017-01-09 23:18:29,065 INFO org.apache.hadoop.mapreduce.Job (pool-5-thread-1): The url to track the job: http://ip-10-129-14-54.ap-southeast-2.compute.internal:20888/proxy/application_1484003452716_0002/
2017-01-09 23:18:29,067 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] submitted hadoop job: job_1484003452716_0002
2017-01-09 23:18:29,067 INFO cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] tracking url: http://ip-10-129-14-54.ap-southeast-2.compute.internal:20888/proxy/application_1484003452716_0002/
2017-01-09 23:25:27,282 WARN cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] hadoop job job_1484003452716_0002 state at FAILED
2017-01-09 23:25:27,284 WARN cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] failure info: Application application_1484003452716_0002 failed 2 times due to AM Container for appattempt_1484003452716_0002_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://ip-10-129-14-54.ap-southeast-2.compute.internal:8088/cluster/app/application_1484003452716_0002Then, click on links to logs of each attempt.
Diagnostics: Container [pid=16209,containerID=container_1484003452716_0002_02_000001] is running beyond physical memory limits. Current usage: 528.6 MB of 512 MB physical memory used; 1.9 GB of 2.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1484003452716_0002_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 16228 16209 16209 16209 (java) 3858 274 1918607360 134624 /usr/lib/jvm/java-openjdk/bin/java -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1484003452716_0002/container_1484003452716_0002_02_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1484003452716_0002/container_1484003452716_0002_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
|- 16209 16207 16209 16209 (bash) 1 0 115806208 692 /bin/bash -c /usr/lib/jvm/java-openjdk/bin/java -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1484003452716_0002/container_1484003452716_0002_02_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1484003452716_0002/container_1484003452716_0002_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/log/hadoop-yarn/containers/application_1484003452716_0002/container_1484003452716_0002_02_000001/stdout 2>/var/log/hadoop-yarn/containers/application_1484003452716_0002/container_1484003452716_0002_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
2017-01-09 23:25:27,298 WARN cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] task completion events identify failed tasks
2017-01-09 23:25:27,298 WARN cascading.flow.FlowStep (pool-5-thread-1): [com.snowplowanalytics…] task completion events count: 0
2017-01-09 23:25:27,302 WARN cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] abandoning step: (3/3) …/snowplow/enriched-events, predecessor failed: (1/3)
2017-01-09 23:25:27,302 WARN cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] abandoning step: (2/3) …d/run=2017-01-09-23-03-16, predecessor failed: (1/3)
2017-01-09 23:25:27,303 INFO cascading.flow.FlowStep (pool-5-thread-3): [com.snowplowanalytics…] stopping: (3/3) …/snowplow/enriched-events
2017-01-09 23:25:27,303 INFO cascading.flow.FlowStep (pool-5-thread-2): [com.snowplowanalytics…] stopping: (2/3) …d/run=2017-01-09-23-03-16
2017-01-09 23:25:27,304 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping all jobs
2017-01-09 23:25:27,304 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (3/3) …/snowplow/enriched-events
2017-01-09 23:25:27,305 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (2/3) …d/run=2017-01-09-23-03-16
2017-01-09 23:25:27,305 INFO cascading.flow.FlowStep (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopping: (1/3)
2017-01-09 23:25:27,307 INFO cascading.flow.Flow (flow com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob): [com.snowplowanalytics…] stopped all jobs
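From the diagnostics above, the container being killed appears to be the MRAppMaster itself: it is launched with -Xmx1024m, but its container has a physical limit of only 512 MB (528.6 MB in use when YARN killed it with exit code 143). My guess, and this is only a guess rather than anything from the Snowplow docs, is that the smaller instance types come with smaller default YARN container sizes, and that the ApplicationMaster settings would need to be raised along these lines (the property names are the standard Hadoop 2 / YARN ones; the values, and how to pass them into the cluster that EmrEtlRunner launches, are assumptions on my part):

# Hypothetical values - not taken from my current configuration:
yarn.app.mapreduce.am.resource.mb: 2048         # physical memory limit for the AM container (MB)
yarn.app.mapreduce.am.command-opts: "-Xmx1536m" # AM heap, kept below the container limit

Is that the right knob to turn here, or would it be simpler to go back to the larger instance types?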
Regards
SS