Hi everyone!

I am running EmrEtlRunner R104 (Stoplesteinan), but the job is failing on the step "Elasticity Spark Step: Enrich Raw Events".

I ran this job with --skip staging because a previous job failed the same way and had already copied the raw logs to s3://snowplow-companyX-etl/processing (at least I can see them there).

I failed to find any useful information in the logs (below). Does anyone have an idea what might be going on? There are only a couple of megabytes of logs from the Clojure Collector, so I don't think hardware should be an issue.
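For completeness, this is roughly how I invoke the runner (a sketch from memory; resolver.json and enrichments/ stand in for my actual resolver and enrichments paths):

# Run the pipeline, skipping the staging step since raw logs are already in processing
$ ./snowplow-emr-etl-runner run \
    --config config.yml \
    --resolver resolver.json \
    --enrichments enrichments/ \
    --skip staging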
config.yml
aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: <%= ENV['AWS_SNOWPLOW_ACCESS_KEY'] %>
  secret_access_key: <%= ENV['AWS_SNOWPLOW_SECRET_KEY'] %>
  s3:
    region: ap-southeast-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://snowplow-companyX-out/emr-etl-logs
      raw:
        in:
          - s3n://elasticbeanstalk-ap-southeast-1-304977849758/resources/environments/logs/publish/e-btrqmphmmc
        processing: s3n://snowplow-companyX-etl/processing
        archive: s3://snowplow-companyX-out/archive # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://snowplow-companyX-out/enriched/good # e.g. s3://my-out-bucket/enriched/good
        bad: s3://snowplow-companyX-out/enriched/bad # e.g. s3://my-out-bucket/enriched/bad
        errors: # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://snowplow-companyX-out/enriched/archive # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://snowplow-companyX-out/shredded/good # e.g. s3://my-out-bucket/shredded/good
        bad: s3://snowplow-companyX-out/shredded/bad # e.g. s3://my-out-bucket/shredded/bad
        errors: # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://snowplow-companyX-out/shredded/archive # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.9.0
    region: ap-southeast-1 # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole # Created using $ aws emr create-default-roles
    placement: # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id: subnet-6b4dba32 # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: snowplow-companyX-emr-etl-runner
    bootstrap: [] # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase: # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual: # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      job_name: Snowplow company ETL # Give your job a name
      master_instance_type: m4.large
      core_instance_count: 1
      core_instance_type: m4.large
      core_instance_ebs: # Optional. Attach an EBS volume to each core instance.
        volume_size: 10 # Gigabytes
        volume_type: "gp2"
        volume_iops: 400 # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: c4.large
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info: # Optional JSON string for selecting additional features
collectors:
  format: clj-tomcat # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
    spark_enrich: 1.14.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_loader: 0.14.0
    rdb_shredder: 0.13.1 # Version of the Spark Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags:
    description: emr etl run companyX # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: snowplow # e.g. snowplow
    collector: collector.fooobar.com # e.g. d3rkrsqld9gmqf.cloudfront.net
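One thing I did verify before re-running with --skip staging: the raw logs really are sitting in the processing bucket. A quick check with the standard AWS CLI (bucket name as in the config above) lists the staged Clojure Collector log files as expected:

# List the staged raw logs that the enrich step should pick up
$ aws s3 ls s3://snowplow-companyX-etl/processing/ --recursive | head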
controller log (from the EMR step)
2018-07-21T14:02:23.722Z INFO Ensure step 2 jar file command-runner.jar
2018-07-21T14:02:23.723Z INFO StepRunner: Created Runner for step 2
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --class com.snowplowanalytics.snowplow.enrich.spark.EnrichJob --master yarn --deploy-mode cluster s3://snowplow-hosted-assets-ap-southeast-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.14.0.jar --input-format clj-tomcat --etl-timestamp 1532181419947 --iglu-config ewog9Cn0K --enrichments eyJzY2 --input-folder hdfs:///local/snowplow/raw-events/* --output-folder hdfs:///local/snowplow/enriched-events/ --bad-folder s3://snowplow-datadriven-out/enriched/bad/run=2018-07-21-13-56-59/'
INFO Environment:
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
LESS_TERMCAP_md=[01;38;5;208m
LESS_TERMCAP_me=[0m
HISTCONTROL=ignoredups
LESS_TERMCAP_mb=[01;31m
AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
UPSTART_JOB=rc
LESS_TERMCAP_se=[0m
HISTSIZE=1000
HADOOP_ROOT_LOGGER=INFO,DRFA
JAVA_HOME=/etc/alternatives/jre
AWS_DEFAULT_REGION=ap-southeast-1
AWS_ELB_HOME=/opt/aws/apitools/elb
LESS_TERMCAP_us=[04;38;5;111m
EC2_HOME=/opt/aws/apitools/ec2
TERM=linux
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
runlevel=3
LANG=en_US.UTF-8
AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
MAIL=/var/spool/mail/hadoop
LESS_TERMCAP_ue=[0m
LOGNAME=hadoop
PWD=/
LANGSH_SOURCED=1
HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-32SK2U9TW6WV4/tmp
_=/etc/alternatives/jre/bin/java
CONSOLETYPE=serial
RUNLEVEL=3
LESSOPEN=||/usr/bin/lesspipe.sh %s
previous=N
UPSTART_EVENTS=runlevel
AWS_PATH=/opt/aws
USER=hadoop
UPSTART_INSTANCE=
PREVLEVEL=N
HADOOP_LOGFILE=syslog
PYTHON_INSTALL_LAYOUT=amzn
HOSTNAME=ip-172-31-0-233
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-32SK2U9TW6WV4
EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
SHLVL=5
HOME=/home/hadoop
HADOOP_IDENT_STRING=hadoop
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-32SK2U9TW6WV4/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-32SK2U9TW6WV4/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-32SK2U9TW6WV4
INFO ProcessRunner started child process 8895 :
hadoop 8895 4018 0 14:02 ? 00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --class com.snowplowanalytics.snowplow.enrich.spark.EnrichJob --master yarn --deploy-mode cluster s3://snowplow-hosted-assets-ap-southeast-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.14.0.jar --input-format clj-tomcat --etl-timestamp 1532181419947 --iglu-config ewog9<abbreviated>Cn0K --enrichments eyJzY<abbreviated>9fX1dfQ== --input-folder hdfs:///local/snowplow/raw-events/* --output-folder hdfs:///local/snowplow/enriched-events/ --bad-folder s3://snowplow-datadriven-out/enriched/bad/run=2018-07-21-13-56-59/
2018-07-21T14:02:27.773Z INFO HadoopJarStepRunner.Runner: startRun() called for s-32SK2U9TW6WV4 Child Pid: 8895
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 36 seconds
2018-07-21T14:03:02.057Z INFO Step created jobs:
2018-07-21T14:03:02.057Z WARN Step failed with exitCode 1 and took 36 seconds
stderr log
Warning: Skip remote jar s3://snowplow-hosted-assets-ap-southeast-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.14.0.jar.
18/07/21 14:02:29 INFO RMProxy: Connecting to ResourceManager at ip-172-31-0-233.ap-southeast-1.compute.internal/172.31.0.233:8032
18/07/21 14:02:30 INFO Client: Requesting a new application from cluster with 1 NodeManagers
18/07/21 14:02:30 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
18/07/21 14:02:30 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/07/21 14:02:30 INFO Client: Setting up container launch context for our AM
18/07/21 14:02:30 INFO Client: Setting up the launch environment for our AM container
18/07/21 14:02:30 INFO Client: Preparing resources for our AM container
18/07/21 14:02:32 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/07/21 14:02:33 INFO Client: Uploading resource file:/mnt/tmp/spark-7054aa11-2c97-4a67-8d19-d3a67aa30524/__spark_libs__7564400749392618887.zip -> hdfs://ip-172-31-0-233.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1532181544882_0002/__spark_libs__7564400749392618887.zip
18/07/21 14:02:37 WARN RoleMappings: Found no mappings configured with 'fs.s3.authorization.roleMapping', credentials resolution may not work as expected
18/07/21 14:02:38 INFO Client: Uploading resource s3://snowplow-hosted-assets-ap-southeast-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.14.0.jar -> hdfs://ip-172-31-0-233.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1532181544882_0002/snowplow-spark-enrich-1.14.0.jar
18/07/21 14:02:38 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets-ap-southeast-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.14.0.jar' for reading
18/07/21 14:02:40 INFO Client: Uploading resource file:/mnt/tmp/spark-7054aa11-2c97-4a67-8d19-d3a67aa30524/__spark_conf__5438212996105278327.zip -> hdfs://ip-172-31-0-233.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1532181544882_0002/__spark_conf__.zip
18/07/21 14:02:40 INFO SecurityManager: Changing view acls to: hadoop
18/07/21 14:02:40 INFO SecurityManager: Changing modify acls to: hadoop
18/07/21 14:02:40 INFO SecurityManager: Changing view acls groups to:
18/07/21 14:02:40 INFO SecurityManager: Changing modify acls groups to:
18/07/21 14:02:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/07/21 14:02:40 INFO Client: Submitting application application_1532181544882_0002 to ResourceManager
18/07/21 14:02:40 INFO YarnClientImpl: Submitted application application_1532181544882_0002
18/07/21 14:02:41 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:41 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1532181760516
final status: UNDEFINED
tracking URL: http://ip-172-31-0-233.ap-southeast-1.compute.internal:20888/proxy/application_1532181544882_0002/
user: hadoop
18/07/21 14:02:42 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:43 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:44 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:45 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:46 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:47 INFO Client: Application report for application_1532181544882_0002 (state: ACCEPTED)
18/07/21 14:02:48 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:48 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.10.163
ApplicationMaster RPC port: 0
queue: default
start time: 1532181760516
final status: UNDEFINED
tracking URL: http://ip-172-31-0-233.ap-southeast-1.compute.internal:20888/proxy/application_1532181544882_0002/
user: hadoop
18/07/21 14:02:49 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:50 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:51 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:52 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:53 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:54 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:55 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:56 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:57 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:58 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:02:59 INFO Client: Application report for application_1532181544882_0002 (state: RUNNING)
18/07/21 14:03:00 INFO Client: Application report for application_1532181544882_0002 (state: FINISHED)
18/07/21 14:03:00 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.31.10.163
ApplicationMaster RPC port: 0
queue: default
start time: 1532181760516
final status: FAILED
tracking URL: http://ip-172-31-0-233.ap-southeast-1.compute.internal:20888/proxy/application_1532181544882_0002/
user: hadoop
18/07/21 14:03:00 INFO Client: Deleted staging directory hdfs://ip-172-31-0-233.ap-southeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1532181544882_0002
Exception in thread "main" org.apache.spark.SparkException: Application application_1532181544882_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/07/21 14:03:00 INFO ShutdownHookManager: Shutdown hook called
18/07/21 14:03:00 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7054aa11-2c97-4a67-8d19-d3a67aa30524
Command exiting with ret '1'
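In case someone wants to dig deeper: as far as I understand, the stderr above only says that the YARN application failed, not why, so the actual error should be in the YARN container logs. Roughly how to pull them (a sketch; <cluster-id> is a placeholder for the cluster's j-XXXX id, log bucket as in my config above):

# On the EMR master node, while the cluster is still running:
$ yarn logs -applicationId application_1532181544882_0002 | less

# Or, after the cluster terminates, from the EMR log bucket:
$ aws s3 cp --recursive \
    s3://snowplow-companyX-out/emr-etl-logs/<cluster-id>/containers/application_1532181544882_0002/ \
    ./containers/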