Problem at S3 to HDFS S3DistCp step

sandesh · October 9, 2017, 2:41pm

I have configured amazon redshift and i am trying load the events to the redshift database.
Below is the command i am using to run.

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze

i am using snowplow_emr_r92_maiden_castle.zip to run.

But i am getting below error;

D, [2017-10-09T14:13:56.956000 #2813] DEBUG -- : Initializing EMR jobflow
D, [2017-10-09T14:14:07.511000 #2813] DEBUG -- : EMR jobflow j-1E5NDIFHBCMRM started, waiting for jobflow to complete...
I, [2017-10-09T14:26:10.972000 #2813]  INFO -- : No RDB Loader logs
F, [2017-10-09T14:26:11.320000 #2813] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-1E5NDIFHBCMRM failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-09 14:19:57 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw s3://snowplowdataevents2/ -> Raw Staging S3: COMPLETED ~ 00:01:42 [2017-10-09 14:19:58 +0000 - 2017-10-09 14:21:41 +0000]
 - 2. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: FAILED ~ 00:02:14 [2017-10-09 14:21:43 +0000 - 2017-10-09 14:23:57 +0000]
 - 3. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 9. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 10. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 11. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 12. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Elasticity Spark Step: Enrich Raw Events: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

My YML configuration file is below:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: xxxxx
  secret_access_key: xxxxxxx
  #keypair: Snowplowkeypair
  #key-pair-file: /home/ubuntu/snowplow/4-storage/config/Snowplowkeypair.pem
  region: us-east-1
  s3:
	region: us-east-1
	buckets:
	  assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
	  jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
	  log: s3://snowplowdataevents2/logs
	  raw:
		in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
		  - s3://snowplowdataevents2/      # e.g. s3://my-old-collector-bucket
		processing: s3://snowplowdataevents2/raw/processing
		archive: s3://snowplowdataevents2/raw/archive   # e.g. s3://my-archive-bucket/raw
	  enriched:
		good: s3://snowplowdataevents2/enriched/good        # e.g. s3://my-out-bucket/enriched/good
		bad: s3://snowplowdataevents2/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
		errors: s3://snowplowdataevents2/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://snowplowdataevents2/enriched/archive    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
	  shredded:
		good: s3://snowplowdataevents2/shredded/good        # e.g. s3://my-out-bucket/shredded/good
		bad: s3://snowplowdataevents2/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
		errors: s3://snowplowdataevents2/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://snowplowdataevents2/shredded/archive     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
	ami_version: 5.5.0
	region: us-east-1       # Always set this
	jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
	service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
	placement: us-east-1a      # Set this if not running in VPC. Leave blank otherwise
	ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
	ec2_key_name: Snowplowkeypair
	bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
	software:
	  hbase:              # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
	  lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
	# Adjust your Hadoop cluster below
	jobflow:
	  job_name: Snowplow ETL # Give your job a name
	  master_instance_type: m2.4xlarge
	  core_instance_count: 2
	  core_instance_type: m2.4xlarge
	  core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
		volume_size: 100    # Gigabytes
		volume_type: "gp2"
		volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
		ebs_optimized: false # Optional. Will default to true
	  task_instance_count: 0 # Increase to use spot instances
	  task_instance_type: m2.4xlarge
	  task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
	bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
	configuration:
	  yarn-site:
		yarn.resourcemanager.am.max-attempts: "1"
	  spark:
		maximizeResourceAllocation: "true"
	additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
	spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
	rdb_loader: 0.12.0
	rdb_shredder: 0.12.0        # Version of the Spark Shredding process
	hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
	level: DEBUG # You can optionally switch to INFO for production
  #snowplow:
	#method: get
	#app_id: unilog # e.g. snowplow
	#collector: 172.31.38.39:8082 # e.g. d3rkrsqld9gmqf.cloudfront.net

iglu_resolver.json file is below

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
	"cacheSize": 500,
	"repositories": [
	  {
		"name": "Iglu Central",
		"priority": 0,
		"vendorPrefixes": [ "com.snowplowanalytics" ],
		"connection": {
		  "http": {
			"uri": "http://iglucentral.com"
		  }
		}
	  }
	]
  }
}

in the config/targets/ i have kept only redshift.json file and below the file:

{
“schema”: “iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/1-0-0”,
“data”: {
“name”: “AWS Redshift enriched events storage”,
“host”: “snowplow.cze0fyuagv4x.us-east-1.redshift.amazonaws.com”,
“database”: “unilog”,
“port”: 5439,
“sslMode”: “DISABLE”,
“username”: “xxxx”,
“password”: “xxxx”,
“schema”: “atomic”,
“maxError”: 1,
“compRows”: 20000,
“purpose”: “ENRICHED_EVENTS”
}
}

Please help to resolve this error…
I have explained it in detail.

sandesh · October 10, 2017, 6:04am

Guys please help me out to resolve this error…

BenFradet · October 10, 2017, 10:11am

I would suggest you look at both the error message displayed in the emr console and the logs for the above step.

sandesh · October 10, 2017, 10:30am

Hey @BenFradet thanks for the reply.

Below is error message in EMR console

Cluster: Snowplow ETLTerminated with errorsShut down as step failed

BenFradet · October 10, 2017, 3:15pm

I meant the error for this particular step not the error for the whole emr cluster.

sandesh · October 10, 2017, 3:32pm

hi @BenFradet

In Load AWS Redshift enriched events storage Storage Target step i am getting error now…
Below is error details

ubuntu@ip-172-31-38-39:~$ ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze
D, [2017-10-10T15:07:40.612000 #2393] DEBUG -- : Initializing EMR jobflow
D, [2017-10-10T15:08:06.901000 #2393] DEBUG -- : EMR jobflow j-EGNGSIVM6MDC started, waiting for jobflow to complete...
I, [2017-10-10T15:30:11.636000 #2393]  INFO -- : No RDB Loader logs
F, [2017-10-10T15:30:11.956000 #2393] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-EGNGSIVM6MDC failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-10 15:13:48 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw s3://snowplowdataevents2/ -> Raw Staging S3: COMPLETED ~ 00:02:36 [2017-10-10 15:13:50 +0000 - 2017-10-10 15:16:26 +0000]
 - 2. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:02:14 [2017-10-10 15:16:28 +0000 - 2017-10-10 15:18:42 +0000]
 - 3. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:12 [2017-10-10 15:18:44 +0000 - 2017-10-10 15:19:56 +0000]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: COMPLETED ~ 00:00:48 [2017-10-10 15:19:58 +0000 - 2017-10-10 15:20:46 +0000]
 - 5. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: COMPLETED ~ 00:00:44 [2017-10-10 15:20:48 +0000 - 2017-10-10 15:21:33 +0000]
 - 6. Elasticity Custom Jar Step: Empty Raw HDFS: COMPLETED ~ 00:00:08 [2017-10-10 15:21:35 +0000 - 2017-10-10 15:21:43 +0000]
 - 7. Elasticity Spark Step: Shred Enriched Events: COMPLETED ~ 00:01:22 [2017-10-10 15:21:45 +0000 - 2017-10-10 15:23:07 +0000]
 - 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: COMPLETED ~ 00:01:10 [2017-10-10 15:23:09 +0000 - 2017-10-10 15:24:19 +0000]
 - 9. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: COMPLETED ~ 00:00:48 [2017-10-10 15:24:21 +0000 - 2017-10-10 15:25:09 +0000]
 - 10. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: COMPLETED ~ 00:02:36 [2017-10-10 15:25:11 +0000 - 2017-10-10 15:27:47 +0000]
 - 11. Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: FAILED ~ 00:00:16 [2017-10-10 15:27:52 +0000 - 2017-10-10 15:28:09 +0000]
 - 12. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Please help me @BenFradet i am in final stage

BenFradet · October 10, 2017, 3:46pm

Same as before, what’s the error message and logs for this particular step?

sandesh · October 10, 2017, 4:00pm

Hey @BenFradet thanks alot for the quick response.
Below is the error message:

17/10/10 15:18:48 INFO RMProxy: Connecting to ResourceManager at ip-172-31-7-173.ec2.internal/172.31.7.173:8032
17/10/10 15:18:48 INFO Client: Requesting a new application from cluster with 2 NodeManagers
17/10/10 15:18:48 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (61440 MB per container)
17/10/10 15:18:48 INFO Client: Will allocate AM container, with 61440 MB memory including 5585 MB overhead
17/10/10 15:18:48 INFO Client: Setting up container launch context for our AM
17/10/10 15:18:48 INFO Client: Setting up the launch environment for our AM container
17/10/10 15:18:48 INFO Client: Preparing resources for our AM container
17/10/10 15:18:50 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/10/10 15:18:54 INFO Client: Uploading resource file:/mnt/tmp/spark-26124303-4a2c-4b70-852a-cc1dd0344211/__spark_libs__5983972670420414884.zip -> hdfs://ip-172-31-7-173.ec2.internal:8020/user/hadoop/.sparkStaging/application_1507648290273_0003/__spark_libs__5983972670420414884.zip
17/10/10 15:18:58 INFO Client: Uploading resource s3://snowplow-hosted-assets-us-east-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar -> hdfs://ip-172-31-7-173.ec2.internal:8020/user/hadoop/.sparkStaging/application_1507648290273_0003/snowplow-spark-enrich-1.9.0.jar
17/10/10 15:18:58 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets-us-east-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar' for reading
17/10/10 15:19:01 INFO Client: Uploading resource file:/mnt/tmp/spark-26124303-4a2c-4b70-852a-cc1dd0344211/__spark_conf__6182307387502343591.zip -> hdfs://ip-172-31-7-173.ec2.internal:8020/user/hadoop/.sparkStaging/application_1507648290273_0003/__spark_conf__.zip
17/10/10 15:19:01 INFO SecurityManager: Changing view acls to: hadoop
17/10/10 15:19:01 INFO SecurityManager: Changing modify acls to: hadoop
17/10/10 15:19:01 INFO SecurityManager: Changing view acls groups to: 
17/10/10 15:19:01 INFO SecurityManager: Changing modify acls groups to: 
17/10/10 15:19:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
17/10/10 15:19:01 INFO Client: Submitting application application_1507648290273_0003 to ResourceManager
17/10/10 15:19:01 INFO YarnClientImpl: Submitted application application_1507648290273_0003
17/10/10 15:19:02 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:02 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1507648741224
	 final status: UNDEFINED
	 tracking URL: http://ip-172-31-7-173.ec2.internal:20888/proxy/application_1507648290273_0003/
	 user: hadoop
17/10/10 15:19:03 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:04 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:05 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:06 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:07 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:08 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:09 INFO Client: Application report for application_1507648290273_0003 (state: ACCEPTED)
17/10/10 15:19:10 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:10 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.31.3.116
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1507648741224
	 final status: UNDEFINED
	 tracking URL: http://ip-172-31-7-173.ec2.internal:20888/proxy/application_1507648290273_0003/
	 user: hadoop
17/10/10 15:19:11 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:12 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:13 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:14 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:15 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:16 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:17 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:18 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:19 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:20 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:21 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:22 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:23 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:24 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:25 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:26 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:27 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:28 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:29 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:30 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:31 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:32 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:33 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:34 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:35 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:36 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:37 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:38 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:39 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:40 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:41 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:42 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:43 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:44 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:45 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:46 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:47 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:48 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:49 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:50 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:51 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:52 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:53 INFO Client: Application report for application_1507648290273_0003 (state: RUNNING)
17/10/10 15:19:54 INFO Client: Application report for application_1507648290273_0003 (state: FINISHED)
17/10/10 15:19:54 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.31.3.116
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1507648741224
	 final status: SUCCEEDED
	 tracking URL: http://ip-172-31-7-173.ec2.internal:20888/proxy/application_1507648290273_0003/
	 user: hadoop
17/10/10 15:19:54 INFO ShutdownHookManager: Shutdown hook called
17/10/10 15:19:54 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-26124303-4a2c-4b70-852a-cc1dd0344211
Command exiting with ret '0'

anton · October 11, 2017, 5:53am

Hi @sandesh,

I don’t think what you posted is logs for Load AWS Redshift enriched events storage Storage Target step (RDB Loader). It looks more like stderr for Spark Enrich or Spark Shred. Could you please instead provide both stdlog and stderr of last failed step.

sandesh · October 11, 2017, 9:43am

In that particular step two file is generated one is controller.gz and stdout.gz

Below is stderr file

Configuration error Attempt to decode value on failed cursor: DownField(roleArn)

And controller.gz file is below here

2017-10-10T15:27:47.723Z INFO Ensure step 11 jar file s3://snowplow-hosted-assets-us-east-1/4-storage/rdb-loader/snowplow-rdb-loader-0.12.0.jar
2017-10-10T15:27:50.893Z INFO StepRunner: Created Runner for step 11
INFO startExec 'hadoop jar /mnt/var/lib/hadoop/steps/s-ATGSV39Q982U/snowplow-rdb-loader-0.12.0.jar --config LS0tCmF3czoKICBhY2Nlc3Nfa2V5X2lkOiAnJwogIHNlY3JldF9hY2Nlc3Nfa2V5OiAnJwogIHJlZ2lvbjogdXMtZWFzdC0xCiAgczM6CiAgICByZWdpb246IHVzLWVhc3QtMQogICAgYnVja2V0czoKICAgICAgYXNzZXRzOiBzMzovL3Nub3dwbG93LWhvc3RlZC1hc3NldHMvCiAgICAgIGpzb25wYXRoX2Fzc2V0czoKICAgICAgbG9nOiBzMzovL3Nub3dwbG93ZGF0YWV2ZW50czIvbG9ncy8KICAgICAgcmF3OgogICAgICAgIGluOgogICAgICAgIC0gczM6Ly9zbm93cGxvd2RhdGFldmVudHMyLwogICAgICAgIHByb2Nlc3Npbmc6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9yYXcvcHJvY2Vzc2luZy8KICAgICAgICBhcmNoaXZlOiBzMzovL3Nub3dwbG93ZGF0YWV2ZW50czIvcmF3L2FyY2hpdmUvCiAgICAgIGVucmljaGVkOgogICAgICAgIGdvb2Q6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9lbnJpY2hlZC9nb29kLwogICAgICAgIGJhZDogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL2VucmljaGVkL2JhZC8KICAgICAgICBlcnJvcnM6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9lbnJpY2hlZC9lcnJvcnMvCiAgICAgICAgYXJjaGl2ZTogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL2VucmljaGVkL2FyY2hpdmUvCiAgICAgIHNocmVkZGVkOgogICAgICAgIGdvb2Q6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9zaHJlZGRlZC9nb29kLwogICAgICAgIGJhZDogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL3NocmVkZGVkL2JhZC8KICAgICAgICBlcnJvcnM6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9zaHJlZGRlZC9lcnJvcnMvCiAgICAgICAgYXJjaGl2ZTogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL3NocmVkZGVkL2FyY2hpdmUvCiAgZW1yOgogICAgYW1pX3ZlcnNpb246IDUuNS4wCiAgICByZWdpb246IHVzLWVhc3QtMQogICAgam9iZmxvd19yb2xlOiBFTVJfRUMyX0RlZmF1bHRSb2xlCiAgICBzZXJ2aWNlX3JvbGU6IEVNUl9EZWZhdWx0Um9sZQogICAgcGxhY2VtZW50OiB1cy1lYXN0LTFhCiAgICBlYzJfc3VibmV0X2lkOgogICAgZWMyX2tleV9uYW1lOiBTbm93cGxvd2tleXBhaXIKICAgIGJvb3RzdHJhcDogW10KICAgIHNvZnR3YXJlOgogICAgICBoYmFzZToKICAgICAgbGluZ3VhbDoKICAgIGpvYmZsb3c6CiAgICAgIGpvYl9uYW1lOiBTbm93cGxvdyBFVEwKICAgICAgbWFzdGVyX2luc3RhbmNlX3R5cGU6IG0yLjR4bGFyZ2UKICAgICAgY29yZV9pbnN0YW5jZV9jb3VudDogMgogICAgICBjb3JlX2luc3RhbmNlX3R5cGU6IG0yLjR4bGFyZ2UKICAgICAgY29yZV9pbnN0YW5jZV9lYnM6CiAgICAgICAgdm9sdW1lX3NpemU6IDEwMAogICAgICAgIHZvbHVtZV90eXBlOiBncDIKICAgICAgICB2b2x1bWVfaW9wczogNDAwCiAgICAgICAgZWJzX29wdGltaXplZDogZmFsc2UKICAgICAgdGFza19pbnN0YW5jZV9jb3VudDogMAogICAgICB0YXNrX2luc3RhbmNlX3R5cGU6IG0yLjR4bGFyZ2UKICAgICAgdGFza19pbnN0YW5jZV9iaWQ6IDAuMDE1CiAgICBib290c3RyYXBfZmFpbHVyZV90cmllczogMwogICAgY29uZmlndXJhdGlvbjoKICAgICAgeWFybi1zaXRlOgogICAgICAgIHlhcm4ucmVzb3VyY2VtYW5hZ2VyLmFtLm1heC1hdHRlbXB0czogJzEnCiAgICAgIHNwYXJrOgogICAgICAgIG1heGltaXplUmVzb3VyY2VBbGxvY2F0aW9uOiAndHJ1ZScKICAgIGFkZGl0aW9uYWxfaW5mbzoKY29sbGVjdG9yczoKICBmb3JtYXQ6IHRocmlmdAplbnJpY2g6CiAgdmVyc2lvbnM6CiAgICBzcGFya19lbnJpY2g6IDEuOS4wCiAgY29udGludWVfb25fdW5leHBlY3RlZF9lcnJvcjogZmFsc2UKICBvdXRwdXRfY29tcHJlc3Npb246IE5PTkUKc3RvcmFnZToKICB2ZXJzaW9uczoKICAgIHJkYl9sb2FkZXI6IDAuMTIuMAogICAgcmRiX3NocmVkZGVyOiAwLjEyLjAKICAgIGhhZG9vcF9lbGFzdGljc2VhcmNoOiAwLjEuMAptb25pdG9yaW5nOgogIHRhZ3M6IHt9CiAgbG9nZ2luZzoKICAgIGxldmVsOiBERUJVRwo= --resolver ewogICJzY2hlbWEiOiAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3MuaWdsdS9yZXNvbHZlci1jb25maWcvanNvbnNjaGVtYS8xLTAtMSIsCiAgImRhdGEiOiB7CiAgICAiY2FjaGVTaXplIjogNTAwLAogICAgInJlcG9zaXRvcmllcyI6IFsKICAgICAgewogICAgICAgICJuYW1lIjogIklnbHUgQ2VudHJhbCIsCiAgICAgICAgInByaW9yaXR5IjogMCwKICAgICAgICAidmVuZG9yUHJlZml4ZXMiOiBbICJjb20uc25vd3Bsb3dhbmFseXRpY3MiIF0sCiAgICAgICAgImNvbm5lY3Rpb24iOiB7CiAgICAgICAgICAiaHR0cCI6IHsKICAgICAgICAgICAgInVyaSI6ICJodHRwOi8vaWdsdWNlbnRyYWwuY29tIgogICAgICAgICAgfQogICAgICAgIH0KICAgICAgfQogICAgXQogIH0KfQo= --logkey s3://snowplowdataevents2/logs/rdb-loader/2017-10-10-15-07-40/e35454fc-3884-4b44-ad8c-f58ad883cc35 --target eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy5zdG9yYWdlL3JlZHNoaWZ0X2NvbmZpZy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiQVdTIFJlZHNoaWZ0IGVucmljaGVkIGV2ZW50cyBzdG9yYWdlIiwiaG9zdCI6InNub3dwbG93LmN6ZTBmeXVhZ3Y0eC51cy1lYXN0LTEucmVkc2hpZnQuYW1hem9uYXdzLmNvbSIsImRhdGFiYXNlIjoidW5pbG9nIiwicG9ydCI6NTQzOSwic3NsTW9kZSI6IkRJU0FCTEUiLCJ1c2VybmFtZSI6InVuaWxvZyIsInBhc3N3b3JkIjoiVW5pbG9nMTIzIiwic2NoZW1hIjoiYXRvbWljIiwibWF4RXJyb3IiOjEsImNvbXBSb3dzIjoyMDAwMCwicHVycG9zZSI6IkVOUklDSEVEX0VWRU5UUyJ9fQ== --skip analyze'
INFO Environment:
  PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
  LESS_TERMCAP_md=[01;38;5;208m
  LESS_TERMCAP_me=[0m
  HISTCONTROL=ignoredups
  LESS_TERMCAP_mb=[01;31m
  AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
  UPSTART_JOB=rc
  LESS_TERMCAP_se=[0m
  HISTSIZE=1000
  HADOOP_ROOT_LOGGER=INFO,DRFA
  JAVA_HOME=/etc/alternatives/jre
  AWS_DEFAULT_REGION=us-east-1
  AWS_ELB_HOME=/opt/aws/apitools/elb
  LESS_TERMCAP_us=[04;38;5;111m
  EC2_HOME=/opt/aws/apitools/ec2
  TERM=linux
  XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
  runlevel=3
  LANG=en_US.UTF-8
  AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
  MAIL=/var/spool/mail/hadoop
  LESS_TERMCAP_ue=[0m
  LOGNAME=hadoop
  PWD=/
  LANGSH_SOURCED=1
  HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-ATGSV39Q982U/tmp
  _=/etc/alternatives/jre/bin/java
  CONSOLETYPE=serial
  RUNLEVEL=3
  LESSOPEN=||/usr/bin/lesspipe.sh %s
  previous=N
  UPSTART_EVENTS=runlevel
  AWS_PATH=/opt/aws
  USER=hadoop
  UPSTART_INSTANCE=
  PREVLEVEL=N
  HADOOP_LOGFILE=syslog
  PYTHON_INSTALL_LAYOUT=amzn
  HOSTNAME=ip-172-31-7-173
  NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
  HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-ATGSV39Q982U
  EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
  SHLVL=5
  HOME=/home/hadoop
  HADOOP_IDENT_STRING=hadoop
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-ATGSV39Q982U/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-ATGSV39Q982U/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-ATGSV39Q982U
INFO ProcessRunner started child process 12703 :
hadoop   12703  3199  0 15:27 ?        00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /mnt/var/lib/hadoop/steps/s-ATGSV39Q982U/snowplow-rdb-loader-0.12.0.jar --config LS0tCmF3czoKICBhY2Nlc3Nfa2V5X2lkOiAnJwogIHNlY3JldF9hY2Nlc3Nfa2V5OiAnJwogIHJlZ2lvbjogdXMtZWFzdC0xCiAgczM6CiAgICByZWdpb246IHVzLWVhc3QtMQogICAgYnVja2V0czoKICAgICAgYXNzZXRzOiBzMzovL3Nub3dwbG93LWhvc3RlZC1hc3NldHMvCiAgICAgIGpzb25wYXRoX2Fzc2V0czoKICAgICAgbG9nOiBzMzovL3Nub3dwbG93ZGF0YWV2ZW50czIvbG9ncy8KICAgICAgcmF3OgogICAgICAgIGluOgogICAgICAgIC0gczM6Ly9zbm93cGxvd2RhdGFldmVudHMyLwogICAgICAgIHByb2Nlc3Npbmc6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9yYXcvcHJvY2Vzc2luZy8KICAgICAgICBhcmNoaXZlOiBzMzovL3Nub3dwbG93ZGF0YWV2ZW50czIvcmF3L2FyY2hpdmUvCiAgICAgIGVucmljaGVkOgogICAgICAgIGdvb2Q6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9lbnJpY2hlZC9nb29kLwogICAgICAgIGJhZDogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL2VucmljaGVkL2JhZC8KICAgICAgICBlcnJvcnM6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9lbnJpY2hlZC9lcnJvcnMvCiAgICAgICAgYXJjaGl2ZTogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL2VucmljaGVkL2FyY2hpdmUvCiAgICAgIHNocmVkZGVkOgogICAgICAgIGdvb2Q6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9zaHJlZGRlZC9nb29kLwogICAgICAgIGJhZDogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL3NocmVkZGVkL2JhZC8KICAgICAgICBlcnJvcnM6IHMzOi8vc25vd3Bsb3dkYXRhZXZlbnRzMi9zaHJlZGRlZC9lcnJvcnMvCiAgICAgICAgYXJjaGl2ZTogczM6Ly9zbm93cGxvd2RhdGFldmVudHMyL3NocmVkZGVkL2FyY2hpdmUvCiAgZW1yOgogICAgYW1pX3ZlcnNpb246IDUuNS4wCiAgICByZWdpb246IHVzLWVhc3QtMQogICAgam9iZmxvd19yb2xlOiBFTVJfRUMyX0RlZmF1bHRSb2xlCiAgICBzZXJ2aWNlX3JvbGU6IEVNUl9EZWZhdWx0Um9sZQogICAgcGxhY2VtZW50OiB1cy1lYXN0LTFhCiAgICBlYzJfc3VibmV0X2lkOgogICAgZWMyX2tleV9uYW1lOiBTbm93cGxvd2tleXBhaXIKICAgIGJvb3RzdHJhcDogW10KICAgIHNvZnR3YXJlOgogICAgICBoYmFzZToKICAgICAgbGluZ3VhbDoKICAgIGpvYmZsb3c6CiAgICAgIGpvYl9uYW1lOiBTbm93cGxvdyBFVEwKICAgICAgbWFzdGVyX2luc3RhbmNlX3R5cGU6IG0yLjR4bGFyZ2UKICAgICAgY29yZV9pbnN0YW5jZV9jb3VudDogMgogICAgICBjb3JlX2luc3RhbmNlX3R5cGU6IG0yLjR4bGFyZ2UKICAgICAgY29yZV9pbnN0YW5jZV9lYnM6CiAgICAgICAgdm9sdW1lX3NpemU6IDEwMAogICAgICAgIHZvbHVtZV90eXBlOiBncDIKICAgICAgICB2b2x1bWVfaW9wczogNDAwCiAgICAgICAgZWJzX29wdGltaXplZDogZmFsc2UKICAgICAgdGFza19pbnN0YW5jZV9jb3VudDogMAogICAgICB0YXNrX2luc3RhbmNlX3R5cGU6IG0yLjR4bGFyZ2UKICAgICAgdGFza19pbnN0YW5jZV9iaWQ6IDAuMDE1CiAgICBib290c3RyYXBfZmFpbHVyZV90cmllczogMwogICAgY29uZmlndXJhdGlvbjoKICAgICAgeWFybi1zaXRlOgogICAgICAgIHlhcm4ucmVzb3VyY2VtYW5hZ2VyLmFtLm1heC1hdHRlbXB0czogJzEnCiAgICAgIHNwYXJrOgogICAgICAgIG1heGltaXplUmVzb3VyY2VBbGxvY2F0aW9uOiAndHJ1ZScKICAgIGFkZGl0aW9uYWxfaW5mbzoKY29sbGVjdG9yczoKICBmb3JtYXQ6IHRocmlmdAplbnJpY2g6CiAgdmVyc2lvbnM6CiAgICBzcGFya19lbnJpY2g6IDEuOS4wCiAgY29udGludWVfb25fdW5leHBlY3RlZF9lcnJvcjogZmFsc2UKICBvdXRwdXRfY29tcHJlc3Npb246IE5PTkUKc3RvcmFnZToKICB2ZXJzaW9uczoKICAgIHJkYl9sb2FkZXI6IDAuMTIuMAogICAgcmRiX3NocmVkZGVyOiAwLjEyLjAKICAgIGhhZG9vcF9lbGFzdGljc2VhcmNoOiAwLjEuMAptb25pdG9yaW5nOgogIHRhZ3M6IHt9CiAgbG9nZ2luZzoKICAgIGxldmVsOiBERUJVRwo= --resolver ewogICJzY2hlbWEiOiAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3MuaWdsdS9yZXNvbHZlci1jb25maWcvanNvbnNjaGVtYS8xLTAtMSIsCiAgImRhdGEiOiB7CiAgICAiY2FjaGVTaXplIjogNTAwLAogICAgInJlcG9zaXRvcmllcyI6IFsKICAgICAgewogICAgICAgICJuYW1lIjogIklnbHUgQ2VudHJhbCIsCiAgICAgICAgInByaW9yaXR5IjogMCwKICAgICAgICAidmVuZG9yUHJlZml4ZXMiOiBbICJjb20uc25vd3Bsb3dhbmFseXRpY3MiIF0sCiAgICAgICAgImNvbm5lY3Rpb24iOiB7CiAgICAgICAgICAiaHR0cCI6IHsKICAgICAgICAgICAgInVyaSI6ICJodHRwOi8vaWdsdWNlbnRyYWwuY29tIgogICAgICAgICAgfQogICAgICAgIH0KICAgICAgfQogICAgXQogIH0KfQo= --logkey s3://snowplowdataevents2/logs/rdb-loader/2017-10-10-15-07-40/e35454fc-3884-4b44-ad8c-f58ad883cc35 --target eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy5zdG9yYWdlL3JlZHNoaWZ0X2NvbmZpZy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiQVdTIFJlZHNoaWZ0IGVucmljaGVkIGV2ZW50cyBzdG9yYWdlIiwiaG9zdCI6InNub3dwbG93LmN6ZTBmeXVhZ3Y0eC51cy1lYXN0LTEucmVkc2hpZnQuYW1hem9uYXdzLmNvbSIsImRhdGFiYXNlIjoidW5pbG9nIiwicG9ydCI6NTQzOSwic3NsTW9kZSI6IkRJU0FCTEUiLCJ1c2VybmFtZSI6InVuaWxvZyIsInBhc3N3b3JkIjoiVW5pbG9nMTIzIiwic2NoZW1hIjoiYXRvbWljIiwibWF4RXJyb3IiOjEsImNvbXBSb3dzIjoyMDAwMCwicHVycG9zZSI6IkVOUklDSEVEX0VWRU5UUyJ9fQ== --skip analyze
2017-10-10T15:27:54.902Z INFO HadoopJarStepRunner.Runner: startRun() called for s-ATGSV39Q982U Child Pid: 12703
INFO Synchronously wait child process to complete : hadoop jar /mnt/var/lib/hadoop/steps/s-ATGSV39Q...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /mnt/var/lib/hadoop/steps/s-ATGSV39Q...
INFO total process run time: 16 seconds
2017-10-10T15:28:08.984Z INFO Step created jobs: 
2017-10-10T15:28:08.984Z WARN Step failed with exitCode 1 and took 16 seconds

sorry that was misconceived.

anton · October 11, 2017, 9:58am

@sandesh, error in stderr tells us that RDB Loader cannot decode roleArn field from your storage target configuration. It should be a string looking like arn:aws:iam::719197435995:role/RedshiftLoadRole (note full ARN path). Here’s explanation on how to retrieve it.

anton · October 11, 2017, 10:57am

Hey @sandesh, you also have to change schema in your redshift.json to iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/2-0-0. RDB Loader doesn’t have access to your credentials (unlike its predecessor which used 1-0-0 and credentials from config.yml) and therefore requires access though IAM Role.

sandesh · October 11, 2017, 12:26pm

hey @anton thanks for your valuable input, i follow the things what you have mentioned and will let you the status.

sandesh · October 11, 2017, 3:36pm

Hey @anton i re run the process from java-tracker to redshift database.
All the steps completed successfully but in final step i am getting below error.
Note: i have done the changes you mentioned.

orage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze
D, [2017-10-11T15:17:40.265000 #23466] DEBUG -- : Initializing EMR jobflow
D, [2017-10-11T15:17:57.104000 #23466] DEBUG -- : EMR jobflow j-1EG6ESIJ9NXUN started, waiting for jobflow to complete...
I, [2017-10-11T15:29:59.973000 #23466]  INFO -- : No RDB Loader logs
F, [2017-10-11T15:30:00.315000 #23466] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-1EG6ESIJ9NXUN failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-11 15:23:36 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw s3://snowplowdataevents2/ -> Raw Staging S3: COMPLETED ~ 00:01:48 [2017-10-11 15:23:38 +0000 - 2017-10-11 15:25:26 +0000]
 - 2. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:01:40 [2017-10-11 15:25:28 +0000 - 2017-10-11 15:27:08 +0000]
 - 3. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:02 [2017-10-11 15:27:10 +0000 - 2017-10-11 15:28:12 +0000]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-11 15:28:14 +0000 - 2017-10-11 15:28:20 +0000]
 - 5. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 9. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 10. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 11. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 12. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'

And in logs i checked the step failure file below is the stderr.gz file of the particular step:

Exception in thread "main" java.lang.RuntimeException: Error running job
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-1-167.ec2.internal:8020/tmp/cf2dcfd4-57f3-4d56-9afe-f581520a755d/files
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
	... 10 more

And the syslog.gz file is as below:

2017-10-11 15:28:15,018 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Running with args: --src hdfs:///local/snowplow/enriched-events/ --dest s3://snowplowdataevents2/enriched/good/run=2017-10-11-15-17-40/ --srcPattern .*part-.* --s3Endpoint s3.amazonaws.com 
2017-10-11 15:28:15,345 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): S3DistCp args: --src hdfs:///local/snowplow/enriched-events/ --dest s3://snowplowdataevents2/enriched/good/run=2017-10-11-15-17-40/ --srcPattern .*part-.* --s3Endpoint s3.amazonaws.com 
2017-10-11 15:28:15,366 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Using output path 'hdfs:/tmp/cf2dcfd4-57f3-4d56-9afe-f581520a755d/output'
2017-10-11 15:28:16,499 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Created 0 files to copy 0 files 
2017-10-11 15:28:19,334 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Reducer number: 47
2017-10-11 15:28:19,417 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-1-167.ec2.internal/172.31.1.167:8032
2017-10-11 15:28:19,900 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1507735266869_0004
2017-10-11 15:28:19,902 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Try to recursively delete hdfs:/tmp/cf2dcfd4-57f3-4d56-9afe-f581520a755d/tempspace

Below is the controller.gz file:

  2017-10-11T15:28:12.512Z INFO Ensure step 4 jar file /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar
2017-10-11T15:28:12.512Z INFO StepRunner: Created Runner for step 4
INFO startExec 'hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs:///local/snowplow/enriched-events/ --dest s3://snowplowdataevents2/enriched/good/run=2017-10-11-15-17-40/ --srcPattern .*part-.* --s3Endpoint s3.amazonaws.com'
INFO Environment:
  PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
  LESS_TERMCAP_md=[01;38;5;208m
  LESS_TERMCAP_me=[0m
  HISTCONTROL=ignoredups
  LESS_TERMCAP_mb=[01;31m
  AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
  UPSTART_JOB=rc
  LESS_TERMCAP_se=[0m
  HISTSIZE=1000
  HADOOP_ROOT_LOGGER=INFO,DRFA
  JAVA_HOME=/etc/alternatives/jre
  AWS_DEFAULT_REGION=us-east-1
  AWS_ELB_HOME=/opt/aws/apitools/elb
  LESS_TERMCAP_us=[04;38;5;111m
  EC2_HOME=/opt/aws/apitools/ec2
  TERM=linux
  XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
  runlevel=3
  LANG=en_US.UTF-8
  AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
  MAIL=/var/spool/mail/hadoop
  LESS_TERMCAP_ue=[0m
  LOGNAME=hadoop
  PWD=/
  LANGSH_SOURCED=1
  HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-1U0Z3AUWWVL3Z/tmp
  _=/etc/alternatives/jre/bin/java
  CONSOLETYPE=serial
  RUNLEVEL=3
  LESSOPEN=||/usr/bin/lesspipe.sh %s
  previous=N
  UPSTART_EVENTS=runlevel
  AWS_PATH=/opt/aws
  USER=hadoop
  UPSTART_INSTANCE=
  PREVLEVEL=N
  HADOOP_LOGFILE=syslog
  PYTHON_INSTALL_LAYOUT=amzn
  HOSTNAME=ip-172-31-1-167
  NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
  HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-1U0Z3AUWWVL3Z
  EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
  SHLVL=5
  HOME=/home/hadoop
  HADOOP_IDENT_STRING=hadoop
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-1U0Z3AUWWVL3Z/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-1U0Z3AUWWVL3Z/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-1U0Z3AUWWVL3Z
INFO ProcessRunner started child process 9252 :
hadoop    9252  3178  0 15:28 ?        00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs:///local/snowplow/enriched-events/ --dest s3://snowplowdataevents2/enriched/good/run=2017-10-11-15-17-40/ --srcPattern .*part-.* --s3Endpoint s3.amazonaws.com
2017-10-11T15:28:16.520Z INFO HadoopJarStepRunner.Runner: startRun() called for s-1U0Z3AUWWVL3Z Child Pid: 9252
INFO Synchronously wait child process to complete : hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3...
INFO total process run time: 6 seconds
2017-10-11T15:28:20.602Z INFO Step created jobs: 
2017-10-11T15:28:20.602Z WARN Step failed with exitCode 1 and took 6 seconds

Please help me out again stuck in the old issue

anton · October 11, 2017, 3:52pm

@sandesh I’m quite surprised error now happens in steps where job was successful previously. Can you confirm that:

You’ve cleaned your enriched good bucket and have no data from previous loads before running the job
Have raw data to process
Didn’t introduce any more changes in configuration apart from what I advised

Schanithems · October 12, 2017, 2:06pm

Well quite difficult, i have to think a little bit, never had this before! Have you solved yet?
I really love your forum, thank God i found something else where to spend time, especially now that i had health problems and i had to spend lots of time searching for pharmacy reviews

sandesh · October 12, 2017, 2:33pm

Hey @anton i have followed the steps which you told.
1. I have cleared the enriched good bucket data.
2. have raw data to process.
3.I have not introduce any more changes.

I have tried all possibilities, but still i am getting same error in 4th stage while running for storage.
below is the error:

D, [2017-10-12T13:51:25.021000 #31190] DEBUG -- : Initializing EMR jobflow
D, [2017-10-12T13:51:45.766000 #31190] DEBUG -- : EMR jobflow j-3RBH62IKO8UFV started, waiting for jobflow to complete...
I, [2017-10-12T14:03:48.935000 #31190]  INFO -- : No RDB Loader logs
F, [2017-10-12T14:03:49.276000 #31190] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-3RBH62IKO8UFV failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-12 13:57:46 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw s3://snowplowdataevents2/ -> Raw Staging S3: COMPLETED ~ 00:02:16 [2017-10-12 13:57:47 +0000 - 2017-10-12 14:00:04 +0000]
 - 2. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:02:00 [2017-10-12 14:00:06 +0000 - 2017-10-12 14:02:06 +0000]
 - 3. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:02 [2017-10-12 14:02:08 +0000 - 2017-10-12 14:03:10 +0000]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-12 14:03:12 +0000 - 2017-10-12 14:03:18 +0000]
 - 5. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 9. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 10. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 11. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 12. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Please check this and give me the possible solutions.
note: all the other sections are working properly except storage(database- redshift)

sandesh · October 12, 2017, 2:34pm

hey @Schanithems still getting the same error.

sandesh · October 13, 2017, 12:39pm

hey @anton please suggest me… i have stuck from 3 days…

pramod.niralakeri · June 4, 2021, 12:01am

were you able to fix this and found solution? I’m also facing same issue

Topic		Replies	Views
Failing in the 4th step process of storage every time.(Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED) AWS batch pipeline (Legacy)	2	1520	November 9, 2017
EmrEtlRunner::EmrExecutionError while storing the events in redshift database AWS batch pipeline (Legacy)	2	2439	October 16, 2017
EmrExecutionError - Enriched HDFS -> S3: FAILED Enrichment	7	1343	May 3, 2019
EmrEtlRunner::EmrExecutionError in the 3rd stage of the process AWS batch pipeline (Legacy)	4	2298	October 23, 2017
Cluster: Snowplow ETLTerminated with errorsShut down as step failed Duplicate	2	2455	October 10, 2017

Problem at S3 to HDFS S3DistCp step

Related topics