Exception in EMR step when loading data into Redshift

I have the RDB Loader and the other required jars in my S3 bucket, and the EMR configuration is fine, but sometimes a ClassNotFoundException (CNFE) like the one below occurs and the EMR step fails…

I don't know why this happens only sometimes; otherwise the same job works fine. Please help…

Exception in thread "main" java.lang.NoClassDefFoundError: cats/FlatMap
	at com.snowplowanalytics.snowplow.rdbloader.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: cats.FlatMap
	at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Caused by: java.io.IOException: Input/output error
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at sun.misc.Resource.getBytes(Resource.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)

Hi @Dev, what version of RDB Loader are you using? Did you try to compile it yourself?
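
The underlying cause in your trace is the java.io.IOException: Input/output error raised while the classloader is reading class bytes out of the jar, which usually points at a corrupted or only partially readable jar copy on the node rather than a packaging problem. If you can SSH to the master node, a minimal check like the sketch below (Python, with a placeholder path for wherever the loader jar ends up on the node) would tell you whether every entry in the jar is actually readable:

import zipfile

# Placeholder path -- adjust to wherever the loader jar is staged on the node.
JAR_PATH = "/tmp/snowplow-rdb-loader-0.14.0.jar"

# A jar is just a zip archive; testzip() reads every entry and returns the
# name of the first corrupt one, or None if everything is readable.
with zipfile.ZipFile(JAR_PATH) as jar:
    bad_entry = jar.testzip()
    if bad_entry is None:
        print("all entries readable")
    else:
        print("corrupt entry:", bad_entry)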

There are many jars kept in the assets bucket; I'm not sure which one is used by EMR.

It should be in your config.yml. Could you paste it here? (with credentials removed)

Hey, it's rdb_loader: 0.14.0 and this jar is present in the S3 bucket. Every day I get a new CNFE for a different class, and when I restart the job it runs successfully. I can't figure out the actual problem.
Today I got the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/yaml/snakeyaml/representer/Representer
	at org.yaml.snakeyaml.Yaml.<init>(Yaml.java:64)
	at io.circe.yaml.parser.package$$anonfun$parseSingle$1.apply(package.scala:29)
	at io.circe.yaml.parser.package$$anonfun$parseSingle$1.apply(package.scala:29)
	at cats.syntax.EitherObjectOps$.catchNonFatal$extension(either.scala:267)
	at io.circe.yaml.parser.package$.parseSingle(package.scala:29)
	at io.circe.yaml.parser.package$.parse(package.scala:19)
	at io.circe.yaml.parser.package$.parse(package.scala:23)
	at com.snowplowanalytics.snowplow.rdbloader.config.SnowplowConfig$.parse(SnowplowConfig.scala:49)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$9.apply(CliConfig.scala:137)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$9.apply(CliConfig.scala:137)
	at cats.syntax.EitherOps$.flatMap$extension(either.scala:129)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$.transform(CliConfig.scala:137)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$parse$1.apply(CliConfig.scala:99)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$parse$1.apply(CliConfig.scala:99)
	at scala.Option.map(Option.scala:146)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$.parse(CliConfig.scala:99)
	at com.snowplowanalytics.snowplow.rdbloader.Main$.main(Main.scala:33)
	at com.snowplowanalytics.snowplow.rdbloader.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.representer.Representer
	at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 24 more
Caused by: java.io.IOException: Input/output error
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at sun.misc.Resource.getBytes(Resource.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)

@Dev, could you please paste your config.yml? It could help us identify the problem.
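
In the meantime, it might also be worth confirming that the jar the step reads matches the object in S3. A rough sketch (the bucket, key and local path below are placeholders, and an S3 ETag is only a plain MD5 for single-part uploads):

import hashlib

import boto3

# Placeholders -- substitute your assets bucket/key and the jar path on the node.
BUCKET = "my-assets-bucket"
KEY = "4-storage/rdb-loader/snowplow-rdb-loader-0.14.0.jar"
LOCAL_JAR = "/tmp/snowplow-rdb-loader-0.14.0.jar"

s3 = boto3.client("s3")
head = s3.head_object(Bucket=BUCKET, Key=KEY)

# MD5 of the local copy, read in 1 MiB chunks.
md5 = hashlib.md5()
with open(LOCAL_JAR, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)

print("s3 etag   :", head["ETag"].strip('"'))
print("local md5 :", md5.hexdigest())
print("s3 size   :", head["ContentLength"])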

aws:
  access_key_id: 
  secret_access_key: 
  s3:
    region: 
    buckets:
      assets:  # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: 
      raw:
        in:                  # Multiple in buckets are permitted
        processing: 
        archive:     # e.g. s3://my-archive-bucket/in
      enriched:
        good:         # e.g. s3://my-out-bucket/enriched/good
        bad:             # e.g. s3://my-out-bucket/enriched/bad
        errors:       # Leave blank unless :continue_on_unexpected_error: set to true below
        archive:     # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good:          # e.g. s3://my-out-bucket/shredded/good
        bad:             # e.g. s3://my-out-bucket/shredded/bad
        errors:    # Leave blank unless :continue_on_unexpected_error: set to true below
        archive:   # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.9.0      # Don't change this
    region:          # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement:     # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: 
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:                # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Spark cluster below
    jobflow:
      job_name: Snowplow ETL PROD
      master_instance_type: m1.medium
      core_instance_count: 1
      core_instance_type:  m4.xlarge
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m4.xlarge
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: "thrift" # Or 'clj-tomcat' for the Clojure Collector, or 'thrift' for Thrift records, or 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs
enrich:
  versions:
    spark_enrich: 1.13.0 # Version of the Hadoop Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: GZIP # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_shredder: 0.13.0        # Version of the Relational Database Shredding process
    rdb_loader: 0.14.0          # Version of the Relational Database Loader app
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
  #download:
    #folder: 
monitoring:
  tags:  # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: snowplow # e.g. snowplow
    collector:

@chetan, please check the maximum resource limits for your AWS account. You might be exceeding a limit (e.g. multiple jobs using resources at the same time).
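
A rough way to read the classic per-region EC2 instance limit (the region below is a placeholder; newer accounts expose these limits through Service Quotas instead) is something like:

import boto3

# Placeholder region -- use the region your EMR cluster runs in.
ec2 = boto3.client("ec2", region_name="us-east-1")

# "max-instances" is the legacy per-region On-Demand instance limit attribute.
attrs = ec2.describe_account_attributes(AttributeNames=["max-instances"])
for attr in attrs["AccountAttributes"]:
    values = [v["AttributeValue"] for v in attr["AttributeValues"]]
    print(attr["AttributeName"], values)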

I have checked the resource limits; there were enough resources available at that time, and no other EMR job was running.