StorageLoader isn't working

mhub · July 17, 2017, 3:22pm

Can someone help me to move our data in postgres database?

We got the following error message:
EmrEtlRunner returned 4, directory is not empty. StorageLoader not run

We tried to solve it with --skip commands, we have deleted all testing data and we created new buckets, directories but we have still the same issue. We can not move our data to postgres.

Snowplow ETL process should be fine:
Elasticity S3DistCp Step: Raw S3 Staging -> S3 Archive Completed 2017-07-17 17:04 (UTC+2) 1 minute
Elasticity S3DistCp Step: Shredded HDFS -> S3 Completed 2017-07-17 17:02 (UTC+2) 1 minute
Elasticity Spark Step: Shred Enriched Events Completed 2017-07-17 17:00 (UTC+2) 2 minutes
Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3 Completed 2017-07-17 16:59 (UTC+2) 1 minute
Elasticity S3DistCp Step: Enriched HDFS -> S3 Completed 2017-07-17 16:57 (UTC+2) 1 minute
Elasticity Spark Step: Enrich Raw Events Completed 2017-07-17 16:54 (UTC+2) 2 minutes

our config:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: ****
  secret_access_key: ****
  s3:
    region: eu-west-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets:  # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3n://mhublogs
      raw:
        in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
          - "s3n://elasticbeanstalk-eu-west-1-896554815027/resources/environments/logs/publish/e-****"         # e.g. s3://my-old-collector-bucket
        processing: s3n://mhublog-processing/processing
        archive: s3n://mhubarchive/raw    # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://mhubenriched/good       # e.g. s3://my-out-bucket/enriched/good
        bad: s3://mhubenriched/bad        # e.g. s3://my-out-bucket/enriched/bad
        errors:     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://mhubarchive/enriched    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://mhubshredded/good       # e.g. s3://my-out-bucket/shredded/good
        bad: s3://mhubshredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors:     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://mhubarchive/shredded    # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.5.0
    region: eu-west-1        # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement: eu-west-1b     # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: rabbit
    bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
    software:
      hbase:                # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      job_name: Snowplow ETL # Give your job a name
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: clj-tomcat # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Conne$
enrich:
  versions:
    spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  download:
    folder: # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: ADD HERE # e.g. snowplow
    collector: ****.eu-west-1.elasticbeanstalk.com #d3cxsus9b0a0pj.cloudfront.net # e.g. d3rkrsqld9gmqf.cloudfront.net

postgres.json:

{
    "schema": "iglu:com.snowplowanalytics.snowplow.storage/postgresql_config/jsonschema/1-0-0",
    "data": {
        "name": "PostgreSQL enriched events storage",
        "host": "localhost",
        "database": "snowplow",
        "port": 5432,
        "sslMode": "DISABLE",
        "username": "power_user",
        "password": "****",
        "schema": "atomic",
        "purpose": "ENRICHED_EVENTS"
    }
}

tclass · July 17, 2017, 4:47pm

you might have some data in one of these buckets, please check. If you want you can run the storageloader to import the data.

enriched:
good: s3://mhubenriched/good
bad: s3://mhubenriched/bad

shredded:
good: s3://mhubshredded/good
bad: s3://mhubshredded/bad

mhub · July 17, 2017, 5:18pm

I am using the same config.yml, you can see above. It seems it is wrong, but I can not see any problem.
If I am running storage loader I got the following error message:

./bin/snowplow-storage-loader --config config.yml --resolver /var/www/snowplow/3-enrich/config/iglu_resolver.json --targets /var/www/snowplow/4-storage/config/targets/

Unexpected error: no implicit conversion of nil into String
org/jruby/RubyFileTest.java:95:in directory?' org/jruby/RubyFileTest.java:87:indirectory?’
/var/www/snowplow/4-storage/storage-loader/lib/snowplow-storage-loader/config.rb:83:in get_config' ./bin/snowplow-storage-loader:31:in’

mhub · July 17, 2017, 9:12pm

I’ve changed the config file, now it is working. Thank you!

ganesh · March 27, 2018, 11:46am

What did you change in config file?

Topic		Replies	Views
No data loaded in postgres, no errors either Storage targets	3	2338	April 12, 2017
Storage Loader successful but not loading Redshift or Postgres DB Storage targets	4	2031	March 28, 2017
Failing in the 4th step process of storage every time.(Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED) AWS batch pipeline (Legacy)	2	1520	November 9, 2017
Storage Loader Failure table Storage targets	2	1582	November 24, 2016
Steps Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3	13	1238	January 17, 2020

StorageLoader isn't working

./bin/snowplow-storage-loader --config config.yml --resolver /var/www/snowplow/3-enrich/config/iglu_resolver.json --targets /var/www/snowplow/4-storage/config/targets/

Related topics