Step 4: Setting up alternative data stores » Setting up PostgreSQL.
-
On my localhost I have successfully installed PostgreSQL, created and configured the “atomic”.“events” table, and configured resolver.json.
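To confirm the database side is fine, I checked the connection and the table with a small script along these lines (just a rough sketch of what I ran, assuming psycopg2 is installed separately; the credentials match the targets entry in my config.yml below):

import psycopg2  # assumption: psycopg2 installed locally, not part of Snowplow itself

# Same connection details as the PostgreSQL target in config.yml below
conn = psycopg2.connect(host="localhost", port=5432, dbname="snowplow",
                        user="power_user", password="xxxx")
cur = conn.cursor()
# Verify that the atomic.events table exists
cur.execute("""
    SELECT count(*) FROM information_schema.tables
    WHERE table_schema = 'atomic' AND table_name = 'events'
""")
print(cur.fetchone())  # should print (1,) if atomic.events exists, and it does for me
cur.close()
conn.close()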
-
I unzipped the two files, i.e. snowplow-storage-loader and snowplow-emr-etl-runner, from snowplow_emr_r88_angkor_wat.zip and tried to run the loader from the \4-storage\storage-loader\deploy directory with config.yml configured, even though I do not have an Amazon AWS account.
When I try to run the command below:
./snowplow-storage-loader --config config/config.yml --resolver config/resolver.json --targets config/targets/ --skip analyze
I am getting the following error:
Unexpected error: (<unknown>): expected <block end>, but found BlockMappingStart while parsing a block mapping at line 70 column 4
org/jruby/ext/psych/PsychParser.java:219:in `parse'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/psych.rb:376:in `parse_stream'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/psych.rb:324:in `parse'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/psych.rb:251:in `load'
/home/hadoop/snowplow/4-storage/storage-loader/lib/snowplow-storage-loader/config.rb:52:in `get_config'
storage-loader/bin/snowplow-storage-loader:31:in `<main>'
org/jruby/RubyKernel.java:977:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:959:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
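Since the error points at line 70, column 4 of config.yml, I also tried to narrow it down outside the storage loader with a quick YAML check (only a sketch, assuming PyYAML is available; it is not part of the Snowplow tooling):

import yaml  # assumption: PyYAML installed, used only to locate the parse error

with open("config/config.yml") as f:
    try:
        yaml.safe_load(f)
        print("config.yml parses cleanly")
    except yaml.YAMLError as exc:
        # A MarkedYAMLError carries the same line/column that the loader reports
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            print("YAML error at line %d, column %d" % (mark.line + 1, mark.column + 1))
        print(exc)

This should show the same line and column that the storage loader complains about, so I assume the problem is somewhere in my config.yml rather than in the loader itself.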
-
How should I configure the config.yml file for my local PostgreSQL installation? Below are my config.yml details:
aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: xxxx #<%= ENV['AWS_SNOWPLOW_ACCESS_KEY'] %>
  secret_access_key: yyy #<%= ENV['AWS_SNOWPLOW_SECRET_KEY'] %>
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: ABCXYZ.s3-website-ap-southeast-1.amazonaws.com/logs #s3://emretlrunner/log
      raw:
        in: # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
          - s3://kinesis-sink # e.g. s3://my-old-collector-bucket
          - s3://elasticbeanstalk-us-east-1-XXX/resources/environments/logs/publish/e-XXX
          #- s3:s3-ap-southeast-1.amazonaws.com/elasticbeanstalk-ap-southeast-1-123/resources/environments/logs/publish/e-abc12345y6
          #- ADD HERE # e.g. s3://my-new-collector-bucket
        processing: s3://my-snowplow-bucket/processing
        archive: s3://my-snowplow-bucket/archive #"s3://ABCXYZ.s3-website-ap-southeast-1.amazonaws.com/archive" # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://my-snowplow-bucket/out # e.g. s3://my-out-bucket/enriched/good
        bad: s3://my-snowplow-bucket/bad # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://my-snowplow-bucket/errors # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://my-snowplow-bucket/storage-archive # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://my-snowplow-bucket/out # e.g. s3://my-out-bucket/shredded/good
        bad: s3://my-snowplow-bucket/bad # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://my-snowplow-bucket/errors # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: ADD HERE # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.5.0
    region: us-east-1 # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole # Created using $ aws emr create-default-roles
    placement: us-east-1b # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id: subnet-07924c62 # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: clerp #emr-keypair
    bootstrap: [] # Set this to specify custom boostrap actions. Leave empty otherwise
    software:
      hbase: 0.92.0 # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual: '1.1' # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      job_name: Snowplow ETL # Give your job a name
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      core_instance_ebs: # Optional. Attach an EBS volume to each core instance.
        volume_size: 100 # Gigabytes
        volume_type: "gp2"
        volume_iops: 400 # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    additional_info: # Optional JSON string for selecting additional features
collectors:
  format: clj-tomcat #cloudfront # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
    spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_shredder: 0.12.0 # Version of the Spark Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
  download:
    folder: s3://ABCXYZ.s3-website-ap-southeast-1.amazonaws.com/output #/var/storageloader # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
  targets:
    - name: My PostgreSQL database
      type: postgres
      host: localhost
      database: snowplow
      port: 5432
      table: atomic.events
      username: power_user
      password: xxxx
      maxerror:
      comprows:
      ssl_mode: disable
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: WebSite # e.g. snowplow
    collector: snowp-env.ap-southeast-1.elasticbeanstalk.com # e.g. d3rkrsqld9gmqf.cloudfront.net
-
Here is my resolver.json file:
{ "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Snowplow PostgreSQL storage configuration", "self": { "vendor": "com.snowplowanalytics.snowplow.storage", "name": "postgresql_config", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "name": { "type": "PostgreSQL enriched events storage" }, "host": { "type": "localhost" }, "database": { "type": "snowplow" }, "port": { "type": "integer", "minimum": 1, "maximum": 65535 }, "sslMode": { "type": "DISABLE", "enum": ["DISABLE", "REQUIRE", "VERIFY_CA", "VERIFY_FULL"] }, "schema": { "type": "atomic" }, "username": { "type": "power_user" }, "password": { "type": "hadoop" }, "purpose": { "type": "ENRICHED_EVENTS", "enum": ["ENRICHED_EVENTS"] } }, "additionalProperties": false, "required": ["name", "host", "database", "port", "sslMode", "schema", "username","password", "purpose"] }
Kindly help me to resolve this issue.
Thank you,
Nishanth