Trying to use StorageLoader with Stream Enrich without AWS

sandesh · July 27, 2017, 3:11pm

Hi All,

I have configured the following pipeline: JavaScript tracker → Scala Stream Collector → Stream Enrich → PostgreSQL.
But when I run the StorageLoader with the command below:

./snowplow-storage-loader --config config/config.yml --resolver config/resolver.json --targets config/targets/ --skip analyze

I get the following error:

> Unexpected error: undefined method `[]=' for nil:NilClass
> /home/hadoop/snowplow/4-storage/storage-loader/lib/snowplow-storage-loader/config.rb:56:in `get_config'
> storage-loader/bin/snowplow-storage-loader:31:in `<main>'
> org/jruby/RubyKernel.java:977:in `load'
> uri:classloader:/META-INF/main.rb:1:in `<main>'
> org/jruby/RubyKernel.java:959:in `require'
> uri:classloader:/META-INF/main.rb:1:in `(root)'
> uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Below is my configuration file (config.yml) for the StorageLoader (PostgreSQL):

s3:
  region: eu-west-1 # S3 bucket region
  buckets:
    in: ADD HERE
    archive: ADD HERE
download:
  folder: /home/hadoop/snowplow/4-storage/ # Postgres-only config option. Where to store the downloaded files
targets:
   - :name: "PostgreSQL enriched events storage"
     :type: postgres
     :host: localhost # Hostname of database server
     :database: snowplow # Name of database
     :port: 5432 # Default Postgres port
     :table: atomic.events
     :username: power_user
     :password: hadoop
     :maxerror: # Not required for Postgres

Below is my resolver.json file:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}

Please help me get a config.yml that works with the StorageLoader.
If anyone has already done this, please share it with me. I have been struggling with this for the past 3 days.

Thanks and Regards
Sandesh P

ihor · July 27, 2017, 7:51pm

@sandesh,

What version of the StorageLoader are you running?

Also, how did you implement this link Stream enrich --> PostgreSQL? Are you using Kinesis S3 to sink the events to S3? Your configuration doesn’t reflect where the enriched files will be taken from to load to Postgres.
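For context, `undefined method '[]=' for nil:NilClass` is the error Ruby raises when code assigns into a nested key whose parent is missing, so a config file whose layout does not match what the loader expects can fail this way. Below is a minimal Python sketch of a pre-flight check that reports missing sections by name. The key paths in `REQUIRED` are assumptions based on my recollection of the R88 sample layout, not something taken from the StorageLoader source; replace them with the paths from the sample config for your release. The `posted` dictionary mirrors the config in the question as I read its nesting.

```python
def missing_paths(config, required_paths):
    """Return the dotted key paths that are absent (or None) in a nested dict."""
    missing = []
    for path in required_paths:
        node = config
        for key in path.split("."):
            # Stop at the first level that is not a dict or lacks the key.
            if not isinstance(node, dict) or node.get(key) is None:
                missing.append(path)
                break
            node = node[key]
    return missing


# ASSUMPTION: illustrative key paths only; take the real ones from the
# sample config.yml that ships with your StorageLoader release.
REQUIRED = ["aws.s3.region", "aws.s3.buckets", "storage.download.folder"]

# Shape of the config posted in the question (top-level s3 / download keys).
posted = {
    "s3": {
        "region": "eu-west-1",
        "buckets": {"in": "ADD HERE", "archive": "ADD HERE"},
    },
    "download": {"folder": "/home/hadoop/snowplow/4-storage/"},
}

problems = missing_paths(posted, REQUIRED)
for path in problems:
    print(f"config.yml is missing section: {path}")
```

With the assumed paths, every entry is reported as missing, because the posted file keeps `s3` and `download` at the top level. A check like this only shows a layout mismatch; it does not make this topology supported.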

sandesh · July 28, 2017, 6:06am

Hey @ihor thanks for the response…

We are using version R88 of the StorageLoader.
For Stream Enrich we are using the following configuration:
source = "stdin"
sink = "stdouterr"
We are investigating how to implement Stream Enrich --> PostgreSQL so that the events are added to the PostgreSQL database.
We could not find any example of a config.yml for PostgreSQL, so please help me load the Stream Enrich data into PostgreSQL.
Please suggest a config.yml file that we can use to run the StorageLoader.

alex · July 28, 2017, 9:10pm

Hi @sandesh - this isn’t a supported topology for Snowplow currently.

There is no way of wiring Stream Enrich up to Postgres on-premise without using AWS currently. You are missing a whole component in the middle - Spark Shred, and this currently only runs on EMR.

This may change in the future (particularly in Snowplow Mini), but it’s not something we can help with at this time.

ihor · July 28, 2017, 9:21pm

@sandesh,

Here’s the “architecture” you should be using (either of the two should do):

Enrichment done in Kinesis Enrich:

... -> Stream enrich -> Kinesis S3 -> S3 -> EmrEtlRunner (shredding) -> PostgreSQL

Enrichment done in EMR:

... -> Stream raw -> Kinesis S3 -> S3 -> EmrEtlRunner (enrich + shredding) -> PostgreSQL

A sample of the config.yml for R88 is here. The database target JSON configuration file is here.

Note that the “targets” section was removed from the YAML configuration in R88 and replaced with a JSON configuration file.
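To illustrate that move from YAML to JSON, here is a rough Python sketch that maps the old-style target entry from the question onto a self-describing JSON target. The schema URI and the `sslMode`, `schema` and `purpose` fields are assumptions recalled from the published PostgreSQL target schema and are not confirmed anywhere in this thread, so check them against the sample target file linked above before relying on the output.

```python
import json

# Old-style target entry, copied from the YAML in the question.
old_target = {
    "name": "PostgreSQL enriched events storage",
    "type": "postgres",
    "host": "localhost",
    "database": "snowplow",
    "port": 5432,
    "table": "atomic.events",
    "username": "power_user",
    "password": "hadoop",
}

# ASSUMPTION: schema URI recalled from memory; verify against the sample file.
ASSUMED_SCHEMA_URI = (
    "iglu:com.snowplowanalytics.snowplow.storage/"
    "postgresql_config/jsonschema/1-0-0"
)


def to_json_target(target):
    """Map an old YAML target entry onto the assumed JSON target layout."""
    if target.get("type") != "postgres":
        raise ValueError("this sketch only handles postgres targets")
    # "atomic.events" -> database schema "atomic"
    db_schema = target["table"].split(".", 1)[0]
    return {
        "schema": ASSUMED_SCHEMA_URI,
        "data": {
            "name": target["name"],
            "host": target["host"],
            "database": target["database"],
            "port": target["port"],
            "sslMode": "DISABLE",          # assumed field, no old equivalent
            "schema": db_schema,
            "username": target["username"],
            "password": target["password"],
            "purpose": "ENRICHED_EVENTS",  # assumed field
        },
    }


json_target = to_json_target(old_target)
print(json.dumps(json_target, indent=2))
```

The printed JSON could be saved as a file inside the directory passed to `--targets` (config/targets/ in the command from the question). The file name is up to you, for example a hypothetical postgres.json.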

You can refer to Lambda architecture to clarify this setup.
