Hi,
We are trying to capture mailer activities like mail sent/open/click. For mail open we are using pixel tracker and for mail clicking we are trying to use uri_redirect method.
We are tracking these events using our own schema so they will get tracked as unstruct events. We set up JSON schema, jsonpath file and also created the redshift table using schema-guru.
After firing few events when we ran EmrEtlRunner, we are getting error from storage loader. Below is the error :
--------------------------------------------------------------------------------------------------------------------
**Loading Snowplow events and shredded types into Shaadi Redshift database (Redshift cluster)...**
**Unexpected error: Java::Default::PSQLException error executing COPY statements: BEGIN;**
**COPY atomic.events FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/atomic-events' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' REGION AS 'us-east-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_communication_tracking_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/communication_tracking/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/communication_tracking_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_help_me_write_this_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/help_me_write_this/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/help_me_write_this_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_client_session_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/client_session/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/client_session_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_geolocation_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/geolocation_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_mobile_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/mobile_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_screen_view_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/screen_view/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/screen_view_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_ua_parser_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/ua_parser_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_uri_redirect_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/uri_redirect/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/uri_redirect_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_web_page_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/web_page/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/web_page_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.org_w3_performance_timing_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/org.w3/PerformanceTiming/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/org.w3/performance_timing_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COMMIT;: ERROR: Manifest file is not in correct json format**
** Detail: **
** -----------------------------------------------**
** error: Manifest file is not in correct json format**
** code: 8001**
** context: Manifest file location = s3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/communication_tracking_1.json**
** query: 8086302**
** location: s3_utility.cpp:293**
** process: padbmaster [pid=20258]**
** -----------------------------------------------**
**/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:88:in `load_events_and_shredded_types'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:54:in `(root)'**
**org/jruby/RubyArray.java:1613:in `each'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:51:in `(root)'**
**org/jruby/RubyKernel.java:1091:in `load'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**org/jruby/RubyKernel.java:1072:in `require'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**/tmp/jruby8883550174880908063extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'**
**Loading Snowplow events and shredded types into Shaadi Redshift database (Redshift cluster)...**
**Unexpected error: Java::Default::PSQLException error executing COPY statements: BEGIN;**
**COPY atomic.events FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/atomic-events' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' REGION AS 'us-east-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_communication_tracking_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/communication_tracking/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/communication_tracking_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_help_me_write_this_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/help_me_write_this/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/help_me_write_this_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_client_session_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/client_session/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/client_session_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_geolocation_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/geolocation_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_mobile_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/mobile_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_screen_view_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/screen_view/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/screen_view_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_ua_parser_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/ua_parser_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_uri_redirect_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/uri_redirect/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/uri_redirect_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_web_page_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/web_page/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/web_page_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.org_w3_performance_timing_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/org.w3/PerformanceTiming/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/org.w3/performance_timing_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COMMIT;: ERROR: Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_uri_redirect_1**
**/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:88:in `load_events_and_shredded_types'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:54:in `(root)'**
**org/jruby/RubyArray.java:1613:in `each'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:51:in `(root)'**
**org/jruby/RubyKernel.java:1091:in `load'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**org/jruby/RubyKernel.java:1072:in `require'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**/tmp/jruby6503417955291522470extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'**
**Loading Snowplow events and shredded types into Shaadi Redshift database (Redshift cluster)...**
**Unexpected error: Java::Default::PSQLException error executing ANALYZE statements: BEGIN;**
**ANALYZE atomic.events;**
**ANALYZE atomic.com_shaadi_communication_tracking_1;**
**ANALYZE atomic.com_shaadi_help_me_write_this_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_client_session_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_geolocation_context_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_mobile_context_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_screen_view_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_ua_parser_context_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_uri_redirect_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_web_page_1;**
**ANALYZE atomic.org_w3_performance_timing_1;**
**COMMIT;: ERROR: skipping "com_shaadi_communication_tracking_1" --- only table or database owner can analyze it**
**/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:111:in `load_events_and_shredded_types'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:54:in `(root)'**
**org/jruby/RubyArray.java:1613:in `each'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:51:in `(root)'**
**org/jruby/RubyKernel.java:1091:in `load'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**org/jruby/RubyKernel.java:1072:in `require'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**/tmp/jruby1139177695818426949extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'**
----------------------------------------------------------------------------------------------------------------------
Please help us with this.