hi there!
I just completed a basic build of snowplow for testing and got it to run finally!
However a few days later I see this error on running the storage loader:
Unexpected error: Java::OrgPostgresqlUtil::PSQLException error executing COPY statements: BEGIN;
COPY atomic.events FROM 's3://anlytics-data/shredded/good/run=2017-01-30-16-26-15/atomic-events' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' REGION AS 'us-east-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;
COPY atomic.com_snowplowanalytics_snowplow_duplicate_1 FROM 's3://anlytics-data/shredded/good/run=2017-01-30-16-26-15/com.snowplowanalytics.snowplow/duplicate/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;
COMMIT;: ERROR: Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_duplicate_1
uri:classloader:/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:89:in `load_events_and_shredded_types'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/storage-loader/bin/snowplow-storage-loader:54:in `block in (root)'
uri:classloader:/storage-loader/bin/snowplow-storage-loader:51:in `<main>'
org/jruby/RubyKernel.java:973:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:955:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
I saw a few posts similar to this error and realize that this is due to missing tables based on the iglu definitions needed. However, its not clear which sql is needed to be added into the redshift db? I have not modified or added any custom or self describing schemas as yet - just using standard iglu schemas as per the resolver.conf file:
{
"schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
"data": {
"cacheSize": 500,
"repositories": [
{
"name": "Iglu Central",
"priority": 0,
"vendorPrefixes": [ "com.snowplowanalytics" ],
"connection": {
"http": {
"uri": "http://iglucentral.com"
}
}
}
]
}
}
Is there a guide to which sql files are needed for the standard setup? And how to relate the various sqls with custom schemas? By finding the duplicate_1 sql in the iglu repo, I was able to make my StorageLoader work, but its possible that it could break at a later point when searching for another table not added into redshift from the iglu repo.
Thanks very much!