$ ./sql-runner -playbook XXX-snowplow-sql/playbook/web-model.yml.tmpl
2017/01/23 10:10:09 EXECUTING 00-web-page-context (in step 00-web-page-context @ XXX-snowplow): /home/ubuntu/XXX-snowplow-sql/playbook/web-model/01-page-views/00-web-page-context.sql
2017/01/23 10:10:09 FAILURE: 00-web-page-context (step 00-web-page-context @ target XXX-snowplow), ERROR: ERROR #42P01 relation "atomic.com_snowplowanalytics_snowplow_web_page_1" does not exist (addr="XXX:5439")
2017/01/23 10:10:09
TARGET INITIALIZATION FAILURES:
QUERY FAILURES:
* Query 00-web-page-context /home/ubuntu/XXX-snowplow-sql/playbook/web-model/01-page-views/00-web-page-context.sql (in step 00-web-page-context @ target XXX-snowplow), ERROR:
- ERROR #42P01 relation "atomic.com_snowplowanalytics_snowplow_web_page_1" does not exist (addr="XXX:5439")
Why doesn’t this table exist? Is there a pre-setup step that is not described in the Wiki?
The only reference I found to this table is in snowplow/iglu-central. What are those SQL files for? I can’t find any documentation on their intended usage.
Okay, so I ended up running all of the .sql files from the iglu-central repository that the web-model depends on. The documentation doesn’t say explicitly that the web-model needs some extra tables (or at least I didn’t find it).
For events that use self-describing schemas (as in the case of web_page_context) you’ll need to create the tables in Redshift using the DDLs provided in that iglu-central repository. Each self-describing schema consists of three main components:
JSON schema - for defining the event structure and aiding in the creation of the DDL
JSON Paths file - for mapping fields in the JSON to table columns when loading data into Redshift
Redshift DDL - defining the table structure
There’s some more information describing the processes involved in working with self-describing schemas in the example repository here.
Many of the web models rely on these tables being present, so you’ll need to make sure they’ve been created, and that they contain some data, to get useful results.
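To see which table a given self-describing schema needs, it can help to derive the table name from the schema URI. Below is a minimal sketch in Python, assuming the usual naming convention (vendor and name converted to snake_case, joined with underscores, followed by the schema's model version). The function name `redshift_table_for` and the default `atomic` database schema are illustrative choices for this example and not part of any Snowplow tool.

```python
import re


def redshift_table_for(schema_uri, db_schema="atomic"):
    """Derive the expected Redshift table name for an Iglu schema URI.

    Hypothetical helper: it mirrors the naming convention visible in the
    error above and is not an official Snowplow API.
    """
    match = re.fullmatch(
        r"iglu:([^/]+)/([^/]+)/jsonschema/(\d+)-\d+-\d+", schema_uri
    )
    if match is None:
        raise ValueError(f"not an Iglu schema URI: {schema_uri}")
    vendor, name, model = match.groups()

    def snake(text):
        # Split camelCase words, then turn dots, hyphens and other
        # separators into underscores and lowercase the result.
        text = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)
        return re.sub(r"[^A-Za-z0-9]+", "_", text).lower()

    # Only the model (first) version number is part of the table name.
    return f"{db_schema}.{snake(vendor)}_{snake(name)}_{model}"


print(redshift_table_for(
    "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0"
))
```

For the web page context this gives the same relation name that sql-runner reports as missing, so the DDL to look for in iglu-central is the one that creates that table.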
I had no problems whatsoever with the whole setup until I reached the data modeling, which was unclear at first. I was banging my head against it two days ago, but eventually I got to grips with the three components you’re talking about and understood what they do. Also, igluctl and Schema Guru are super handy, but unfortunately a bit hidden in the documentation.