I set up a Snowplow pipeline using the Terraform scripts and the tutorial in the documentation from here.
The entire setup was a breeze and very well documented (thank you!). The issue I ran into is that the atomic events table was not automatically created in Postgres.
- I can see the test events being delivered to my S3 bucket, but they are encoded, so I can’t fully verify them.
- I see the Postgres instance has a database created but no tables in it. How do I fix this?
- What encoding are the events in? If I wanted to create an Athena table, how would I go about it?
The Postgres Loader should automatically create the atomic events table. Assuming it has initialised as expected, the atomic.events table should be available in the RDS Postgres instance that was spun up as part of following the Quick Start guide. If you’re not seeing the table in Postgres, it might be worth looking for any logs the application has generated.
If you’re looking in the /enriched bucket, then these events are tab-separated values (TSV). This blog post might help you with Athena and Snowplow S3 files, and if you want to inspect /bad, there are some table definitions for those here.
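Since the enriched format is tab-separated, one quick way to eyeball a line pulled down from S3 is to fan the fields out one per line. A minimal sketch; the sample line below is hypothetical and heavily truncated, as real enriched events carry many more fields:

```shell
# A hypothetical, truncated enriched-event line; real lines contain
# many more tab-separated fields (app_id, platform, timestamps, ...).
printf 'my-site\tweb\t2021-01-01 00:00:00\tpage_view\n' |
  tr '\t' '\n' |   # one field per line for easy inspection
  nl               # number the fields
```

The same `tr '\t' '\n' | nl` trick works on a real line piped from `aws s3 cp ... -` after decompression.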
Hi Paul. Thanks for your reply. I only see raw events in S3; neither enriched nor bad events show up.
The only logs I saw were generated by the Terraform script. Where can I check the logs to see where it failed?
Interesting, that sounds like the postgres-loader is not running (and perhaps the s3-loader for enriched and bad rows too). Did you set up and configure the Iglu Server first?
It might be worth looking at what your terraform plan states too, to check that everything you’d expect has been spun up. Or perhaps log in to your AWS account and see how many EC2 instances are running; I think you should have 7 with the default deployment: Iglu Server, collector, enrich, S3 loader raw, S3 loader enriched, S3 loader bad, postgres loader.
I see 8 of them. I think the Iglu Server is set up correctly too; I see the right resolver in the DynamoDB entry. And when I run this curl command, I get the schema back:
curl https://MY_IGLU_HOST_NAME/api/schemas/com.snowplowanalytics.snowplow/site_search/jsonschema/1-0-0 -X GET -H "apikey: MY_API_KEY"
I also ran terraform plan and everything is deployed with nothing pending. The payload is reaching the raw Kinesis stream. Where can I check logs to debug what happens from that point onward?
So every component is deployed as a Docker container! That’s sweet. Is there any way to configure the log driver to send logs to AWS CloudWatch?
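For reference, Docker’s built-in awslogs log driver can ship container output to CloudWatch. A minimal daemon.json sketch, with hypothetical region and log-group names (the instance also needs IAM permissions to create log streams and put log events):

```json
{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-region": "eu-west-1",
    "awslogs-group": "snowplow-pipeline",
    "awslogs-create-group": "true"
  }
}
```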
The issue is resolved now. The problem was an extra space at the end of the Iglu Server URL in the tfvars file for the pipeline. Thank you for your help in debugging this; I appreciate it very much.
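For anyone who hits the same thing: a stray trailing space inside a tfvars value is easy to miss by eye but easy to grep for. A small sketch, with a made-up variable name and value:

```shell
# Reproduce the failure mode: a value with a trailing space before the quote.
printf 'iglu_server_dns_name = "http://iglu.example.com "\n' > pipeline.tfvars

# Flag any value whose closing quote is preceded by whitespace.
grep -nE '[[:space:]]+"$' pipeline.tfvars
```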
Hey @yathaarth Glad you figured it out!
We’re not currently logging to CloudWatch, but it’s definitely something we can add in a future release. You’re not the first person to ask either, so good idea!
The modules now have CloudWatch logging: Snowplow Terraform Modules updated with Cloudwatch logging
Thanks for your suggestion!