AWS Quickstart optimized Snowplow infra

snow1 · January 30, 2023, 6:00am

Hi, we have been operating the Snowplow infrastructure launched using the Terraform Quickstart AWS Secure Setup. Our plan is to eliminate the other components, including the Postgres loader and Iglu, and rely solely on the collected data in the S3 raw bucket. We expect this to give us a more cost-efficient and manageable infrastructure. Is there a guide available for this type of architecture? Additionally, how can we start analyzing the collected data stored in the raw bucket in .gz format?

Thanks!

mike · January 30, 2023, 9:21am

Just to clarify, do you mean the raw data (collector payloads) or the data that comes out of the enrichment process? The “raw” data that comes from the collector isn’t really in an analysable format, whereas the enriched data certainly can be used (e.g., Kinesis => S3 loader => S3 => Glue/Athena).

snow1 · January 30, 2023, 11:33am

Hi Mike, yes, this is exactly what I am trying to do (e.g., Kinesis => S3 loader => S3 => Glue/Athena). I am having trouble finding documentation for this setup. Are there any references on how this can be achieved? I am new to data engineering, thanks for understanding.

PaulBoocock · January 30, 2023, 11:47am

You’re going to want to go:

Snowplow Collector → Raw Kinesis → Snowplow Enrich → Enriched Kinesis → S3 Loader → Enriched S3

You should be able to strip any loaders (Postgres/Snowflake) out of the Quickstart.

Then you can use Glue/Athena to query this. Some links that might be useful:

Some of the manual deployment docs might help with your broader understanding: Manual Setup on AWS | Snowplow Documentation
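
Before setting up Glue/Athena, it can help to look at one enriched file directly. Below is a minimal Python sketch, assuming the S3 Loader writes the enriched stream as gzipped, newline-delimited, tab-separated text. The function name `read_enriched_events` and the `LEADING_FIELDS` list are illustrative only; the column names and their order are an assumption based on the Snowplow canonical event model, so check them against the docs for your pipeline version.

```python
import gzip
import io

# Assumption: the first columns of the enriched TSV, in this order.
# The full format has many more columns; verify names and order against
# the canonical event model documentation before relying on them.
LEADING_FIELDS = [
    "app_id",
    "platform",
    "etl_tstamp",
    "collector_tstamp",
    "dvce_created_tstamp",
    "event",
    "event_id",
]


def read_enriched_events(gz_bytes):
    """Yield one dict per event from the bytes of a gzipped TSV file.

    Only the columns listed in LEADING_FIELDS are named; any further
    columns on the line are ignored by this sketch.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as fh:
        for line in fh:
            values = line.rstrip("\n").split("\t")
            if values == [""]:
                continue  # skip blank lines
            yield dict(zip(LEADING_FIELDS, values))
```

To try it, download one object from the enriched bucket (for example with the AWS CLI or boto3), read it as bytes, and pass those bytes to `read_enriched_events`. This is only meant for spot-checking a file; for anything beyond that, Glue/Athena over the same bucket is the better fit.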
