Hi, we have been running the Snowplow infrastructure launched with the Terraform Quickstart AWS Secure Setup. Our plan is to remove the other components, such as the Postgres loader and Iglu, and rely solely on the data collected in the S3 raw bucket. This should give us a more cost-efficient and manageable infrastructure. Is there a guide available for this type of architecture? Additionally, how can we start analyzing the collected data, which is stored in the raw bucket in .gz format?
Just to clarify, do you mean the raw data (collector payloads) or the data that comes out of the enrichment process? The “raw” data that comes from the collector isn’t really in an analysable format, whereas the enriched data certainly can be used (e.g., Kinesis => S3 loader => S3 => Glue/Athena).
Hi Mike, yes, this is exactly what I am trying to do (e.g., Kinesis => S3 loader => S3 => Glue/Athena). I am having trouble finding documentation for this setup. Are there any references on how this can be achieved? I am new to data engineering, so thanks for your understanding.
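For a first look at the enriched files before setting up Glue/Athena, something like the rough Python sketch below may help. It assumes the S3 loader writes enriched events as gzipped, tab-separated lines, one event per line. The function name `read_enriched_events` is made up for illustration, and the column names in `LEADING_FIELDS` are my assumption about the first few fields of the enriched format, so please check them against the Snowplow canonical event model documentation before relying on them.

```python
import gzip
import io

# ASSUMPTION: names of the first few columns of an enriched event line.
# Verify the names and their order against the Snowplow canonical event model.
LEADING_FIELDS = [
    "app_id",
    "platform",
    "etl_tstamp",
    "collector_tstamp",
    "dvce_created_tstamp",
    "event",
    "event_id",
]


def read_enriched_events(gz_bytes, field_names=LEADING_FIELDS):
    """Decompress one .gz object and map the leading columns of each line to names.

    Hypothetical helper: columns beyond len(field_names) are ignored.
    """
    events = []
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            values = line.split("\t")
            # zip stops at the shorter sequence, so only the named columns are kept
            events.append(dict(zip(field_names, values)))
    return events


# Made-up sample row, only to show the shape of the output
sample_row = "my-app\tweb\t2024-01-01 00:00:05\t2024-01-01 00:00:01\t2024-01-01 00:00:00\tpage_view\tabc-123\textra\n"
sample_gz = gzip.compress(sample_row.encode("utf-8"))
events = read_enriched_events(sample_gz)
```

The bytes of a real file would come from wherever you download the S3 object (for example with boto3), which is left out here. This is only meant for inspecting a handful of files by hand; for querying at scale, the Glue/Athena route Mike mentioned is the better fit.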