Understanding the S3 copy to Snowflake

Hi,

We are trying to get a Snowplow pipeline working using the quickstart “secure” Terraform for AWS. The target database is Snowflake. We have applied the Terraform for Iglu Server, the pipeline and Snowflake, and it all seems to have worked.

However, when we run the test curl command, data arrives in the S3 buckets but not in Snowflake.

There are no errors in CloudWatch, just INFO messages. There are also no “COPY events FROM s3…” messages in CloudWatch, which I think is suspicious.

To diagnose this, I think it would help to understand which part of the quickstart Terraform sets up the integration between S3 and Snowflake.

Reading this in the guide: “… For this purpose, the Snowflake Terraform module has been created. This module creates resources including, but not limited to, Snowflake database, table, user, and role. These resources are needed by the Snowflake Loader to operate correctly.”

Does this module also create the resources that copy data from S3 to Snowflake?

Which resources should I look for in the state and plan to see what might be missing or misconfigured?

Any clues to point us in the right direction would be very helpful :smiley:

Thanks!

Hi @chris, it sounds like something in the chain has broken. It might help to decompose this example I put together for the recent Snowplow at Scale webinar: webinar-open-source-at-scale/main.tf (snowplow-devops/webinar-open-source-at-scale on GitHub).

Ultimately, to get data into Snowflake you need to Collect, Enrich, Transform and then Load it into the prepared database and schema.

The quickstart is a little behind our latest modules at this point (a refresh is on our TODO list!), so the example above is your best bet for something that works out of the box.

Thanks Josh,

I found the issue. I had left one of the variables unset, so this code in main.tf:

  locals {
    # Every condition must hold: a single empty variable makes this false,
    # and the Snowflake part of the pipeline is then skipped.
    snowflake_enabled = (
      var.pipeline_db == "snowflake"
        && var.snowflake_account != ""
        && var.snowflake_region != ""
        && var.snowflake_loader_user != ""
        && var.snowflake_loader_password != ""
        && var.snowflake_database != ""
        && var.snowflake_schema != ""
        && var.snowflake_loader_role != ""
        && var.snowflake_warehouse != ""
        && var.snowflake_transformed_stage_name != ""
    )
  }

… evaluated snowflake_enabled to false, so the Snowflake part of the pipeline was skipped. Someone more experienced would probably have noticed that a number of resources were missing!
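Because every condition is joined with &&, one empty variable quietly turns the whole Snowflake branch off instead of failing the apply. One way to catch this earlier is sketched below. It is a minimal sketch, not part of the quickstart, and it assumes Terraform 1.5 or later (for check blocks) and that the snowflake_* variables above are declared in the same root module. The local and check names (snowflake_required_vars, snowflake_missing_vars, snowflake_config_complete) are hypothetical.

```hcl
# Hypothetical additions to the root module that contains main.tf.
# Check blocks need Terraform >= 1.5; a failed assert is reported as a
# warning during plan/apply rather than stopping the run.

locals {
  # Required non-secret Snowflake settings, keyed by variable name.
  # snowflake_loader_password is left out on purpose: it is often declared
  # sensitive, and sensitive values cannot be shown in the warning message.
  snowflake_required_vars = {
    snowflake_account                = var.snowflake_account
    snowflake_region                 = var.snowflake_region
    snowflake_loader_user            = var.snowflake_loader_user
    snowflake_database               = var.snowflake_database
    snowflake_schema                 = var.snowflake_schema
    snowflake_loader_role            = var.snowflake_loader_role
    snowflake_warehouse              = var.snowflake_warehouse
    snowflake_transformed_stage_name = var.snowflake_transformed_stage_name
  }

  # Names of the settings above that are still empty strings.
  snowflake_missing_vars = [
    for name, value in local.snowflake_required_vars : name if value == ""
  ]
}

check "snowflake_config_complete" {
  assert {
    # Only relevant when Snowflake is the chosen destination.
    condition     = var.pipeline_db != "snowflake" || length(local.snowflake_missing_vars) == 0
    error_message = "pipeline_db is \"snowflake\" but these variables are empty, so snowflake_enabled will be false: ${join(", ", local.snowflake_missing_vars)}"
  }
}
```

With this in place, terraform plan names the empty variables instead of silently leaving out the Snowflake resources. The password still has to be checked by hand, since it is deliberately excluded from the list.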

Hope this helps someone else.
Chris.