Storage loader fails with S3 API errors when using IAM credentials

I’d prefer to use IAM-based credentials if at all possible for all stages of the Snowplow pipeline, to avoid having a configuration file with hardcoded, plaintext key/secret settings.

I’ve been trying to run the storage loader with the key and secret specified as “iam”, after looking through the source code to verify that’s the most likely way to signal that intent. But after extensive debugging, including an AWS support ticket to review the S3 logs from the errors, I’m starting to think it’s simply not supported.

I can see the S3 API calls being sent, including the correct IAM/instance access key and some sort of secret, but the calls fail with the following error and the program exits.

— cut here —

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>ASIAJWUD6PFHBCXXXXXX</AWSAccessKeyId>etc...

— cut here —

Is there a way to get the storage loader to leverage the IAM roles assigned to the instance the code is running on? Or do I have to revert to putting the key/secret for an IAM user into the config file?

Hi @cnamejj - the S3 API errors you mention are probably related to the archive step post-load. This step doesn’t support IAM roles because it relies on a library, Sluice, which has no IAM role support: https://github.com/snowplow/sluice/issues/31

We don’t have a timeline for adding IAM support to Sluice, because we plan instead to move all S3 file operations to S3DistCp, which can leverage the IAM credentials on the EMR cluster itself.
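
For reference, this is roughly what an S3DistCp-based archive step could look like once that migration happens. This is only a sketch, not Snowplow code: the cluster ID, bucket names, and prefixes are placeholders, and it assumes a running EMR cluster whose instance role already has access to both buckets.

```python
import boto3

# Sketch only: submit an S3DistCp step to an existing EMR cluster.
# The EMR instances use their own IAM role for the S3 copy, so no
# access key/secret needs to appear in any config file.
emr = boto3.client("emr")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
    Steps=[{
        "Name": "Archive enriched events via S3DistCp",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "s3-dist-cp",
                "--src",  "s3://my-enriched-bucket/enriched/good/",    # placeholder
                "--dest", "s3://my-archive-bucket/enriched/archive/",  # placeholder
                "--deleteOnSuccess",
            ],
        },
    }],
)
print("Submitted step(s):", response["StepIds"])
```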

In the meantime, if you want, you can disable the archive step (--skip archive_enriched) and replace it in your job DAG with a few lines of Boto, which can of course leverage IAM roles.
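
Something along these lines should work. It’s just a sketch using boto3 (rather than the original Boto) with hypothetical bucket and prefix names, but because boto3’s default credential chain falls back to the instance profile, no keys need to be stored anywhere:

```python
import boto3

# Sketch of a replacement for the skipped archive_enriched step.
# boto3 picks up the instance's IAM role automatically via the
# default credential chain, so no key/secret is needed.
s3 = boto3.resource("s3")

SOURCE_BUCKET  = "my-snowplow-enriched"   # hypothetical bucket names
ARCHIVE_BUCKET = "my-snowplow-archive"
PREFIX         = "enriched/good/"         # hypothetical prefix

source = s3.Bucket(SOURCE_BUCKET)
for obj in source.objects.filter(Prefix=PREFIX):
    # Copy each enriched file into the archive bucket, then remove the original.
    s3.Object(ARCHIVE_BUCKET, obj.key).copy({"Bucket": SOURCE_BUCKET, "Key": obj.key})
    obj.delete()
```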

Thanks, trying that out next.

I re-ran it skipping everything except “load” and still got the same error, so I’m going to assume that step also uses Sluice and just leave the key/secret in place for now. I have other issues to resolve and need to get the software working ASAP to unblock developers; I might revisit this later.

— cut here —
$ snowplow-storage-loader --config ./redshift.conf --skip archive_enriched,analyze,shred,delete
Loading Snowplow events and shredded types into My Redshift database (Redshift cluster)…
Unexpected error: Expected(200) <=> Actual(403 Forbidden)
excon.error.response
:body => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>ASIAJNBLPANIURxxxxxx</AWSAccessKeyId>etc…

— cut here —

Ah - yes sorry, the StorageLoader also uses Sluice to determine what shredded JSONs in S3 need loading into Redshift.

Best to wait on our rewrite of StorageLoader into Scala to resolve all this.