Sync RDB loader to Redshift on another account

We have our Snowplow instance installed on a subaccount in AWS, but are hoping to push the events to a Redshift cluster on the parent account. I have attempted to supply the appropriate credentials in the config.hocon for the loader, and am seeing mixed results. The manifest table was created successfully in Redshift when I ran the RDB loader the first time, so some permissions seem to work, but it errors out when trying to sync S3: it reads the directories correctly, but then says access is denied when it goes to sync. My current config.hocon supplies the role that is associated with the cluster, which has permission to assume a role on the child account that has S3 read-only permission.
Is this kind of communication possible? If so, am I approaching it right?
I thought about installing the loader on the parent account, but then didn’t know how to connect to the SQS feed on the child account, as the only setting I can see for SQS is the feed name.

Sounds like the right approach to me. My gut is telling me that the bucket’s policy might have something to do with it. The user might have access but a bucket’s ACL might be blocking. (At least, it would be ACL based for cross-account access, I’m not sure if a sub-account is the same approach)

Edit to add: A quick litmus test of this is to see if the user (or a user with equivalent permissions) can aws s3 cp a file from the bucket
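
For anyone who wants to script that litmus test, here is a minimal sketch in Python. It assumes the AWS CLI is installed and that a named CLI profile exists which resolves to the same permissions as the role being debugged; the function names, the default destination path, and the profile name in the usage comment are all hypothetical.

```python
import subprocess


def build_s3_cp_command(bucket, key, dest, profile=None):
    """Build the argument list for `aws s3 cp s3://<bucket>/<key> <dest>`."""
    cmd = ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest]
    if profile:
        # --profile selects a named profile from the local AWS CLI config.
        cmd += ["--profile", profile]
    return cmd


def can_copy(bucket, key, dest="/tmp/litmus-test-object", profile=None):
    """Run the copy and return (succeeded, stderr) so a denial is visible."""
    result = subprocess.run(
        build_s3_cp_command(bucket, key, dest, profile),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()


# Hypothetical usage (bucket, key and profile are placeholders):
# ok, err = can_copy("my-snowplow-bucket", "path/to/some/object", profile="loader-role")
```

If the copy fails with the same access denied error, the problem is likely in the S3-side permissions rather than in the loader configuration.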

Thanks for the reply. I updated the ACL to add the parent account access to the child, but am still getting the same error. I’ll try debugging those permissions like you suggested to learn more.
Since the loader is running on the child account, is it wrong of me to supply the role of the parent account in the config.hocon? When I tried supplying one from the child account, I got an error saying that the role needs to be on the same account as the cluster, and that error went away when I switched to the role on the parent. It just seems strange to be supplying it a role on a different account. I’m wondering if I need to reverse the direction and supply a role on the child account that assumes the role on the parent account associated with the cluster, and then tie the S3 permissions in there somewhere as well.

Ah ok I see what you mean - apologies I need to retract my previous confidence, I’m actually not sure which approach it should be.

I should mention that this isn’t my specialist area. I’ve popped a message to the loaders team to see if someone more qualified can add their 2c on Monday.

It might still be helpful to outline what my gut is telling me - the awkward thing about cross-account s3 permissions is that if the data originates from account 1, even when it’s in account 2, the encryption key still belongs to account 1. So for a role in account 2 to be able to copy it within account 2, it needs permissions both for the objects in account 1 and account 2.

Now, I’m unclear as to how helpful that is, it might be a red herring since it seems that your buckets are all in the same account… If the above explanation sounds off then I wouldn’t pay it too much heed.

Sorry I can’t be more useful for now, like I mentioned I’ve asked for input from someone smarter than me 🙂

Thanks for your input! I’m grateful for whatever help I can get. And yes to clarify, all of our snowplow infrastructure is isolated to a sub-account, and the redshift database is on the parent account.

Hey @Ben_Harker,

Cross-account loading is explained here much better than I can explain: COPY or UNLOAD data from another account in Amazon Redshift

Could you give it a try in case you haven’t before? As explained there, what we are trying to achieve is basically the following:

  • Create RoleA, an IAM role in the Amazon S3 account.
  • Create RoleB, an IAM role in the Amazon Redshift account with permissions to assume RoleA.

I understand your confusion about giving the loader a role from another account, but this is indeed how it should be done in this case, because the loader does not use this IAM load role directly itself. Instead, it runs a COPY query on the Redshift cluster with the specified IAM load role. The IAM role is used by the Redshift cluster itself during the COPY operation, so we need to supply an IAM role that is in the same account as the cluster.
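
To make the two roles a bit more concrete, here is a rough sketch in Python of the policy documents involved, printed as JSON. It is only an illustration under assumptions: the account IDs and bucket name are placeholders, the function names are made up for this post, and the exact S3 actions you need may differ (for example if the bucket is encrypted with KMS).

```python
import json


def role_a_trust_policy(redshift_account_id, role_b_name="RoleB"):
    """Trust policy on RoleA (S3 account): lets RoleB assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{redshift_account_id}:role/{role_b_name}"},
            "Action": "sts:AssumeRole",
        }],
    }


def role_a_s3_read_policy(bucket):
    """Permissions on RoleA: read-only access to the bucket (assumed minimal set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


def role_b_trust_policy():
    """Trust policy on RoleB (Redshift account): lets the Redshift service use it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def role_b_assume_policy(s3_account_id, role_a_name="RoleA"):
    """Permissions on RoleB: allowed to assume RoleA in the S3 account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::{s3_account_id}:role/{role_a_name}",
        }],
    }


if __name__ == "__main__":
    # Placeholder account IDs and bucket name, not real values.
    print(json.dumps(role_a_trust_policy("111111111111"), indent=2))
    print(json.dumps(role_a_s3_read_policy("my-snowplow-bucket"), indent=2))
    print(json.dumps(role_b_trust_policy(), indent=2))
    print(json.dumps(role_b_assume_policy("222222222222"), indent=2))
```

RoleB is the one associated with the cluster, which is why it has to live in the Redshift account.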

Thanks for verifying that setup. That is the role sharing that I have set up currently, but I am getting the access denied error in the screenshot above. I’ll spend some more time debugging that access, but if there are any other permission nuances you can think of, that would be helpful.

One more question @enes_aldemir it looks like in the documentation you sent the roles are supplied like so in the commands: arn:aws:iam::Amazon_Redshift_Account_ID:role/RoleB,arn:aws:iam::Amazon_S3_Account_ID:role/RoleA
Should I be supplying the role in a similar fashion in the config?

Update: supplying the role as mentioned above seems to have resolved the error I was getting. Thanks @enes_aldemir and @Colm for pointing me in the right direction!
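
For anyone landing here later, a small sketch in Python of how that chained role value is put together. The account IDs are placeholders and the helper names are hypothetical; the only thing taken from the thread is the format, with the Redshift-account role first, then the S3-account role, separated by a comma with no spaces.

```python
def iam_role_arn(account_id, role_name):
    """Format a single IAM role ARN."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def chained_role_arn(*roles):
    """Join (account_id, role_name) pairs into one comma-separated chain.

    The first pair should be the role attached to the Redshift cluster,
    followed by the role(s) it assumes in turn.
    """
    return ",".join(iam_role_arn(account_id, name) for account_id, name in roles)


if __name__ == "__main__":
    # Placeholder account IDs, not real ones.
    print(chained_role_arn(("111111111111", "RoleB"), ("222222222222", "RoleA")))
```

The resulting string is what goes into the role setting in config.hocon in place of a single role ARN.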

Glad to hear it! AWS permissions can be terribly awkward to figure out. 🙂