What is the minimum viable IAM policy for Snowplow operation?

Not that we expect our Snowplow installation to go rogue, but you can never be too careful with your data, right? The IAM setup page gives a rather permissive policy to get things going, but how much can it be restricted? From a very out-of-date setup, we give our snowplow_operator:

{
    "Action": [
        "elasticmapreduce:AddInstanceGroups",
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:DescribeJobFlows",
        "elasticmapreduce:ModifyInstanceGroups",
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:SetTerminationProtection",
        "elasticmapreduce:TerminateJobFlows",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics",
        "cloudwatch:PutMetricData"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Allow"
},
{
    "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CancelSpotInstanceRequests",
        "ec2:CreateSecurityGroup",
        "ec2:CreateTags",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:DescribeSubnets",
        "ec2:DescribeRouteTables",
        "ec2:ModifyImageAttribute",
        "ec2:ModifyInstanceAttribute",
        "ec2:RequestSpotInstances",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Allow",
    "Condition": {
        "StringEquals": {
            "ec2:Region": "us-west-2"
        }
    }
},
{
    "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "sdb:CreateDomain",
        "sdb:Select",
        "sdb:GetAttributes",
        "sdb:PutAttributes",
        "sdb:BatchPutAttributes",
        "sdb:ListDomains",
        "sdb:DomainMetadata"
    ],
    "Effect": "Allow",
    "Resource": [
        "arn:aws:s3:::*elasticmapreduce/*",
        "arn:aws:sdb:*:*:*ElasticMapReduce*/*",
        "arn:aws:sdb:*:*:*"
    ]
}
# and S3 stuff for ETL...

Is this the best we can do? In particular, the *s for EMR, CloudWatch, and SDB seem broad, with the EC2 * being marginally better in that it’s at least restricted to a particular region. (Note that I’m not even sure these permissions are sufficient for late-model Snowplows, since we’re so far behind the times.)
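One partial tightening I’ve been looking at, assuming the cloudwatch:namespace condition key (documented for PutMetricData) and the global aws:RequestedRegion key work as advertised — the AWS/ElasticMapReduce namespace is a guess at where our metrics land, and the same region Condition could presumably be applied to the other EMR actions too:

```json
{
    "Action": [
        "cloudwatch:PutMetricData"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Allow",
    "Condition": {
        "StringEquals": {
            "cloudwatch:namespace": "AWS/ElasticMapReduce"
        }
    }
},
{
    "Action": [
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:TerminateJobFlows"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Allow",
    "Condition": {
        "StringEquals": {
            "aws:RequestedRegion": "us-west-2"
        }
    }
}
```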

Hi @alexc-sigfig - here is our wiki page for the Snowplow operator’s permissions: Setup IAM permissions for operating Snowplow.

Snowplow necessarily requires a lot of AWS permissions to run - it is strongly recommended to set up Snowplow in a dedicated AWS sub-account.

This is maybe more of an AWS question than a Snowplow question, but which resources need to be under the same AWS account? Can I get away with setting up a new collector and emr-etl-runner (and their associated S3 buckets) in a sub-account, but send the data into Redshift owned by another AWS account?

Yes sure - you can keep Redshift in a separate AWS account (many of our customers do).
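To sketch what that needs on the S3 side: a bucket policy along these lines on the enriched-data bucket lets an identity in the Redshift account read the files for a COPY (the account ID 111122223333 and the bucket name my-snowplow-enriched are placeholders; the Redshift account must also grant its own IAM user or role matching s3:GetObject/s3:ListBucket permissions):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowRedshiftAccountRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-snowplow-enriched",
                "arn:aws:s3:::my-snowplow-enriched/*"
            ]
        }
    ]
}
```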