Not that we expect our Snowplow installation to go rogue, but you can never be too careful with your data, right? The IAM setup page gives a rather permissive policy to get things going, but how far can it be restricted? Working from a very out-of-date setup, we grant our snowplow_operator the following:
    {
        "Action": [
            "elasticmapreduce:AddInstanceGroups",
            "elasticmapreduce:AddJobFlowSteps",
            "elasticmapreduce:DescribeJobFlows",
            "elasticmapreduce:ModifyInstanceGroups",
            "elasticmapreduce:RunJobFlow",
            "elasticmapreduce:SetTerminationProtection",
            "elasticmapreduce:TerminateJobFlows",
            "cloudwatch:GetMetricStatistics",
            "cloudwatch:ListMetrics",
            "cloudwatch:PutMetricData"
        ],
        "Resource": [
            "*"
        ],
        "Effect": "Allow"
    },
    {
        "Action": [
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeInstances",
            "ec2:DescribeKeyPairs",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSubnets",
            "ec2:DescribeRouteTables",
            "ec2:ModifyImageAttribute",
            "ec2:ModifyInstanceAttribute",
            "ec2:RequestSpotInstances",
            "ec2:RunInstances",
            "ec2:TerminateInstances"
        ],
        "Resource": [
            "*"
        ],
        "Effect": "Allow",
        "Condition": {
            "StringEquals": {
                "ec2:Region": "us-west-2"
            }
        }
    },
    {
        "Action": [
            "s3:GetObject",
            "s3:ListBucket",
            "sdb:CreateDomain",
            "sdb:Select",
            "sdb:GetAttributes",
            "sdb:PutAttributes",
            "sdb:BatchPutAttributes",
            "sdb:ListDomains",
            "sdb:DomainMetadata"
        ],
        "Effect": "Allow",
        "Resource": [
            "arn:aws:s3:::*elasticmapreduce/*",
            "arn:aws:sdb:*:*:*ElasticMapReduce*/*",
            "arn:aws:sdb:*:*:*"
        ]
    }
# and S3 stuff for ETL...
Is this the best we can do? In particular, the * resources for EMR, CloudWatch, and SDB seem very broad; the EC2 * is only marginally better in that it’s restricted to a particular region. (Note that I’m not even sure these permissions are sufficient for late-model Snowplows, since we’re so far behind the times.)
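For anyone who wants to check their own policy for the same problem, here is a rough Python sketch that lists the Allow statements whose Resource is a bare "*" and notes whether a Condition narrows them. It is only an illustration: the function name `wildcard_statements` is made up, the example policy is an abbreviated copy of the one above, and the `Version`/`Statement` wrapper is an assumption, since only the statements themselves are pasted here.

```python
def wildcard_statements(policy):
    """Return (index, services, has_condition) for every Allow statement
    whose Resource list contains a bare "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]

    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):  # IAM allows a single string here too
            resources = [resources]
        if "*" not in resources:
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # The service prefix is the part of each action before the first colon.
        services = sorted({action.split(":", 1)[0] for action in actions})
        findings.append((index, services, "Condition" in stmt))
    return findings


# Abbreviated version of the policy above; the outer wrapper is assumed.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["elasticmapreduce:RunJobFlow", "cloudwatch:PutMetricData"],
            "Resource": ["*"],
            "Effect": "Allow",
        },
        {
            "Action": ["ec2:RunInstances", "ec2:TerminateInstances"],
            "Resource": ["*"],
            "Effect": "Allow",
            "Condition": {"StringEquals": {"ec2:Region": "us-west-2"}},
        },
        {
            "Action": ["s3:GetObject", "sdb:Select"],
            "Effect": "Allow",
            "Resource": ["arn:aws:s3:::*elasticmapreduce/*", "arn:aws:sdb:*:*:*"],
        },
    ],
}

for index, services, has_condition in wildcard_statements(example_policy):
    scope = "narrowed by a Condition" if has_condition else "unconditioned"
    print(f"statement {index}: {', '.join(services)} on Resource '*' ({scope})")
```

Note that this only catches a literal "*"; a pattern like arn:aws:sdb:*:*:* is nearly as broad but would need its own check.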