Introduction#

GitHub: this note shows how to manage data access with IAM. Before Lake Formation was released in 2019, data access had to be configured through IAM policies. For example, to let a Data Scientist (DS, an IAM user) query tables in the Glue Data Catalog, we need to:

  • Grant access to the underlying data in S3 via IAM policy
  • Grant access to tables in Glue Catalog via IAM policy
  • Create an IAM user for a Data Scientist
  • Create an IAM role for Glue interactive session
  • Control access to Athena Workgroup via IAM
Figure: athena_serverless_etl

IAM User#

// generate a password and store it in Secrets Manager
const secret = new aws_secretsmanager.Secret(this, `${props.userName}Secret`, {
  secretName: `${props.userName}Secret`,
  generateSecretString: {
    secretStringTemplate: JSON.stringify({ userName: props.userName }),
    generateStringKey: 'password'
  }
})
// create the IAM user and use the generated value as its console password
const user = new aws_iam.User(this, `${props.userName}IAMUSER`, {
  userName: props.userName,
  password: secret.secretValueFromJson('password'),
  passwordResetRequired: false
})

Grant Permissions#

Option 1. Grant the DS access to all data

user.addManagedPolicy(
  aws_iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonAthenaFullAccess')
)

Option 2. Least privilege, so that the DS can access only the requested tables. For Glue, please note that:

All operations performed on a Data Catalog resource require permission on the resource and all the ancestors of that resource. For example, to create a partition for a table requires permission on the table, database, and catalog where the table is located. The following example shows the permission required to create partitions on table PrivateTable in database PrivateDatabase in the Data Catalog.
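
The ancestor rule quoted above can be sketched as a small helper that, for one table, returns the table ARN together with the database and catalog ARNs it depends on. This is only an illustration: the names `GlueTableRef` and `glueTableWithAncestors`, the region, and the account ID are assumptions; only PrivateTable and PrivateDatabase come from the quoted text.

```typescript
// Hypothetical helper, not part of the original stack.
interface GlueTableRef {
  region: string;
  accountId: string;
  databaseName: string;
  tableName: string; // "*" would match every table in the database
}

// Returns the catalog and database ARNs (the ancestors) followed by the table ARN.
function glueTableWithAncestors(ref: GlueTableRef): string[] {
  const prefix = `arn:aws:glue:${ref.region}:${ref.accountId}`;
  return [
    `${prefix}:catalog`,
    `${prefix}:database/${ref.databaseName}`,
    `${prefix}:table/${ref.databaseName}/${ref.tableName}`,
  ];
}

// Example values; the region and account ID are placeholders.
const privateTableArns = glueTableWithAncestors({
  region: "us-east-1",
  accountId: "123456789012",
  databaseName: "PrivateDatabase",
  tableName: "PrivateTable",
});
```

The returned list could be passed as `resources` to an `aws_iam.PolicyStatement`. Unlike the wildcard patterns used in this note, it pins the account ID and the exact database name.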

Specify the permissions needed to access tables in the Glue catalog

new aws_iam.PolicyStatement({
  actions: [
    "glue:CreateDatabase",
    "glue:DeleteDatabase",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:UpdateDatabase",
    "glue:CreateTable",
    "glue:DeleteTable",
    "glue:BatchDeleteTable",
    "glue:UpdateTable",
    "glue:GetTable",
    "glue:GetTables",
    "glue:BatchCreatePartition",
    "glue:CreatePartition",
    "glue:DeletePartition",
    "glue:BatchDeletePartition",
    "glue:UpdatePartition",
    "glue:GetPartition",
    "glue:GetPartitions",
    "glue:BatchGetPartition",
  ],
  effect: Effect.ALLOW,
  // the table plus its ancestors: database and catalog
  resources: [
    `arn:aws:glue:${this.region}:*:table/${props.databaseName}/*`,
    `arn:aws:glue:${this.region}:*:database/${props.databaseName}*`,
    `arn:aws:glue:${this.region}:*:*catalog`,
  ],
}),

The full IAM policy attached to the DS IAM user

const policy = new aws_iam.Policy(
  this,
  "LeastPriviledgePolicyForDataScientist",
  {
    policyName: "LeastPriviledgePolicyForDataScientist",
    statements: [
      // athena
      new aws_iam.PolicyStatement({
        actions: ["athena:*"],
        effect: Effect.ALLOW,
        // resources: ["*"],
        resources: [
          `arn:aws:athena:${this.region}:${this.account}:workgroup/${props.athenaWorkgroupName}`,
        ],
      }),
      new aws_iam.PolicyStatement({
        actions: [
          "athena:ListEngineVersions",
          "athena:ListWorkGroups",
          "athena:ListDataCatalogs",
          "athena:ListDatabases",
          "athena:GetDatabase",
          "athena:ListTableMetadata",
          "athena:GetTableMetadata",
        ],
        effect: Effect.ALLOW,
        resources: ["*"],
      }),
      // access s3
      new aws_iam.PolicyStatement({
        actions: [
          "s3:GetBucketLocation",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:ListMultipartUploadParts",
          "s3:AbortMultipartUpload",
          "s3:CreateBucket",
          "s3:PutObject",
          "s3:PutBucketPublicAccessBlock",
        ],
        effect: Effect.ALLOW,
        resources: [
          props.athenaResultBucketArn,
          `${props.athenaResultBucketArn}/*`,
          props.sourceBucketArn,
          `${props.sourceBucketArn}/*`,
        ],
      }),
      // access glue catalog
      new aws_iam.PolicyStatement({
        actions: [
          "glue:CreateDatabase",
          "glue:DeleteDatabase",
          "glue:GetDatabase",
          "glue:GetDatabases",
          "glue:UpdateDatabase",
          "glue:CreateTable",
          "glue:DeleteTable",
          "glue:BatchDeleteTable",
          "glue:UpdateTable",
          "glue:GetTable",
          "glue:GetTables",
          "glue:BatchCreatePartition",
          "glue:CreatePartition",
          "glue:DeletePartition",
          "glue:BatchDeletePartition",
          "glue:UpdatePartition",
          "glue:GetPartition",
          "glue:GetPartitions",
          "glue:BatchGetPartition",
        ],
        effect: Effect.ALLOW,
        resources: [
          `arn:aws:glue:${this.region}:*:table/${props.databaseName}/*`,
          `arn:aws:glue:${this.region}:*:database/${props.databaseName}*`,
          `arn:aws:glue:${this.region}:*:*catalog`,
        ],
      }),
      // access lakeformation
      // new aws_iam.PolicyStatement({
      //   actions: ["lakeformation:GetDataAccess"],
      //   effect: Effect.ALLOW,
      //   resources: ["*"],
      // }),
    ],
  }
)
// attach the policy to the DS IAM user created earlier
policy.attachToUser(user)
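
The S3 statement above grants the same actions on both buckets. A tighter variant might split it in two, assuming the DS only reads the source data and writes only Athena query results. The sketch below is one possible shape, not the note's own policy: the helper `s3StatementsForAnalyst`, the `StatementProps` interface, and the bucket ARNs are hypothetical, while every action is taken from the list above.

```typescript
// Hypothetical shape mirroring the `actions` and `resources` fields used above.
interface StatementProps {
  actions: string[];
  resources: string[];
}

// Hypothetical helper: read-only on the source bucket, read/write on the result bucket.
function s3StatementsForAnalyst(
  sourceBucketArn: string,
  resultBucketArn: string
): StatementProps[] {
  return [
    // source data: list and read only
    {
      actions: ["s3:GetBucketLocation", "s3:ListBucket", "s3:GetObject"],
      resources: [sourceBucketArn, `${sourceBucketArn}/*`],
    },
    // Athena query results: read, write, and multipart upload handling
    {
      actions: [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
      ],
      resources: [resultBucketArn, `${resultBucketArn}/*`],
    },
  ];
}

// Placeholder bucket ARNs for illustration only.
const s3Props = s3StatementsForAnalyst(
  "arn:aws:s3:::source-bucket",
  "arn:aws:s3:::athena-results-bucket"
);
```

Each element could be passed to `new aws_iam.PolicyStatement(...)`, whose effect defaults to allow. Whether this narrower set is sufficient depends on the queries the DS runs, for example statements that write new tables back to the source bucket would need more.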

Glue Role#

  • Create an IAM Role for Glue
  • Create a Glue Notebook or interactive session
  • Read data from the Glue catalog and S3
  • Follow the AWSGlueServiceRoleNotebook naming convention or grant iam:PassRole

// role assumed by the Glue service for notebooks and interactive sessions
const role = new aws_iam.Role(this, `GlueRoleFor-${props.pipelineName}`, {
  roleName: `GlueRoleFor-${props.pipelineName}`,
  assumedBy: new aws_iam.ServicePrincipal('glue.amazonaws.com')
})
role.addManagedPolicy(
  aws_iam.ManagedPolicy.fromAwsManagedPolicyName(
    'service-role/AWSGlueServiceRole'
  )
)
role.addManagedPolicy(
  aws_iam.ManagedPolicy.fromAwsManagedPolicyName('CloudWatchAgentServerPolicy')
)

Then attach the same policy as above (LeastPriviledgePolicyForDataScientist) to the Glue role. The notebook needs to pass the role to the execution session, so there are two options:

  • Explicitly specify iam:PassRole in the policy below
  • Follow the role naming convention, such as AWSGlueServiceRoleNotebook

For Role name, enter a name for your role; for example, AWSGlueServiceRoleDefault. Create the role with the name prefixed with the string AWSGlueServiceRole to allow the role to be passed from console users to the service. AWS Glue provided policies expect IAM service roles to begin with AWSGlueServiceRole. Otherwise, you must add a policy to allow your users the iam:PassRole permission for IAM roles to match your naming convention. Choose Create Role.

const policy = new aws_iam.Policy(
  this,
  'LeastPriviledgePolicyForGlueNotebookRole',
  {
    policyName: 'LeastPriviledgePolicyForGlueNotebookRole',
    statements: [
      // pass iam role
      new aws_iam.PolicyStatement({
        actions: ['iam:PassRole', 'iam:GetRole'],
        effect: Effect.ALLOW,
        resources: ['*']
      }),
      // athena
      new aws_iam.PolicyStatement({
        actions: ['athena:*'],
        effect: Effect.ALLOW,
        resources: ['*']
      }),
      // access s3
      new aws_iam.PolicyStatement({
        actions: ['s3:*'],
        effect: Effect.ALLOW,
        resources: [
          props.athenaResultBucketArn,
          `${props.athenaResultBucketArn}/*`,
          props.sourceBucketArn,
          `${props.sourceBucketArn}/*`
        ]
      }),
      // access glue catalog
      new aws_iam.PolicyStatement({
        actions: ['glue:*'],
        effect: Effect.ALLOW,
        resources: [
          `arn:aws:glue:${this.region}:*:table/${props.databaseName}/*`,
          `arn:aws:glue:${this.region}:*:database/${props.databaseName}*`,
          `arn:aws:glue:${this.region}:*:*catalog`
        ]
      })
    ]
  }
)

Then attach the policy to the Glue role

policy.attachToRole(role)
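
The two options for passing the role can be sketched as a prefix check plus a narrower iam:PassRole statement. This is a minimal illustration under stated assumptions: the helpers `followsGlueNamingConvention` and `passRoleStatementProps`, the role names, and the role ARN are hypothetical, and scoping the statement to one role with the `iam:PassedToService` condition key is a suggested tightening rather than what the policy above does.

```typescript
// Prefix that AWS Glue provided policies expect, per the passage quoted above.
const GLUE_ROLE_PREFIX = "AWSGlueServiceRole";

// Hypothetical check: true when the role name follows the naming convention.
function followsGlueNamingConvention(roleName: string): boolean {
  return roleName.indexOf(GLUE_ROLE_PREFIX) === 0;
}

// Hypothetical helper: iam:PassRole limited to one role and to the Glue service.
function passRoleStatementProps(roleArn: string) {
  return {
    actions: ["iam:PassRole"],
    resources: [roleArn],
    conditions: {
      StringEquals: { "iam:PassedToService": "glue.amazonaws.com" },
    },
  };
}

// A name like the one used in this note does not carry the prefix,
// so an explicit iam:PassRole statement would be needed for it.
const conventionalName = followsGlueNamingConvention("AWSGlueServiceRoleNotebook");
const customName = followsGlueNamingConvention("GlueRoleFor-demo");

// Placeholder account ID and role name for illustration only.
const passRoleProps = passRoleStatementProps(
  "arn:aws:iam::123456789012:role/GlueRoleFor-demo"
);
```

The returned object has the same `actions`, `resources`, and `conditions` fields that `aws_iam.PolicyStatement` accepts, so it could replace the `resources: ['*']` statement if the narrower scope fits the use case.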

Reference#