Does Spark allow using an Amazon assumed role and STS temporary credentials for cross-account Glue access on EMR?
We are trying to connect to a cross-account AWS Glue Data Catalog from an EMR Spark job. From my research, AWS supports cross-account access to the Glue catalog in two ways:
- IAM role-based. (This is not working for me)
- Resource-based policy. (This worked for me)
The problem scenario: Account A creates an EMR cluster with its role role_Account_A, and role_Account_A wants to access the Glue catalog of Account B.
- Account A creates an EMR cluster with role role_Account_A.
- Account B has role role_Account_B, which has access to Glue and S3 and lists role_Account_A as a trusted entity.
- role_Account_A has an sts:AssumeRole policy for the resource role_Account_B.
- Using the SDK, we are able to assume role role_Account_B from role_Account_A and obtain temporary credentials.
- The EMR cluster has the following classification configuration:

```json
[
  {
    "classification": "spark-hive-site",
    "properties": {
      "hive.metastore.glue.catalogid": "Account_B",
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
```
The Spark job then does the following:

```java
SparkSession sparkSession = SparkSession.builder()
        .appName("testing glue")
        .enableHiveSupport()
        .getOrCreate();

// The fs.s3a.* properties configure the S3A filesystem with the
// assumed-role temporary credentials.
Configuration hadoopConf = sparkSession.sparkContext().hadoopConfiguration();
hadoopConf.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
hadoopConf.set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
hadoopConf.set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
hadoopConf.set("fs.s3a.session.token", assumedcreds.getSessionToken());

SparkConf sparkConf = sparkSession.sparkContext().conf();
sparkConf.set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
sparkConf.set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
sparkConf.set("fs.s3a.session.token", assumedcreds.getSessionToken());

sparkSession.sql("show databases").show(10, false);
```
The error we are getting is:

```
Caused by: MetaException(message:User: arn:aws:sts::Account_A:assumed-role/role_Account_A/i-xxxxxxxxxxxx is not authorized to perform: glue:GetDatabase on resource: arn:aws:glue:XX-XXXX-X:Account_B:catalog
because no resource-based policy allows the glue:GetDatabase action (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: X93Xbc64-0153-XXXX-XXX-XXXXXXX))
```
Questions:
- Does Spark support Glue-specific authentication properties, for example aws.glue.access.key?
- Per the error, Spark is not using the assumed role role_Account_B; it uses role_Account_A, with which the EMR cluster was created. Can we make it use the assumed role role_Account_B?
I will update the question details if I am missing something.
Solution 1:[1]
I believe you have an EMR instance profile role in Account A. If so, follow these steps and cross-account access should work.
In Account B,
- Under Glue, go to Settings and add the EMR instance profile role (role_Account_A) as a principal, granting access to Account B's Glue catalog and S3. It is recommended to grant access only to the buckets you need.
- Go to the bucket policy of the bucket that backs the Glue table and add the EMR instance profile role (role_Account_A) as a principal with read/write access.
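For reference, the first step amounts to attaching a Glue resource policy along these lines under Glue > Settings in Account B. The account IDs, region, and the broad `glue:*` action are placeholders/assumptions; in practice you would scope the actions down (e.g. `glue:GetDatabase`, `glue:GetTable`) and restrict the resources as needed:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_A_ID:role/role_Account_A"
      },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:REGION:ACCOUNT_B_ID:catalog",
        "arn:aws:glue:REGION:ACCOUNT_B_ID:database/*",
        "arn:aws:glue:REGION:ACCOUNT_B_ID:table/*/*"
      ]
    }
  ]
}
```

With this resource-based policy in place, the EMR instance profile role itself is authorized against Account B's catalog, so no STS role assumption is needed for the metastore calls.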
Now if you run the EMR job in Account A, you'll see it running with cross-account access.
It works for our purpose; try it out.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sakthi kavin |
