Does Spark allow using an assumed role's STS temporary credentials for cross-account Glue access on EMR?

We are trying to connect to a cross-account AWS Glue catalog from an EMR Spark job. From my research, AWS supports cross-account access to the Glue catalog in two ways:

  1. IAM role-based. (This is not working for me)
  2. Resource-based policy. (This worked for me)

The problem scenario: Account A creates an EMR cluster with its role role_Account_A, and role_Account_A wants to access the Glue catalog of Account B.

  • Account A creates an EMR cluster with role role_Account_A.
  • Account B has role role_Account_B, which has access to Glue and S3 and lists role_Account_A in its trusted entities.
  • role_Account_A has an sts:AssumeRole policy for resource role_Account_B.
  • Using the SDK, we are able to assume role role_Account_B from role_Account_A and obtain temporary credentials.
  • EMR has the configuration [{"classification":"spark-hive-site","properties":{"hive.metastore.glue.catalogid":"Account_B", "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]
    SparkSession sparkSession = SparkSession.builder()
            .appName("testing glue")
            .enableHiveSupport()
            .getOrCreate();

    // Pass the temporary STS credentials to the S3A filesystem
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.session.token", assumedcreds.getSessionToken());

    // Also set them on the Spark conf
    sparkSession.sparkContext().conf().set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
    sparkSession.sparkContext().conf().set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
    sparkSession.sparkContext().conf().set("fs.s3a.session.token", assumedcreds.getSessionToken());

    sparkSession.sql("show databases").show(10, false);
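
For reference, the trust relationship described in the bullet points above would look roughly like the following IAM policy documents. The account IDs and role names are placeholders matching the question's naming, not real values.

Trust policy on role_Account_B (Account B), allowing role_Account_A to assume it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::Account_A:role/role_Account_A" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

Policy attached to role_Account_A (Account A), permitting the AssumeRole call:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::Account_B:role/role_Account_B"
    }
  ]
}
```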

The error we are getting is:

    Caused by: MetaException(message:User: arn:aws:sts::Account_A:assumed-role/role_Account_A/i-xxxxxxxxxxxx is not authorized to perform: glue:GetDatabase on resource: arn:aws:glue:XX-XXXX-X:Account_B:catalog 
because no resource-based policy allows the glue:GetDatabase action (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: X93Xbc64-0153-XXXX-XXX-XXXXXXX))

Questions:

  • Does Spark support Glue-based authentication properties, for example aws.glue.access.key?
  • Per the error, Spark is not using the assumed role role_Account_B; it uses role_Account_A, with which the EMR cluster was created. Can we make it use the assumed role role_Account_B?

I will update the question details if I am missing something.



Solution 1:[1]

I believe you have an EMR instance profile role in Account A. If so, follow these steps and cross-account access should work.

In Account B:

  1. Under Glue, go to Settings and add the EMR instance profile role from Account A as a principal, granting access to Account B's Glue catalog and S3. It is recommended to grant access only to the buckets you need.
  2. In the bucket policy of the bucket that the Glue table uses, add the EMR instance profile role from Account A as a principal and grant read/write access.
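
As a sketch, the Glue resource policy in Account B (step 1) might look like the following. The actions, region, and ARNs are illustrative placeholders; narrow them to the databases and tables you actually need.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::Account_A:role/role_Account_A" },
      "Action": ["glue:GetDatabase", "glue:GetDatabases", "glue:GetTable", "glue:GetTables", "glue:GetPartitions"],
      "Resource": [
        "arn:aws:glue:XX-XXXX-X:Account_B:catalog",
        "arn:aws:glue:XX-XXXX-X:Account_B:database/*",
        "arn:aws:glue:XX-XXXX-X:Account_B:table/*/*"
      ]
    }
  ]
}
```

And the bucket policy (step 2), with example-glue-bucket standing in for the bucket backing the Glue table:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::Account_A:role/role_Account_A" },
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-glue-bucket",
        "arn:aws:s3:::example-glue-bucket/*"
      ]
    }
  ]
}
```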

Now if you run the EMR job in Account A, you'll see it running with cross-account access.

It works for our purpose; try it out.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sakthi kavin