How to add environment variables to an EMR cluster

How can I add environment variables to an EMR cluster?

Currently, I set them in a .sh file and use script-runner.jar to run the script:

#!/bin/bash
export PYSPARK_PYTHON=/home/hadoop/bin/python
export PYSPARK_DRIVER_PYTHON=/home/hadoop/bin/python

I submit the script as a step like this:

aws emr add-steps \
--cluster-id j-2AXXXXXXGAPLF \
--steps Type=CUSTOM_JAR,Name="Run a script from S3 with script-runner.jar",ActionOnFailure=CONTINUE,Jar=s3://us-west-2.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/my-script.sh]

I have also tried command-runner.jar. Neither approach worked. Can you suggest another way to add environment variables to the cluster remotely (from an EC2 instance)?
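
A likely reason neither attempt had any visible effect: `export` only changes the environment of the shell that runs it and of that shell's children. Assuming the step runner executes the script as a separate process, the variables disappear when the script exits. The sketch below reproduces this locally; the temporary file and the variable names `inside_value` and `after_value` are made up for illustration, and nothing here talks to EMR.

```shell
#!/bin/bash
# Stand-in for my-script.sh: exports the variable and prints it.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/bash
export PYSPARK_PYTHON=/home/hadoop/bin/python
echo "$PYSPARK_PYTHON"
EOF

# Make sure the variable is not already set in this shell.
unset PYSPARK_PYTHON

# Run the script as a child process (assumed to mirror how a step runs it).
inside_value=$(bash "$demo_script")

# Back in the parent shell, the export made by the child is gone.
after_value=${PYSPARK_PYTHON:-unset}

echo "inside script: $inside_value"
echo "after script:  $after_value"

rm -f "$demo_script"
```

The variable is visible inside the script but not afterwards, which matches the behavior described in the question.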

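One approach that may fit here is to set the variables through an EMR configuration classification instead of a step. EMR accepts a `spark-env` classification with a nested `export` classification whose properties are written into the Spark environment on the cluster nodes. The sketch below only writes such a configuration file; the file name `spark-env.json` is an assumption, the Python path is copied from the script above, and the `create-cluster` call is left as a comment because its other options depend on the cluster.

```shell
#!/bin/bash
# Hypothetical file name for the configuration; adjust as needed.
config_file=spark-env.json

# spark-env / export classification carrying the two variables.
cat > "$config_file" <<'EOF'
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/home/hadoop/bin/python",
          "PYSPARK_DRIVER_PYTHON": "/home/hadoop/bin/python"
        }
      }
    ]
  }
]
EOF

echo "wrote $config_file"

# Pass the file when creating the cluster (other required options omitted):
#   aws emr create-cluster ... --configurations file://./spark-env.json
```

Configurations supplied this way are applied when the cluster is created, so this sketch targets a new cluster rather than one that is already running.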


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
