How to print the arguments that I sent to an Airflow-EMR cluster?
I am executing EMR (spark-submit) through Airflow 2.0 and I am submitting steps as follows:
My s3://dbook/ bucket holds all the files needed for spark-submit. First I copy all files to EMR (the "Copy S3 to EMR" step) and then execute the spark-submit command, but I am getting the error
"no module named config". I need to know what args are being sent to the EMR cluster. How can I achieve this?
SPARK_STEPS = [
    {
        "Name": "Copy S3 to EMR",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["aws", "s3", "cp", "s3://dbook/", ".", "--recursive"],
        },
    },
    {
        "Name": "Spark-Submit Command",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--py-files",
                "config.zip,jobs.zip",
                "main.py",
            ],
        },
    },
]
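Since the question is how to see exactly what is sent to the cluster, one sketch (assuming the steps are submitted with `EmrAddStepsOperator` or a plain boto3 `add_job_flow_steps` call) is to dump the step list as JSON before submitting it; this is the same structure the EMR API receives, and printing it from inside a DAG task lands in the Airflow task log:

```python
import json

# Step definitions as in the question (bucket name s3://dbook/ kept as-is).
SPARK_STEPS = [
    {
        "Name": "Copy S3 to EMR",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["aws", "s3", "cp", "s3://dbook/", ".", "--recursive"],
        },
    },
    {
        "Name": "Spark-Submit Command",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--py-files",
                "config.zip,jobs.zip",
                "main.py",
            ],
        },
    },
]

# json.dumps shows the exact payload that boto3 serializes for the
# AddJobFlowSteps API call; printed from a DAG, this appears in the task log.
print(json.dumps(SPARK_STEPS, indent=2))
```

After the cluster runs, the same arguments can also be inspected in the EMR console under the cluster's "Steps" tab, or via `aws emr list-steps --cluster-id <id>`.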
Thanks, Xi
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow