Run a shell script stored on HDFS from a PySpark script

I have a requirement to call a shell script that is stored on an HDFS location and run it from a PySpark script.

My code looks something like this:

import subprocess

bashcommand = "hadoop fs -cat {0} | exec sh -s {1}".format(shellscript, hqlfile)

subprocess.Popen(bashcommand.split(), stdout=subprocess.PIPE)

# hqlfile is my parameter for the shell script

The subprocess.Popen call does not work here.
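One likely contributing factor can be seen without Hadoop at all: `str.split()` splits only on whitespace, and `Popen` with a list runs no shell, so the pipe character and `exec` reach `hadoop` as literal arguments instead of being interpreted. The snippet below only builds the argument list (using the example paths from the update); nothing is executed.

```python
# Reproduce the argument list that Popen receives from bashcommand.split().
# The paths are the example paths from the question; nothing is executed here.
shellscript = "/bin/test/ingest.sh"
hqlfile = "/bin/test/hql/test.hql"

bashcommand = "hadoop fs -cat {0} | exec sh -s {1}".format(shellscript, hqlfile)
argv = bashcommand.split()

# Popen(argv) would start "hadoop" with every token below as a literal argument;
# no shell is involved, so "|" is not treated as a pipe.
print(argv)
```

This prints `['hadoop', 'fs', '-cat', '/bin/test/ingest.sh', '|', 'exec', 'sh', '-s', '/bin/test/hql/test.hql']`.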

Any help is appreciated. Note: I am running the PySpark script with spark-submit.
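If the goal is to keep `Popen` and avoid a shell, the `hadoop fs -cat ... | sh -s ...` pipeline can be wired up by hand, connecting one process's stdout to the next one's stdin. This is a minimal sketch; `run_pipeline` is a hypothetical helper name, and it has not been tried against a real cluster.

```python
import subprocess
from typing import Sequence, Tuple


def run_pipeline(producer: Sequence[str], consumer: Sequence[str]) -> Tuple[int, int, str, str]:
    """Run `producer | consumer` without a shell.

    Returns (producer exit code, consumer exit code, consumer stdout, consumer stderr).
    """
    # The producer's stdout feeds the consumer's stdin, like "|" in a shell.
    p1 = subprocess.Popen(list(producer), stdout=subprocess.PIPE)
    p2 = subprocess.Popen(
        list(consumer),
        stdin=p1.stdout,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the read end so the producer sees SIGPIPE if the consumer exits early.
    p1.stdout.close()
    out, err = p2.communicate()
    p1.wait()
    return p1.returncode, p2.returncode, out, err
```

A possible call for the case in the question would be `run_pipeline(["hadoop", "fs", "-cat", "/bin/test/ingest.sh"], ["sh", "-s", "/bin/test/hql/test.hql"])`; checking both exit codes shows whether the HDFS read or the script itself failed.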

Update

bashCommand='hadoop fs -cat /bin/test/ingest.sh|exec sh -s /bin/test/hql/test.hql'

This is the command I am trying to execute, using:

os.system(bashCommand)
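`os.system` runs the string through `/bin/sh`, so the pipe works, but it only returns a wait status (on POSIX, `os.waitstatus_to_exitcode` converts it, Python 3.9+) and lets the output go wherever the driver's stdout goes. A sketch that keeps the shell behaviour but captures stderr and the exit code, so they show up in the Spark driver log, might look like this (`run_shell` is a hypothetical name; `capture_output` needs Python 3.7+):

```python
import subprocess


def run_shell(command: str) -> subprocess.CompletedProcess:
    """Run a command line through /bin/sh and capture its output for logging."""
    # shell=True hands the whole string to /bin/sh, so "|" and "exec" behave as
    # they do when the command is typed on the edge node.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    # Print both streams and the exit code so they end up in the driver log.
    print("exit code:", result.returncode)
    print("stdout:", result.stdout)
    print("stderr:", result.stderr)
    return result
```

Calling `run_shell(bashCommand)` in place of `os.system(bashCommand)` would surface messages such as `beeline: command not found` together with the exit code.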

I have written the above code in a PySpark script, which I trigger through spark-submit.

My ingest.sh script contains:

beeline -u "jdbc:hive2:************" -f $hql_file_path

My beeline command works fine when I run it on the edge node, and when I run the shell script ingest.sh directly on the edge node, beeline also runs fine. The issue occurs only when I trigger it through spark-submit.
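One plausible explanation, assuming a YARN cluster: with `--deploy-mode cluster` the driver (and therefore the `os.system` call) runs on a cluster node rather than the edge node, where `beeline` may not be installed or on `PATH`; even in client mode, the environment spark-submit passes on can differ from an interactive shell. A quick check from inside the PySpark script is to resolve the executable before calling it. The helper name and the `extra_dirs` parameter are hypothetical, and which directory beeline lives in is not stated in the question.

```python
import os
import shutil
from typing import Iterable


def require_executable(name: str, extra_dirs: Iterable[str] = ()) -> str:
    """Return the full path of `name`, searching PATH plus any extra directories."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    found = shutil.which(name, path=os.pathsep.join(dirs))
    if found is None:
        # Report the exact search path; on a cluster node it may differ
        # from the edge node's interactive shell.
        raise FileNotFoundError(f"{name!r} not found in {dirs!r}")
    return found
```

Calling `require_executable("beeline")` at the start of the job would fail with the search path in the message if beeline is not visible where the driver actually runs; printing `socket.gethostname()` next to it also shows which host that is.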

Code flow:

PySpark script -->
bashCommand='hadoop fs -cat /bin/test/ingest.sh|exec sh -s /bin/test/hql/test.hql'
os.system(bashCommand)

Shell script (ingest.sh) -->

beeline -u "jdbc:hive2:************" -f $hql_file_path
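With `sh -s /bin/test/hql/test.hql`, sh reads the script from stdin and sets the positional parameter `$1` to the HQL path, so `$hql_file_path` only gets a value if ingest.sh assigns it from `$1` (the question does not show the rest of the script, so that is an assumption). The sketch below feeds a hypothetical stand-in script through `sh -s` and echoes the beeline command instead of running it.

```python
import subprocess

# A hypothetical stand-in for ingest.sh: it copies $1 into hql_file_path
# and echoes the command instead of calling beeline.
script = 'hql_file_path="$1"\necho "would run: beeline -f $hql_file_path"\n'

# Equivalent to piping the script into: sh -s /bin/test/hql/test.hql
result = subprocess.run(
    ["sh", "-s", "/bin/test/hql/test.hql"],
    input=script,
    capture_output=True,
    text=True,
)
print(result.stdout)
```

This prints `would run: beeline -f /bin/test/hql/test.hql`.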

Error when triggering the PySpark script:

22/04/23 17:37:26 WARN ipc.Client: Exception encountered while connecting to the server : 

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
sh: line 9: beeline: command not found
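Of the two lines, the `StandbyException` is a WARN from the HDFS client; with NameNode HA it typically appears when the client first contacts the standby NameNode before failing over, so the `beeline: command not found` line is the more likely cause of the failure. POSIX shells exit with status 127 when a command cannot be found, so if beeline is the script's last command, checking for that code is a cheap way to recognise this case in the PySpark script. `explain_exit` is a hypothetical helper, and the demo command name is deliberately nonexistent.

```python
import subprocess

# POSIX shells exit with status 127 when a command cannot be found.
COMMAND_NOT_FOUND = 127


def explain_exit(returncode: int) -> str:
    """Turn a subprocess exit status into a short diagnostic message."""
    if returncode == 0:
        return "ok"
    if returncode == COMMAND_NOT_FOUND:
        return "command not found: check PATH on the host running the driver"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"failed with exit status {returncode}"


# Demo with a command name that should not exist on any host.
result = subprocess.run("beeline_missing_for_demo --version", shell=True, capture_output=True, text=True)
print(result.returncode, explain_exit(result.returncode))
```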

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
