Unable to write data using spark-submit

When I run spark-submit with the following command on Cloudera:

```bash
time spark-submit \
--deploy-mode client \
--conf spark.app.name='XXXxxxxxx' \
--conf spark.master=local[*] \
--conf spark.driver.memory=20g \
--conf spark.driver.cores=2 \
--conf spark.executor.instances=4 \
--conf spark.executor.memory=20g \
--conf spark.executor.cores=7 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=0 \
--conf spark.dynamicAllocation.maxExecutors=5 \
--conf spark.dynamicAllocation.executorAllocationRatio=0.5 \
--conf spark.local.dir=/data/xxx/ssss/spark_local/ \
--py-files test.py \
--files test_ed.ini \
test_py.py
```
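For reference, here is the same configuration expressed in-script; a minimal sketch (with `spark.master=local[*]` there are no separate executors, so the executor and dynamic-allocation settings above should have no effect):

```python
from pyspark.sql import SparkSession

# Sketch of the equivalent in-script configuration. In local mode
# everything runs in the single driver JVM, so only the driver and
# local-dir settings actually take effect.
spark = (
    SparkSession.builder
    .appName("XXXxxxxxx")
    .master("local[*]")
    .config("spark.driver.memory", "20g")  # only effective if set before the JVM starts
    .config("spark.local.dir", "/data/xxx/ssss/spark_local/")
    .getOrCreate()
)
```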

I am getting this in my output log file:

```
Caused by: java.io.FileNotFoundException: /xxx/xxxxx/xxx/xxx/spark_local (Too many open files)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:105)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:118)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
```
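The message suggests the shuffle write is exhausting the per-process limit on open file descriptors. A quick way I checked that limit from Python (equivalent to `ulimit -n`):

```python
import resource

# Soft and hard limits on open file descriptors for this process;
# the shuffle write fails once the soft limit is exhausted.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
```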

I have tried changing the shuffle partitions, and with that this job succeeds, but my downstream job then fails.
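Roughly what that change looked like (a sketch; 64 is just an example value, not the one I actually used):

```python
from pyspark.sql import SparkSession

# spark.sql.shuffle.partitions controls the number of reduce-side
# partitions; the bypass merge-sort shuffle writer (visible in the
# stack trace) opens one file per partition per map task, so a
# lower value means fewer files open at once.
spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "64")  # example value
    .getOrCreate()
)
```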


