I have a Spark job that saves data to a temporary directory and then copies it from there to S3:
```scala
df
  .coalesce(10)
  .write
  .mode(SaveMode.Overwrite)
  .partitionBy(date)
  .parquet(tmpDirParquet)
```

```shell
s3-dist-cp --src tmpDirParquet --dest s3://foo
```
This works fine on an EMR cluster with a step concurrency of 1. However, I noticed that when I increase the concurrency to more than 1, the steps complete successfully but I get the following exception:
```
Error: java.lang.RuntimeException: Reducer task failed to copy 3 files: hdfs://host:port/tmp/parquet/date=2022-01-04/part-xxxx.snappy.parquet etc
    at com.amazon.elasticmapreduce.s3distcp.CopyFilesReducer.cleanup(CopyFilesReducer.java:64)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:17
```
Any ideas what might be happening here?
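For what it's worth, my working theory is that concurrent steps share the same `tmpDirParquet` path, so `SaveMode.Overwrite` in one step can delete files that another step's `s3-dist-cp` is still copying. Below is a sketch of what I'm considering, with a unique per-step temp directory; the `stepTmpDir` name and UUID suffix are my own assumptions, not anything from EMR or Spark:

```scala
import java.util.UUID

// Assumption: give each step its own temp directory so one step's
// SaveMode.Overwrite cannot delete files that a concurrent step's
// s3-dist-cp is still reading from HDFS.
val stepTmpDir = s"hdfs:///tmp/parquet-${UUID.randomUUID()}"

df
  .coalesce(10)
  .write
  .mode(SaveMode.Overwrite)
  .partitionBy(date)
  .parquet(stepTmpDir)

// The copy step would then use the same unique path:
//   s3-dist-cp --src <stepTmpDir> --dest s3://foo
```

I haven't verified this fixes the concurrency failure, so I'd appreciate confirmation of whether the shared temp path is actually the cause.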
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
