Category "s3distcp"

S3DistCp - Split source into multiple jobs

I need to copy data from S3 to HDFS on an EMR cluster, and I'm trying to reduce the execution time of my job. Looking at the logs, the map input of the job is 1_000_