I need to copy data from S3 to the HDFS of an EMR cluster. I'm trying to reduce the execution time of my job. Looking at the logs, the map input of the job is 1_000_