scala - How to avoid creating empty Avro files (or control the number of output files)
my_data.write
.mode(SaveMode.Overwrite)
.avro(_outputPath)
This usually works fine, but when the data set is very small, some of the output Avro files are empty.
The number of files varies from run to run, and when there are fewer rows than files, some files are empty and contain only the column information (the schema).
Is there a way to control the number of output Avro files based on the number of rows? Or to avoid creating an output file when there is no data for it?
Solution 1:[1]
The number of files depends on how many partitions your DataFrame has. Each partition is written to its own file, so a partition with no rows produces a file that holds only the schema. If you know that there is not much data to write, you can repartition the DataFrame before writing it.
my_data.repartition(1)
.write
.mode(SaveMode.Overwrite)
.avro(_outputPath)
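If the amount of data varies between runs, the partition count could be derived from the row count instead of being fixed at 1. The following is only a sketch under some assumptions: `partitionsFor`, `writeAvro` and the `rowsPerFile` threshold are hypothetical names introduced here, the `.avro` writer method is assumed to come from the `com.databricks.spark.avro._` import (as the snippet above seems to imply), and `df.count()` triggers an extra Spark job, which may be too expensive for large inputs.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import com.databricks.spark.avro._ // assumed source of the .avro(...) writer method

// Hypothetical helper: ceil(rowCount / rowsPerFile), but never fewer than 1 partition.
def partitionsFor(rowCount: Long, rowsPerFile: Long): Int = {
  require(rowsPerFile > 0, "rowsPerFile must be positive")
  val needed = (rowCount + rowsPerFile - 1) / rowsPerFile
  math.max(1L, math.min(Int.MaxValue.toLong, needed)).toInt
}

// Hypothetical wrapper: count the rows, pick a partition count, then write.
// rowsPerFile is an illustrative default, not a recommended value.
def writeAvro(df: DataFrame, outputPath: String, rowsPerFile: Long = 100000L): Unit = {
  val numPartitions = partitionsFor(df.count(), rowsPerFile)
  df.repartition(numPartitions)
    .write
    .mode(SaveMode.Overwrite)
    .avro(outputPath)
}
```

With a small data set this collapses to a single partition and therefore a single file. For larger partition counts it only makes empty files unlikely rather than impossible, because `repartition` does not guarantee that every partition receives rows. When only a reduction to one partition is needed, `coalesce(1)` is an alternative that avoids a full shuffle.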
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ivan Stanislavciuc |
