PySpark dataframe
In PySpark, I would like to change the number of partitions when I load the data:
```python
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df_sp = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("mode", "FAILFAST")
    .option("inferSchema", "true")
    .option("sep", ",")
    .load(os.path.join(dirPath, nameFile))
)
```
Using PySpark, is it possible to tune the number of partitions at load time?
Solution 1:[1]
Yes, change `spark.sql.files.maxPartitionBytes`, which caps the number of bytes packed into a single partition when reading files. It defaults to 134217728 bytes (128 MB); lowering it yields more, smaller input partitions.
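As a rough sketch of how this setting translates into a partition count: for a single splittable file, Spark packs at most `maxPartitionBytes` into each input partition, so the count is approximately the file size divided by the cap (the real planner also accounts for file counts, compression, and `spark.sql.files.openCostInBytes`, so treat this as an estimate). The Spark lines in the comments are illustrative; the helper function below is not part of any Spark API.

```python
import math

# In a live session you would lower the cap before reading, e.g.:
#   spark.conf.set("spark.sql.files.maxPartitionBytes", 32 * 1024 * 1024)
#   df_sp = spark.read.format("csv")...  # same read as in the question

def estimated_partitions(file_size_bytes: int,
                         max_partition_bytes: int = 128 * 1024 * 1024) -> int:
    """Approximate input-partition count for one splittable file:
    Spark packs at most max_partition_bytes into each partition."""
    return max(1, math.ceil(file_size_bytes / max_partition_bytes))

# A 1 GiB CSV at the 128 MB default -> about 8 partitions
print(estimated_partitions(1024 ** 3))                       # 8
# The same file with a 32 MB cap -> about 32 partitions
print(estimated_partitions(1024 ** 3, 32 * 1024 * 1024))     # 32
```

Because this is a per-file read-time setting, it only shapes the initial partitioning; after loading you can still call `df_sp.repartition(n)` to pick an exact partition count at the cost of a shuffle.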
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pltc |
