PySpark dataframe

In PySpark, I would like to change the number of partitions when I load the data.

import os

df_sp = spark.read\
             .format("csv")\
             .option("header", "true")\
             .option("mode", "FAILFAST")\
             .option("inferSchema", "true")\
             .option("sep", ",")\
             .load(os.path.join(dirPath, nameFile))

Using PySpark, is it possible to tune the number of partitions at load time?



Solution 1:[1]

Yes: change spark.sql.files.maxPartitionBytes. It defaults to 134217728 bytes (128 MB); lowering it makes Spark split the input files into more, smaller partitions at load time, while raising it produces fewer, larger ones.
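As a minimal sketch of how this might look, set the option on the session before reading the file. The file path, app name, and 32 MB target here are illustrative assumptions, not values from the original question:

```python
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-tuning").getOrCreate()

# Cap each input split at 32 MB instead of the 128 MB default,
# so the load produces more (smaller) partitions.
spark.conf.set("spark.sql.files.maxPartitionBytes", 32 * 1024 * 1024)

csv_path = "/path/to/data.csv"  # hypothetical path for illustration

df_sp = (spark.read
              .format("csv")
              .option("header", "true")
              .option("mode", "FAILFAST")
              .option("inferSchema", "true")
              .option("sep", ",")
              .load(csv_path))

# Check how many partitions the load actually produced
print(df_sp.rdd.getNumPartitions())
```

Note that this setting influences how files are split when they are read; it is not a hard partition count, so the resulting number of partitions also depends on the total input size and the number of files.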

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pltc