How to save a PySpark DataFrame into 1000 parts by one of the columns?

I am using PySpark, and I want to save a DataFrame split into 1000 parts by one of its columns. The DataFrame I want to save:

df = spark.sql("SELECT * FROM table WHERE first_date='2021-10-31' and second_date >= '2013-01-01' and type in ('sh', 'la')")

I tried to use bucketBy:

df.write.bucketBy(1000, 'par').saveAsTable('temp', format='parquet', path='s3://mybucket')

but it didn't work (after a lot of reading I understand why).

I've also tried:

df.repartition(10, 'par').write.format("parquet").bucketBy(10, 'par').saveAsTable("my_table5", path='s3://mybucket/') 

and it didn't work.

Is there a way to save a PySpark DataFrame in a fixed number of parts, divided by one of the columns in the DataFrame?
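For what it's worth, one approach I have seen suggested is to skip bucketBy entirely and instead repartition the DataFrame into exactly 1000 partitions keyed on the column, then write it out; Spark writes one part file per partition, so rows sharing the same 'par' value end up in the same file. This is only a minimal sketch, and the output path below is a placeholder, not my real bucket:

# Hash-partition on 'par' into exactly 1000 partitions, then write one
# parquet file per partition (path is a placeholder).
df_1000 = df.repartition(1000, 'par')
df_1000.write.mode('overwrite').parquet('s3://mybucket/temp/')

Note this gives a fixed number of files, but rows are grouped by a hash of 'par' rather than one file per value; if one directory per distinct 'par' value were acceptable instead of a fixed count, df.write.partitionBy('par') would do that, but the number of output directories would then depend on the data.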


