How do I create an array from a grouping of row_number()?

I have code that computes row_number() over a window partitioned by date. I would like to create an array that contains the data grouped by that row number.

Example code is something like this:

from pyspark.sql import Window
from pyspark.sql.functions import col, row_number
w = Window.partitionBy('part_id', 'part_date').orderBy(col('timestamp').desc())
df2 = df.withColumn('row_num', row_number().over(w))

The above code works for assigning the row numbers within each partition. I am not sure how to create an array of part_num values grouped by date.

I thought maybe something like this (this code does not work; it is just an example):

.withColumn('array_prt_num', array('part_num')).groupBy('row_num')

Thoughts?
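To make the intended result concrete, here is a minimal plain-Python sketch of one possible reading of the question: number the rows within each (part_id, part_date) partition by descending timestamp, then collect the part_num values into one list per (part_date, row_num). The sample rows and the helper names add_row_num and arrays_by_date_and_row_num are hypothetical, and grouping by both part_date and row_num is an assumption, since the desired output is only shown in the image.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names follow the question.
rows = [
    {"part_id": "A", "part_date": "2021-01-01", "timestamp": 3, "part_num": "p1"},
    {"part_id": "A", "part_date": "2021-01-01", "timestamp": 5, "part_num": "p2"},
    {"part_id": "B", "part_date": "2021-01-01", "timestamp": 7, "part_num": "p3"},
    {"part_id": "B", "part_date": "2021-01-02", "timestamp": 1, "part_num": "p4"},
]


def add_row_num(rows):
    """Mimic row_number() over (part_id, part_date) ordered by timestamp desc."""
    partitions = defaultdict(list)
    for r in rows:
        partitions[(r["part_id"], r["part_date"])].append(r)
    numbered = []
    for part in partitions.values():
        ordered = sorted(part, key=lambda r: r["timestamp"], reverse=True)
        for i, r in enumerate(ordered, start=1):
            numbered.append({**r, "row_num": i})
    return numbered


def arrays_by_date_and_row_num(numbered):
    """Collect part_num values into one list per (part_date, row_num)."""
    groups = defaultdict(list)
    for r in numbered:
        groups[(r["part_date"], r["row_num"])].append(r["part_num"])
    return dict(groups)


result = arrays_by_date_and_row_num(add_row_num(rows))
```

In PySpark, the grouping step would probably be a groupBy on part_date and row_num applied to df2, with collect_list('part_num') as the aggregate. Note that collect_list does not guarantee the order of elements within each array.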

(Image: desired df output)



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
