Create multiple columns over the same window
The following code is pretty slow.
Is there a way of creating multiple columns at once over the same window, so Spark does not need to partition and order the data multiple times?
from pyspark.sql import Window
import pyspark.sql.functions as F

w = Window().partitionBy("k").orderBy("t")

# last(..., True) means ignorenulls=True: forward-fill with the last non-null value
df = df.withColumn("a", F.last("a", True).over(w))
df = df.withColumn("b", F.last("b", True).over(w))
df = df.withColumn("c", F.last("c", True).over(w))
...
Solution 1:[1]
I'm not sure that Spark partitions and re-sorts the data several times when the same window specification is reused consecutively. However, a single .select is usually a better alternative than a chain of .withColumn calls, since each .withColumn adds another projection to the plan.
df = df.select(
    # exclude the original a/b/c, otherwise "*" plus the aliases below
    # would produce duplicate column names
    *[c for c in df.columns if c not in {"a", "b", "c"}],
    F.last("a", True).over(w).alias("a"),
    F.last("b", True).over(w).alias("b"),
    F.last("c", True).over(w).alias("c"),
)
To find out whether partitioning and ordering are done several times, analyse the df.explain() output.
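For illustration, here is a minimal, self-contained sketch (the SparkSession setup and the toy data are invented for this example). With one shared window specification, the physical plan should show a single Window operator preceded by one Exchange and one Sort, rather than one set per column:

from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy data shaped like the question: key k, ordering column t, values a/b/c.
df = spark.createDataFrame(
    [("k1", 1, "a1", None, None),
     ("k1", 2, None, "b2", None),
     ("k1", 3, None, None, "c3")],
    "k string, t int, a string, b string, c string",
)

w = Window.partitionBy("k").orderBy("t")
out = df.select(
    "k", "t",
    *[F.last(c, True).over(w).alias(c) for c in ["a", "b", "c"]],
)

# Look for a single Window operator in the physical plan; several Exchange/Sort
# pairs would indicate the data is repartitioned and re-sorted per column.
out.explain()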
Solution 2:[2]
You don't have to generate one column at a time; use a list comprehension. Code below:
new = ["a", "b", "c"]
df = df.select(
    # keep every column except the originals being replaced
    *[c for c in df.columns if c not in new],
    *[F.last(x, True).over(w).alias(x) for x in new],
)
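If you are on Spark 3.3 or later (an assumption about your environment), DataFrame.withColumns, the plural form added in 3.3, applies all the replacements in a single projection, so the shared window is still only partitioned and sorted once:

# Spark 3.3+ only: all three columns are replaced in one projection
new = ["a", "b", "c"]
df = df.withColumns({x: F.last(x, True).over(w) for x in new})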
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ZygD |
| Solution 2 | wwnde |
