How to pass a parameter to a dictionary input for the pyspark agg function

From the pyspark docs, I can do:

gdf = df.groupBy(df.name)
sorted(gdf.agg({"*": "first"}).collect())

In my actual use case I have many variables, so I like that I can simply create a dictionary. That is why @lemon's suggestion below won't work for me:

gdf = df.groupBy(df.name)
sorted(gdf.agg(F.first(col, ignorenulls=True)).collect())

How can I pass a parameter to first (i.e. ignorenulls=True)?



Solution 1:[1]

You can use a list comprehension to build one aggregation expression per column:

gdf.agg(*[F.first(x, ignorenulls=True).alias(x) for x in df.columns]).collect()
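The `*` here unpacks the comprehension's list into separate positional arguments to `agg`. A pure-Python sketch of that mechanism, with a hypothetical `agg` stub standing in for `GroupedData.agg` so it runs without Spark:

```python
def agg(*exprs):
    # Stand-in for GroupedData.agg: accepts any number of column expressions.
    return list(exprs)

columns = ["name", "age", "score"]
# Build one expression per column, then unpack the list into positional
# arguments, mirroring:
#   gdf.agg(*[F.first(x, ignorenulls=True).alias(x) for x in df.columns])
exprs = agg(*[f"first({c}) AS {c}" for c in columns])
print(exprs)  # ['first(name) AS name', 'first(age) AS age', 'first(score) AS score']
```

In real pyspark, each element would be `F.first(x, ignorenulls=True).alias(x)` instead of a string, but the unpacking works the same way.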

Solution 2:[2]

Try calling the pyspark function directly:

import pyspark.sql.functions as F

gdf = df.groupBy(df.name)

parameters = {'col': '<your_column_name>', 'ignorenulls': True}
sorted(gdf.agg(F.first(**parameters)).collect())

Does it work for you?

P.S. ignorenulls is False by default, so you do need to pass it explicitly to skip nulls.
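The `**` operator expands the dictionary into keyword arguments, so you can assemble the parameter dict programmatically. A pure-Python sketch of that mechanism, with a hypothetical `first` stub mimicking the signature of `pyspark.sql.functions.first`:

```python
def first(col, ignorenulls=False):
    # Stand-in for pyspark.sql.functions.first: just records its arguments.
    return (col, ignorenulls)

parameters = {'col': 'age', 'ignorenulls': True}
# Equivalent to calling first(col='age', ignorenulls=True)
result = first(**parameters)
print(result)  # ('age', True)
```

The dictionary keys must match the function's parameter names exactly, or Python raises a TypeError for the unexpected keyword.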

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Emma
Solution 2: