'Pyspark Column name alias when applying Aggregate using a Dictionary

I am applying an aggregate function on a data frame in pyspark. I am using a dictionary to pass the column name and aggregate function

df.groupBy(column_name).agg({"column_name":"sum"})

I now want to apply an alias to this column that has been generated using the aggregate method. Is there a way to do it?

The reason I am using the dictionary method is that aggregates will be applied dynamically depending on input parameters.

So basically it will be like

def aggregate(df, column_to_group_by, columns_to_aggregate):
     df.groupBy(column_to_group_by).agg(columns_to_aggregate)

Where columns_to_aggregate will look like

{
   "salary":"sum"
}

I now want to apply alias to the newly created column, because If I try to save the result to disk as praquet I get the error

Column name "sum(salary)" contains invalid character(s). Please use alias to rename it.

Any help on how to apply alias dynamically will be great

Thanks !



Solution 1:[1]

from pyspark.sql.functions import sum
df.groupBy("state") \
  .agg(sum("salary").alias("sum_salary"))

Please read the article

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arkon88