'Pyspark Column name alias when applying Aggregate using a Dictionary
I am applying an aggregate function on a data frame in pyspark. I am using a dictionary to pass the column name and aggregate function
df.groupBy(column_name).agg({"column_name":"sum"})
I now want to apply an alias to this column that has been generated using the aggregate method. Is there a way to do it?
The reason I am using the dictionary method is that aggregates will be applied dynamically depending on input parameters.
So basically it will be like
def aggregate(df, column_to_group_by, columns_to_aggregate):
df.groupBy(column_to_group_by).agg(columns_to_aggregate)
Where columns_to_aggregate will look like
{
"salary":"sum"
}
I now want to apply alias to the newly created column, because If I try to save the result to disk as praquet I get the error
Column name "sum(salary)" contains invalid character(s). Please use alias to rename it.
Any help on how to apply alias dynamically will be great
Thanks !
Solution 1:[1]
from pyspark.sql.functions import sum
df.groupBy("state") \
.agg(sum("salary").alias("sum_salary"))
Please read the article
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arkon88 |
