PySpark: Group the DataFrame by Month
I have a column of dates and a column of counts, e.g.:

Date       Count
3/07/2010  1
2/01/2010  2
1/07/2012  5
I used the code below to change the column's data type to date:
from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

func = udf(lambda x: datetime.strptime(x, '%d/%m/%Y'), DateType())
crime_mongodb_df = crime_mongodb_df.withColumn('Reported Date', func(col('Reported Date')))
Then, I want to group the data by year and find the total count per year. I am not sure how to do the grouping. Can I get some help? Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
