PySpark Group the Dataframe by Month

I have a column of date and a column of count. eg:

Date       Count
3/07/2010  1
2/01/2010  2
1/07/2012  5

I used the code below to change the column's data type to date:

from datetime import datetime
from pyspark.sql.functions import udf, col
from pyspark.sql.types import DateType

func = udf(lambda x: datetime.strptime(x, '%d/%m/%Y'), DateType())
crime_mongodb_df = crime_mongodb_df.withColumn('Reported Date', func(col('Reported Date')))

Then, I want to group the data by year and find the total count per year. I am not sure how to do the grouping. Can I get some help? Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.