PySpark distinct function

I have a Spark DataFrame with 10 columns, one of which is 'department'. The department column contains 10 distinct values, which I found with the distinct() function. I now need to count the number of rows for each distinct department value. What function should I use?

pyspark

Solution 1:^[1]

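You can count the rows for each department by calling groupBy() on the column and then count(). The result is a DataFrame with one row per department and a count column.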

df.groupBy('department').count()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Kyungjin Jung
