PySpark distinct function

In a Spark DataFrame there are 10 columns, one of which is 'department'. The department column has 10 distinct values, which I found with the distinct() function. Now I need to count the number of rows for each department. What function should I use?



Solution 1:[1]

You can count the rows for each department by grouping on that column with groupBy() and then calling count() on the grouped data:

# returns a new DataFrame with columns 'department' and 'count'; show() prints it
df.groupBy('department').count().show()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kyungjin Jung