PySpark distinct function
In a Spark DataFrame there are 10 columns, one of which is 'department'. The department column has 10 distinct entries, which I counted with the distinct() function. Now I need to calculate the number of rows for each unique department entry. What function should I use?
Solution 1:[1]
You can calculate the number of rows per department with the groupBy() function followed by count():
df.groupBy('department').count()
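
A minimal, self-contained sketch of how this looks end to end, assuming a local SparkSession and a small hypothetical dataset (the sample rows and app name are illustrative, not from the question):

```python
from pyspark.sql import SparkSession

# Hypothetical example data; names and departments are assumptions for illustration.
spark = SparkSession.builder.appName("department-counts").getOrCreate()

df = spark.createDataFrame(
    [("Alice", "Sales"), ("Bob", "Sales"), ("Carol", "HR"), ("Dave", "Engineering")],
    ["name", "department"],
)

# Number of distinct departments (what the question already computed).
n_departments = df.select("department").distinct().count()

# Row count per department: group by the column, then aggregate with count().
dept_counts = df.groupBy("department").count()
dept_counts.show()
```

The result of groupBy('department').count() is itself a DataFrame with one row per distinct department and a 'count' column, so you can sort, filter, or collect it like any other DataFrame.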
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kyungjin Jung |
