How to aggregate dates and their data into one row per customer in PySpark?
I want to aggregate a range of dates (for example, one month for each customer) and the data belonging to them into a single row in PySpark.
A simple example is the table below:
| Customer_Id | Date | Data |
|---|---|---|
| id1 | 2021-01-01 | 2 |
| id1 | 2021-01-02 | 3 |
| id1 | 2021-01-03 | 4 |
I want to transform it into:
| Customer_Id | Date | col1 | col2 | col3 |
|---|---|---|---|---|
| id1 | [2021-01-01 - 2021-01-03] | 2 | 3 | 4 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
