Inserting intermediate values between min value and max value without using loops
I have a very large dataset that I need to process with PySpark. Here is a small sample of the table:
minutes | hour
718 11
719 11
721 12
722 12
723 12
779 12
781 13
782 13
What I need is a table that gives the minute interval covered within each hour, with the hours as columns, like this:
11 | 12 | 13
2 60 2
The first problem is the missing boundary values. I need to add the rows (720, 11), (720, 12), (780, 12), and (780, 13) so that I can calculate the minute interval for each hour. Once those rows are added, I can group by hour, take the difference between the maximum and minimum minutes, and then pivot by hour.
Any ideas on how to append those rows without using loops or hardcoding? This is the output I need (added rows are marked with *):
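As a sanity check on the arithmetic only (this is plain Python on the sample, not a PySpark solution), the sketch below adds the boundary rows and recomputes the intervals. It assumes, based on the sample, that the boundary shared by hours h and h + 1 is minute 60 * (h + 1), and that a boundary is added only when both hours appear in the data. The names `rows`, `boundary`, and `intervals` are illustrative.

```python
# Sample (minutes, hour) rows from the table above.
rows = [(718, 11), (719, 11), (721, 12), (722, 12), (723, 12),
        (779, 12), (781, 13), (782, 13)]

hours = sorted({h for _, h in rows})

# Assumption: hours h and h + 1 share the boundary minute 60 * (h + 1),
# and that minute is added to both hours when both are present.
boundary = [(60 * (h + 1), k)
            for h in hours if h + 1 in hours
            for k in (h, h + 1)]

# Interval per hour = max(minutes) - min(minutes) once boundaries are included.
all_rows = rows + boundary
intervals = {
    h: max(m for m, k in all_rows if k == h) - min(m for m, k in all_rows if k == h)
    for h in hours
}

print(boundary)
print(intervals)
```

With the sample data this yields the four rows listed above and intervals of 2, 60, and 2 minutes for hours 11, 12, and 13.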
minutes | hour
718 11
719 11
720* 11
720* 12
721 12
722 12
723 12
779 12
780* 12
780* 13
781 13
782 13
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
