'How to extend/complete the dates in the Date column of a table? Using dplyr/sparklyr

I have this data with missing dates in the Date column, how can I expand this column to the full range of dates since the start and end of each grouping? The grouping are the columns Material and Category enter image description here

How do you add the missing dates for each grouping of Material, Category from the first to the last date in each group? Don't mind the Value column, I can fill those up with fill().

I tried to

group_by(Material,Category) %>% arrange(Date) %>% complete(Date = seq.Date(min(Date),max(Date),by="days")) 

But does not seems to work in SPARK.



Solution 1:[1]

No experience with SPARK, but try this:


library(lubridate)
library(dplyr)
library(tidyr)

df %>% 
  group_by(Category) %>% 
  mutate(Date = dmy(Date)) %>% 
  arrange(Date, .by_group = TRUE) %>% 
  complete(Date = seq.Date(min(Date),max(Date),by="days")) %>% 
  fill(Material, .direction = "down")

   Category       Date Material Value
1         A 2022-01-03     Iron    39
2         A 2022-01-04     Iron    45
3         A 2022-01-05     Iron    39
4         A 2022-01-06     Iron    39
5         A 2022-01-07     Iron    24
6         A 2022-01-08     Iron    NA
7         A 2022-01-09     Iron    NA
8         A 2022-01-10     Iron    35
9         A 2022-01-11     Iron    37
10        B 2022-02-03    Steel    45
11        B 2022-02-04    Steel    21
12        B 2022-02-05    Steel    NA
13        B 2022-02-06    Steel    NA
14        B 2022-02-07    Steel    34
15        B 2022-02-08    Steel    45
16        B 2022-02-09    Steel    NA
17        B 2022-02-10    Steel    NA
18        B 2022-02-11    Steel    67
19        C 2022-01-19   Copper    21
20        C 2022-01-20   Copper    NA
21        C 2022-01-21   Copper    NA
22        C 2022-01-22   Copper    NA
23        C 2022-01-23   Copper    NA
24        C 2022-01-24   Copper    33
25        C 2022-01-25   Copper    12
26        C 2022-01-26   Copper    NA
27        C 2022-01-27   Copper    NA
28        C 2022-01-28   Copper    NA
29        C 2022-01-29   Copper    NA
30        C 2022-01-30   Copper    NA
31        C 2022-01-31   Copper    NA
32        C 2022-02-01   Copper    NA
33        C 2022-02-02   Copper    NA
34        C 2022-02-03   Copper    NA
35        C 2022-02-04   Copper    NA
36        C 2022-02-05   Copper    NA
37        C 2022-02-06   Copper    NA
38        C 2022-02-07   Copper    NA
39        C 2022-02-08   Copper    NA
40        C 2022-02-09   Copper    NA
41        C 2022-02-10   Copper    NA
42        C 2022-02-11   Copper    NA
43        C 2022-02-12   Copper    NA
44        C 2022-02-13   Copper    NA
45        C 2022-02-14   Copper    NA
46        C 2022-02-15   Copper    NA
47        C 2022-02-16   Copper    NA
48        C 2022-02-17   Copper    NA
49        C 2022-02-18   Copper    NA
50        C 2022-02-19   Copper    NA
51        C 2022-02-20   Copper    NA
52        C 2022-02-21   Copper    NA
53        C 2022-02-22   Copper    NA
54        C 2022-02-23   Copper    NA
55        C 2022-02-24   Copper    NA
56        C 2022-02-25   Copper    NA
57        C 2022-02-26   Copper    NA
58        C 2022-02-27   Copper    NA
59        C 2022-02-28   Copper    90
60        C 2022-03-01   Copper    NA
61        C 2022-03-02   Copper    NA
62        C 2022-03-03   Copper    80

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TarJae