'How to identify changes name in dplyr() groups?
I am trying to figure out how to identify name changes within a group.
For example, I have a dataframe that looks like this:
df <- data.frame(
state = rep(c("CA", "WI", "NY"), each = 3),
year = rep(c(2000, 2001), each = 9),
name = c("John", "Paul", "Sally",
"Mary", "Fred", "Jane",
"Linda", "Carl", "Jim",
"Peter", "Paul", "Sally",
"Mary", "Kate", "Jane",
"Linda", "Carl", "Jim")
)
> df
state year name
1 CA 2000 John
2 CA 2000 Paul
3 CA 2000 Sally
4 WI 2000 Mary
5 WI 2000 Fred
6 WI 2000 Jane
7 NY 2000 Linda
8 NY 2000 Carl
9 NY 2000 Jim
10 CA 2001 Peter
11 CA 2001 Paul
12 CA 2001 Sally
13 WI 2001 Mary
14 WI 2001 Kate
15 WI 2001 Jane
16 NY 2001 Linda
17 NY 2001 Carl
18 NY 2001 Jim
As you can see, "Peter" replaced "John" in 2001, and "Kate" replaced "Fred" in 2001.
So I want the output to look like:
df <- data.frame(
state = rep(c("CA", "WI", "NY"), each = 3),
year = rep(c(2000, 2001), each = 9),
name = c("John", "Paul", "Sally",
"Mary", "Fred", "Jane",
"Linda", "Carl", "Jim",
"Peter", "Paul", "Sally",
"Mary", "Kate", "Jane",
"Linda", "Carl", "Jim"),
change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
1, 0, 0, 0, 1, 0, 0, 0, 0)
)
> df
state year name change
1 CA 2000 John NA
2 CA 2000 Paul NA
3 CA 2000 Sally NA
4 WI 2000 Mary NA
5 WI 2000 Fred NA
6 WI 2000 Jane NA
7 NY 2000 Linda NA
8 NY 2000 Carl NA
9 NY 2000 Jim NA
10 CA 2001 Peter 1
11 CA 2001 Paul 0
12 CA 2001 Sally 0
13 WI 2001 Mary 0
14 WI 2001 Kate 1
15 WI 2001 Jane 0
16 NY 2001 Linda 0
17 NY 2001 Carl 0
18 NY 2001 Jim 0
As you can see, Peter in 2001 and Kate in 2001 are both marked as "1" in the "change" column because they replaced "John" and "Fred" in 2000-CA and 2000-NY, respectively.
I've been looking at using some lag methods, but it seems to just look at the previous row, not by state, year groups:
df2 <- df %>%
group_by(state, year) %>%
mutate(change = lag(name, order_by = year))
Any help would be appreciated!
Solution 1:[1]
A base R approach that leaves out NAs
df2 <- split(df, df$year)
cbind(df, change=rep((!(df2$"2000"$name == df2$"2001"$name))*1, length(df2)))
state year name change
1 CA 2000 John 1
2 CA 2000 Paul 0
3 WI 2000 Sally 0
4 WI 2000 Mary 0
5 NY 2000 Fred 1
6 NY 2000 Jane 0
7 CA 2000 Linda 0
8 CA 2000 Carl 0
9 WI 2000 Jim 0
10 WI 2001 Peter 1
11 NY 2001 Paul 0
12 NY 2001 Sally 0
13 CA 2001 Mary 0
14 CA 2001 Kate 1
15 WI 2001 Jane 0
16 WI 2001 Linda 0
17 NY 2001 Carl 0
18 NY 2001 Jim 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andre Wildberg |
