'How to identify changes name in dplyr() groups?

I am trying to figure out how to identify name changes within a group.

For example, I have a dataframe that looks like this:

    df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 3),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally",
           "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim",
           "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane",
           "Linda", "Carl", "Jim")
)

    > df
   state year  name
1     CA 2000  John
2     CA 2000  Paul
3     CA 2000 Sally
4     WI 2000  Mary
5     WI 2000  Fred
6     WI 2000  Jane
7     NY 2000 Linda
8     NY 2000  Carl
9     NY 2000   Jim
10    CA 2001 Peter
11    CA 2001  Paul
12    CA 2001 Sally
13    WI 2001  Mary
14    WI 2001  Kate
15    WI 2001  Jane
16    NY 2001 Linda
17    NY 2001  Carl
18    NY 2001   Jim

As you can see, "Peter" replaced "John" in 2001, and "Kate" replaced "Fred" in 2001.

So I want the output to look like:

df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 3),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally",
           "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim",
           "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane",
           "Linda", "Carl", "Jim"),
  change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
             1, 0, 0, 0, 1, 0, 0, 0, 0)
)

   > df
   state year  name change
1     CA 2000  John     NA
2     CA 2000  Paul     NA
3     CA 2000 Sally     NA
4     WI 2000  Mary     NA
5     WI 2000  Fred     NA
6     WI 2000  Jane     NA
7     NY 2000 Linda     NA
8     NY 2000  Carl     NA
9     NY 2000   Jim     NA
10    CA 2001 Peter      1
11    CA 2001  Paul      0
12    CA 2001 Sally      0
13    WI 2001  Mary      0
14    WI 2001  Kate      1
15    WI 2001  Jane      0
16    NY 2001 Linda      0
17    NY 2001  Carl      0
18    NY 2001   Jim      0

As you can see, Peter in 2001 and Kate in 2001 are both marked as "1" in the "change" column because they replaced "John" and "Fred" in 2000-CA and 2000-NY, respectively.

I've been looking at using some lag methods, but it seems to just look at the previous row, not by state, year groups:

df2 <- df %>% 
  group_by(state, year) %>%
  mutate(change = lag(name, order_by = year))

Any help would be appreciated!



Solution 1:[1]

A base R approach that leaves out NAs

df2 <- split(df, df$year)

cbind(df, change=rep((!(df2$"2000"$name == df2$"2001"$name))*1, length(df2)))
   state year  name change
1     CA 2000  John      1
2     CA 2000  Paul      0
3     WI 2000 Sally      0
4     WI 2000  Mary      0
5     NY 2000  Fred      1
6     NY 2000  Jane      0
7     CA 2000 Linda      0
8     CA 2000  Carl      0
9     WI 2000   Jim      0
10    WI 2001 Peter      1
11    NY 2001  Paul      0
12    NY 2001 Sally      0
13    CA 2001  Mary      0
14    CA 2001  Kate      1
15    WI 2001  Jane      0
16    WI 2001 Linda      0
17    NY 2001  Carl      0
18    NY 2001   Jim      0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andre Wildberg