Changing Duplicate Values Within Subjects: R
My data looks like this:
| Country | GDP | Year |
|---|---|---|
| A | 10 | 1972 |
| A | 15 | 1973 |
| A | 20 | 1973 |
| A | 18 | 1975 |
| B | 25 | 1950 |
| B | 30 | 1951 |
| B | 35 | 1951 |
| B | 36 | 1953 |
I have many observations like the ones shown above. Some years are duplicated within a country, and I want to change them. Specifically, I want to change the Year of the first duplicated row, so that it becomes the following year. I want my data to look like this:
| Country | GDP | Year |
|---|---|---|
| A | 10 | 1972 |
| A | 20 | 1973 |
| A | 15 | 1974 |
| A | 18 | 1975 |
| B | 25 | 1950 |
| B | 35 | 1951 |
| B | 30 | 1952 |
| B | 36 | 1953 |
Thank you for your time!
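The answers below operate on a data frame called `df`, which the question does not define. A minimal sketch for rebuilding it from the first table, with a base R check that lists the rows sharing a Country/Year pair, might look like this. The integer column types and the helper names `key` and `dup_rows` are assumptions made for illustration.

```r
# Sample data from the question (integer column types are an assumption)
df <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  GDP     = c(10L, 15L, 20L, 18L, 25L, 30L, 35L, 36L),
  Year    = c(1972L, 1973L, 1973L, 1975L, 1950L, 1951L, 1951L, 1953L)
)

# Flag every row whose Country/Year pair occurs more than once.
# duplicated() marks only the later occurrences, so the scan is
# repeated from the end to catch the first occurrence as well.
key <- df[c("Country", "Year")]
dup_rows <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Show the affected rows
df[dup_rows, ]
```

Running the same check on the result of either solution is a quick way to confirm that no duplicated Country/Year pairs remain.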
Solution 1:[1]
Here is one possible option with tidyverse:
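library(tidyverse)

df %>%
  group_by(Country, Year) %>%
  # Flag the lowest-GDP row within each duplicated Country/Year pair
  mutate(dup = case_when(n() == 1 ~ FALSE,
                         min(GDP) == GDP ~ TRUE,
                         TRUE ~ FALSE)) %>%
  # Ungroup before changing Year: a grouped mutate() cannot modify a grouping column
  ungroup() %>%
  # Move the flagged rows to the following year
  mutate(Year = ifelse(dup, Year + 1, Year)) %>%
  arrange(Country, Year) %>%
  select(-dup)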
Output
Country GDP Year
<chr> <int> <dbl>
1 A 10 1972
2 A 20 1973
3 A 15 1974
4 A 18 1975
5 B 25 1950
6 B 35 1951
7 B 30 1952
8 B 36 1953
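Note that the code above selects the row with the lowest GDP in each duplicated pair, which happens to be the first row in the sample data. The question asks for the first duplicated row, so a position-based variant may be closer to the request. The following is a sketch under the assumption that the rows are already in the intended order; the column name `first_dup` is hypothetical.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  GDP     = c(10, 15, 20, 18, 25, 30, 35, 36),
  Year    = c(1972, 1973, 1973, 1975, 1950, 1951, 1951, 1953)
)

result <- df %>%
  group_by(Country, Year) %>%
  # TRUE only for the first row of a Country/Year pair that occurs more than once
  mutate(first_dup = n() > 1 & row_number() == 1) %>%
  # Year is a grouping column, so drop the grouping before modifying it
  ungroup() %>%
  mutate(Year = if_else(first_dup, Year + 1, Year)) %>%
  select(-first_dup) %>%
  arrange(Country, Year)

result
```

This sketch does not check whether the following year is already taken. If it is, shifting the row creates a new duplicate, so the approach fits data shaped like the sample, where the year after each duplicate is missing.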
Solution 2:[2]
How about this?
library(dplyr)

df %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  mutate(Year = min(Year) + row_number() - 1) %>%
  ungroup()
# Country GDP Year
# <chr> <int> <dbl>
#1 A 10 1972
#2 A 15 1973
#3 A 20 1974
#4 A 18 1975
#5 B 25 1950
#6 B 30 1951
#7 B 35 1952
#8 B 36 1953
This assigns consecutive years within each Country, starting from that country's minimum Year, so any gaps between years are closed as well.
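The output above pairs GDP 15 with 1973 and GDP 20 with 1974, which is the reverse of the result requested in the question. One possible adjustment is to add a descending GDP tiebreak before renumbering. This is a sketch that assumes the row with the higher GDP should keep the original year, which is inferred from the sample data rather than stated in the question.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Country = rep(c("A", "B"), each = 4),
  GDP     = c(10L, 15L, 20L, 18L, 25L, 30L, 35L, 36L),
  Year    = c(1972L, 1973L, 1973L, 1975L, 1950L, 1951L, 1951L, 1953L)
)

result <- df %>%
  # Within a duplicated year, put the higher GDP first so it keeps that year
  arrange(Country, Year, desc(GDP)) %>%
  group_by(Country) %>%
  # Renumber years consecutively from each country's earliest year
  mutate(Year = min(Year) + row_number() - 1) %>%
  ungroup()

result
```

Because the years are renumbered consecutively, this only gives sensible results when each country is meant to have exactly one row per year with no genuine gaps.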
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AndrewGB |
| Solution 2 | Ronak Shah |
