How can I change some of the names of variables within columns?
I have a dataset with a column (Continent), and I wish to rename some of the values within this column. How would I do this? Data and example below.
These are currently the continents for the countries in my dataset: Africa, Americas, Eastern Mediterranean, Europe, South East Asia, Western Pacific. I wish to rename them so that Australia would take Oceana instead of Western Pacific, and Afghanistan would take Asia instead of Eastern Mediterranean.
Part of my dataset here:
head(all_data, 3)
      Country Year             Continent Life_Expectancy
1 Afghanistan 2010 Eastern Mediterranean        61.17996
2 Afghanistan 2011 Eastern Mediterranean        61.72234
3 Afghanistan 2012 Eastern Mediterranean        62.20652
tail(all_data, 1)
      Country Year Continent Life_Expectancy
4705 Zimbabwe 2010    Africa        52.91785
Solution 1:[1]
Solution
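library(data.table)
# df is your data frame (all_data in the question); setDT() converts it in place
setDT(df)
# update Continent by reference for the matching rows only
df[Country == 'Afghanistan', Continent := 'Asia'
   ][Country == 'Australia', Continent := 'Oceana'
   ]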
For any Country not covered by the logic above, Continent keeps its original value. Also note that later statements take precedence: if two conditions match the same row, the last update wins.
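To make both points concrete, here is a minimal sketch on a three-row toy table. The table `toy` and the value "Australasia" are made up for illustration and are not part of the question's data; only the update pattern is taken from the solution above.

```r
library(data.table)

# Hypothetical toy data, loosely modelled on the question's columns
toy <- data.table(
  Country   = c("Afghanistan", "Australia", "Zimbabwe"),
  Continent = c("Eastern Mediterranean", "Western Pacific", "Africa")
)

# Chained updates by reference, as in the solution above
toy[Country == "Afghanistan", Continent := "Asia"
    ][Country == "Australia", Continent := "Oceana"
    ]

# Zimbabwe matches neither condition, so its Continent is unchanged
print(toy)

# A later update on the same rows overwrites the earlier one
# ("Australasia" is only an illustrative value)
toy[Country == "Australia", Continent := "Australasia"]
print(toy[Country == "Australia"])
```

Because `:=` modifies the table in place, there is no need to assign the result back to `toy`.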
Benchmark
The advantage of this method is speed (scalability). In our benchmark with 20 million rows, data.table ran more than 4.5x faster than the equivalent dplyr case_when() call:
# packages (loaded first, since data.table() is used to build the dummy data)
library(dplyr)
library(data.table)
library(microbenchmark)
library(ggplot2)  # needed for autoplot()

# dummy data: each country repeated x times, so 2 * x = 20 million rows
x <- 1e7
df <- data.table(Country = rep(c('Afghanistan', 'Australia'), x)
                 , Continent = rep(c('x', 'y'), x)
                 )

# benchmark
# note: := updates df by reference, so df is already modified after the first dt_subset run
xx <-
  microbenchmark(dplyr_case = {df %>%
                     mutate(Continent = case_when(Country == "Afghanistan" ~ "Asia"
                                                  , Country == "Australia" ~ "Oceana"
                                                  , TRUE ~ Continent
                                                  )
                            )
                   }
                 , dt_subset = {df[Country == 'Afghanistan', Continent := 'Asia'
                                   ][Country == 'Australia', Continent := 'Oceana'
                                   ]
                   }
                 , times = 10
                 )

# plot
autoplot(xx)
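If you would rather read the timings as numbers than as a plot, the sketch below may help. It reruns the same two expressions on a much smaller, hypothetical table (`small`, 20,000 rows) so that it finishes quickly, and prints the median timings relative to the fastest expression. The ratio on your machine and at your data size will differ from the figure quoted above.

```r
library(dplyr)
library(data.table)
library(microbenchmark)

# Small hypothetical data set so the sketch runs in well under a second
small <- data.table(Country = rep(c("Afghanistan", "Australia"), 1e4)
                    , Continent = rep(c("x", "y"), 1e4)
                    )

bm <- microbenchmark(
  dplyr_case = {small %>%
      mutate(Continent = case_when(Country == "Afghanistan" ~ "Asia"
                                   , Country == "Australia" ~ "Oceana"
                                   , TRUE ~ Continent
                                   )
             )
  }
  , dt_subset = {small[Country == "Afghanistan", Continent := "Asia"
                       ][Country == "Australia", Continent := "Oceana"
                       ]
  }
  , times = 10
)

# unit = "relative" rescales timings so the fastest expression is 1
res <- summary(bm, unit = "relative")
print(res[, c("expr", "median")])
```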
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

