How can I change some of the names of variables within columns?

I have a dataset with a column (Continent), and I wish to rename some of the values within this column. How would I do this? Data and example below.

These are currently the continents for the countries in my dataset: Africa, Americas, Eastern Mediterranean, Europe, South East Asia, Western Pacific. I wish to rename them so that Australia would take Oceana instead of Western Pacific, and Afghanistan would take Asia instead of Eastern Mediterranean.

Part of my dataset is here:

head(all_data,3)

     Country Year             Continent Life_Expectancy 
1 Afghanistan 2010 Eastern Mediterranean        61.17996                 
2 Afghanistan 2011 Eastern Mediterranean        61.72234       
3 Afghanistan 2012 Eastern Mediterranean        62.20652        

tail(all_data,1)

      Country Year Continent Life_Expectancy 
4705 Zimbabwe 2010    Africa        52.91785          


Solution 1:[1]

Solution

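library(data.table)

# df stands for your data (all_data in the question); setDT() converts it to a data.table by reference
setDT(df)

# update Continent only on the rows that match each condition
df[Country == 'Afghanistan', Continent := 'Asia'
   ][Country == 'Australia', Continent := 'Oceana'
     ]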

For any Country not covered by the logic above, Continent keeps its original value. Also note that later statements take precedence: if a row matches more than one condition, the last matching assignment wins.
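If there are many countries to fix, chaining one statement per country gets long. A possible alternative is a data.table update join against a small lookup table. This is only a sketch: the `fixes` table and its `new_continent` column are hypothetical names made up for illustration, and the toy `all_data` below just mimics the shape of the data in the question.

```r
library(data.table)

# Toy data shaped like the question's all_data (values are illustrative only)
all_data <- data.table(
  Country   = c("Afghanistan", "Australia", "Zimbabwe"),
  Continent = c("Eastern Mediterranean", "Western Pacific", "Africa")
)

# Hypothetical lookup table: one row per country whose Continent should change
fixes <- data.table(
  Country       = c("Afghanistan", "Australia"),
  new_continent = c("Asia", "Oceana")
)

# Update join: rows of all_data that match fixes on Country are overwritten by reference;
# unmatched rows (here Zimbabwe) keep their original Continent
all_data[fixes, on = "Country", Continent := i.new_continent]

print(all_data)
```

As with the chained version, countries that do not appear in the lookup table are left untouched, so only the rows you list are changed.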

Benchmark

The advantage of this method is speed (scalability). In our benchmark with 20 million rows, data.table was more than 4.5x faster than the equivalent dplyr case_when() call:

# packages (loaded first so that data.table() is available below)
library(dplyr)
library(data.table)
library(microbenchmark)
library(ggplot2)

# dummy data: 2 * x = 20 million rows
x <- 1e7

df <- data.table(Country = rep(c('Afghanistan', 'Australia'), x)
                 , Continent = rep(c('x', 'y'), x)
                 )

# benchmark

xx <-
microbenchmark(dplyr_case = {df %>%
                                mutate(Continent = case_when(Country == "Afghanistan" ~ "Asia"
                                                             , Country == "Australia" ~ "Oceana"
                                                             , TRUE ~ Continent
                                                             )
                                        )
                              }
               , dt_subset = {df[Country == 'Afghanistan', Continent := 'Asia'
                                 ][Country == 'Australia', Continent := 'Oceana'
                                   ]
                              }
               , times = 10
               )

# plot
autoplot(xx)

[Figure: benchmark plot of dplyr_case vs dt_subset timings]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1