'Merge multiple categorical variables into one using value from another value in r
I have data columns like:
Weather1 Weather.2 Weather3 Sunny Sunny NA Cloudy NA some cloud Hazy Hazy Hazy Warm NA Warm NA NA NA
Here is a snippet of code to recreate the data:
example = data.frame(Weather1 = c('Sunny','Cloudy','Hazy','Warm','NA'),
Weather.2 = c('Sunny','NA','Hazy','NA','NA'),
Weather3 = c('NA','some cloud','Hazy','NA', 'NA'))
I want to combine the three variables like the following:
Weather1 Weather.2 Weather3 combine_weather Sunny Sunny NA Sunny Cloudy NA some cloud Cloudy Hazy Hazy Hazy Hazy Warm NA NA Warm NA NA NA NA
I want to use value from first variable if there is any mismatch.
Solution 1:[1]
You can use coalesce to fill in NA horizontally with the first non-missing value.
Note I have changed your string "NA" into real NA first.
There are a few ways to specify columns in "tidy-select" style:
- If your columns that need to be acted on all start with the string "Weather", you can do
select(., starts_with("Weather")). - If they are ordered next to each other, you can do
select(., Weather1:Weather3). - Or select them manually
select(., Weather1, Weather.2, Weather3)
library(dplyr)
example %>% mutate(across(everything(), ~ifelse(.x == "NA", NA, .x)),
combine_weather = coalesce(!!!select(., everything())))
Weather1 Weather.2 Weather3 combine_weather
1 Sunny Sunny <NA> Sunny
2 Cloudy <NA> some cloud Cloudy
3 Hazy Hazy Hazy Hazy
4 Warm <NA> <NA> Warm
5 <NA> <NA> <NA> NA
Solution 2:[2]
Another way could be first define how your combine_weather could be in pattern:
then extract after unite:
library(tidyverse)
pattern <- c("Sunny|Cloudy|Hazy|Warm")
example %>%
na_if("NA") %>%
unite(combine_weather, starts_with("Weather"), sep = " ", na.rm = TRUE, remove = FALSE) %>%
mutate(combine_weather = str_extract(combine_weather, pattern))
combine_weather Weather1 Weather.2 Weather3
1 Sunny Sunny Sunny <NA>
2 Cloudy Cloudy <NA> some cloud
3 Hazy Hazy Hazy Hazy
4 Warm Warm <NA> <NA>
5 <NA> <NA> <NA> <NA>
Solution 3:[3]
Using data.table:
library(data.table)
setDT(example)[
, combined.weather:=fcoalesce(.SD)
, .SDcols=c('Weather1', 'Weather.2', 'Weather3')]
example
## Weather1 Weather.2 Weather3 combined.weather
## 1: Sunny Sunny <NA> Sunny
## 2: Cloudy <NA> some cloud Cloudy
## 3: Hazy Hazy Hazy Hazy
## 4: Warm <NA> <NA> Warm
## 5: <NA> <NA> <NA> <NA>
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | TarJae |
| Solution 3 | jlhoward |
