'Referring to columns by name in for loop
I have a loop in R that loops over columns in a dataframe las_ref for those columns where the name matches a value in a vector las_names.
As the original las_ref dataframe has rows with duplicated incidentid values, I am aiming to extract the most common value in each of the columns in las_names for each incidentid, produce a new column and use a join to produce a dataframe las_ref3 with the most common values in every row.
Structure of las_ref
incidentid incident var1
001 abc 45
002 abc NA
002 NA 78
003 def 12
004 xyz NA
004 xyz 10
004 abc 10
Intended structure of las_ref3
incidentid incident var1 incident-new var1-new
001 abc 45 abc 45
002 abc NA abc 78
002 NA 78 abc 78
003 def 12 def 12
004 xyz NA xyz 10
004 xyz 10 xyz 10
004 abs 10 xyz 10
Below is the loop I am trying to use.
las_names <- c("incident","var1")
for(i in las_names) {
las_ref2 <- las_ref %>%
group_by(across(c(incidentid, paste(i)))) %>%
tally() %>%
filter(!is.na(paste(i))) %>%
arrange(incidentid, desc(n)) %>%
summarize(paste(i,"-new") = first(paste(i)))
las_ref3 <- las_ref %>%
left_join(select(las_ref2, incidentid, paste(i,"-new")), by = c("incidentid"))
}
Currently there seem to be two issues that I think revolve around the use of i to refer to the columns incident and var in the examples above.
The first is that the !is.na() function is not removing the NA values in the columns referred to by i, although this does not produce an error.
The second, which does produce an error, is the summarise line. I get an unexpected '=' in error when this line is run.
Neither issue occurs when I run the code outside of the loop, specifying column names individually - the result comes out as expected. As the dataset is fairly large, I was hoping to use the loop to avoid doing each column individually.
Solution 1:[1]
A version that avoids manual loops:
library(tidyverse)
las_ref <- tribble(~incidentid , ~incident, ~var1,
"001", "abc", 45,
"002", "abc", NA,
"002", NA, 78,
"003", "def", 12,
"004", "xyz", NA,
"004", "xyz", 10,
"004", "abc", 10
)
las_ref3 <- las_ref %>%
group_by(incidentid) %>%
mutate(
across(c(incident, var1), ~names(sort(table(.x, useNA = 'no'), decreasing = T))[1], .names = '{.col}-new' )
)
incidentid incident var1 incident-new var1-new
<chr> <chr> <dbl> <chr> <chr>
1 001 abc 45 abc 45
2 002 abc NA abc 78
3 002 NA 78 abc 78
4 003 def 12 def 12
5 004 xyz NA xyz 10
6 004 xyz 10 xyz 10
7 004 abc 10 xyz 10
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jdobres |
