How can I replace a portion of text in a dataframe?
I am trying to replace part of the text in my data frame with different text. Under 'Treatment' I need to replace 'iso1' with other text or a different isolate number. However, I need to keep dilLB as it is, since that is my control. I have thought about splitting the Treatment column and making the isolate number a new column, but I think that may be more difficult than replacing these values.
  Absorbance_t0 Absorbance_t1 row plateColumn   Treatment    Avg_t1   Avg_t0  norm_t0
1         1.163         0.388   A           1       dilLB 0.3626667 1.191667 1.191667
2         1.204         0.377   A           2       dilLB 0.3626667 1.191667 1.191667
3         1.208         0.323   A           3       dilLB 0.3626667 1.191667 1.191667
4         1.193         0.352   A           4 iso1_fullLB 0.4366667 1.219667 1.219667
5         1.235         0.438   A           5 iso1_fullLB 0.4366667 1.219667 1.219667
6         1.231         0.520   A           6 iso1_fullLB 0.4366667 1.219667 1.219667
I have tried df[df == "iso1"] <- "iso22" and I don't get an error, but it does not replace what I need it to.
Solution 1:[1]
You can use gsub from base R. The pattern below matches everything up to and including the last underscore and replaces it with "iso22_". Values without an underscore, such as dilLB, are left unchanged.
df$Treatment <- gsub(".*_", "iso22_", df$Treatment)
Output
Treatment
1 dilLB
2 dilLB
3 dilLB
4 iso22_fullLB
5 iso22_fullLB
6 iso22_fullLB
7 iso22_fullLB
However, if other values in the column also contain underscores and you only want to change the ones that start with "iso1_", you can spell out the text explicitly. This replaces only that specific prefix, so iso298_fullLB is left alone.
df$Treatment <- gsub("^iso1_", "iso22_", df$Treatment)
Output
Treatment
1 dilLB
2 dilLB
3 dilLB
4 iso22_fullLB
5 iso22_fullLB
6 iso22_fullLB
7 iso298_fullLB
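To make the difference between the two patterns concrete, here is a small sketch. The value "iso1_full_LB" (with two underscores) is hypothetical and is not in the original data; it is only there to show what the greedy pattern does when a second underscore is present.

```r
# "iso1_full_LB" is a made-up value used only to illustrate multiple underscores
x <- c("dilLB", "iso1_fullLB", "iso1_full_LB", "iso298_fullLB")

# Greedy pattern: ".*_" matches up to and including the LAST underscore
greedy <- gsub(".*_", "iso22_", x)

# Anchored literal prefix: only strings that start with "iso1_" are changed
anchored <- gsub("^iso1_", "iso22_", x)

print(data.frame(x, greedy, anchored))
```

With the greedy pattern the hypothetical value loses its "full_" part and iso298 is overwritten as well, whereas the anchored pattern only touches the "iso1_" prefix.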
Another option, if you would rather split the values into two columns, is separate from the tidyverse. Here I split on _ into 2 columns and use fill = "left" so that values without an isolate (such as dilLB) end up in the B column and Isolate is filled with NA. The mutate step is only needed if you want to keep just the number in the Isolate column.
library(tidyverse)
df %>%
  separate(Treatment, c("Isolate", "B"), sep = "_", fill = "left") %>%
  mutate(Isolate = as.numeric(str_extract(Isolate, "[0-9]+")))
Output
Isolate B
1 NA dilLB
2 NA dilLB
3 NA dilLB
4 1 fullLB
5 1 fullLB
6 1 fullLB
7 298 fullLB
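If you would rather not load the tidyverse, roughly the same split can be sketched in base R. This is only one way to do it, and it assumes every isolate label has the form "iso" followed by digits and an underscore. The column name Medium is my own placeholder for what is called B above.

```r
# Same Treatment values as in the Data section below
df <- data.frame(
  Treatment = c("dilLB", "dilLB", "dilLB",
                "iso1_fullLB", "iso1_fullLB", "iso1_fullLB", "iso298_fullLB"),
  stringsAsFactors = FALSE
)

# Rows whose Treatment starts with an isolate label such as "iso1_"
has_iso <- grepl("^iso[0-9]+_", df$Treatment)

# Isolate number for those rows, NA for the control rows
df$Isolate <- NA_real_
df$Isolate[has_iso] <- as.numeric(sub("^iso([0-9]+)_.*$", "\\1", df$Treatment[has_iso]))

# Whatever is left once the isolate prefix is removed (placeholder column name)
df$Medium <- sub("^iso[0-9]+_", "", df$Treatment)

print(df)
```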
Data
df <-
structure(list(
Treatment = c(
"dilLB",
"dilLB",
"dilLB",
"iso1_fullLB",
"iso1_fullLB",
"iso1_fullLB",
"iso298_fullLB"
)
),
class = "data.frame",
row.names = c(NA,-7L))
Solution 2:[2]
I would use stringr and dplyr. You need to manipulate the text inside the Treatment column. Your code looks for cells in the data frame that exactly match "iso1"; no cell does (the cells contain "iso1_fullLB"), so nothing is replaced.
library(stringr)
library(dplyr)
df <- df %>%
  mutate(
    firstbit = str_extract(Treatment, "[:alnum:]+(?=_)"),
    secondbit = str_extract(Treatment, "[:alnum:]+$")
  )
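This uses regular expressions to handle the text; they are explained in the stringr cheatsheet. "[:alnum:]+" means one or more letters or numbers, "(?=_)" is a lookahead meaning "followed by an underscore" (the underscore itself is not part of the match), and "$" anchors the match at the end of the string. For a value without an underscore, such as dilLB, firstbit is NA and secondbit is the whole value.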
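The point about exact matching can be checked directly. The sketch below uses base R only and a shortened version of the Treatment column. Including the underscore in the search text is my own choice, so that a label like iso10 would not be picked up by accident.

```r
# Shortened version of the Treatment column from the question
df <- data.frame(
  Treatment = c("dilLB", "iso1_fullLB", "iso298_fullLB"),
  stringsAsFactors = FALSE
)

# Whole-cell comparison: no cell is exactly "iso1", so nothing would be replaced
exact_hits <- any(df == "iso1")

# Substring search: finds the cells that contain the literal text "iso1_"
partial_hits <- grepl("iso1_", df$Treatment, fixed = TRUE)

# Replace only the matched part and leave the rest of each value alone
df$Treatment <- sub("iso1_", "iso22_", df$Treatment, fixed = TRUE)

print(exact_hits)
print(partial_hits)
print(df)
```

With fixed = TRUE the search text is treated as a plain string instead of a regular expression, which is enough when the label to replace is known in advance.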
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Simon Woodward |
