How can I replace a portion of text in a dataframe?

I am trying to replace part of the text in my data frame with different text. Under 'Treatment' I need to replace 'iso1' with other text or a different isolate number. However, I need to keep dilLB as it is, since that is my control. I have thought about splitting the treatment column and making the isolate number a new column, but I think that may be more difficult than replacing these values.

 Absorbance_t0 Absorbance_t1 row plateColumn   Treatment    Avg_t1   Avg_t0  norm_t0
1         1.163         0.388   A           1       dilLB 0.3626667 1.191667 1.191667
2         1.204         0.377   A           2       dilLB 0.3626667 1.191667 1.191667
3         1.208         0.323   A           3       dilLB 0.3626667 1.191667 1.191667
4         1.193         0.352   A           4 iso1_fullLB 0.4366667 1.219667 1.219667
5         1.235         0.438   A           5 iso1_fullLB 0.4366667 1.219667 1.219667
6         1.231         0.520   A           6 iso1_fullLB 0.4366667 1.219667 1.219667

I have tried df[df == "iso1"] <- "iso22" and I don't get an error, but it does not replace what I need it to.

r replace rename

Solution 1:^[1]

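You can use gsub from base R. The pattern below replaces everything up to and including the underscore with "iso22_". Because .* is greedy, it matches up to the last underscore if a value contains several, and it renames every isolate, not just iso1 (see row 7 in the output).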

df$Treatment <- gsub(".*_", "iso22_", df$Treatment)

Output

     Treatment
1        dilLB
2        dilLB
3        dilLB
4 iso22_fullLB
5 iso22_fullLB
6 iso22_fullLB
7 iso22_fullLB

However, if the column contains other isolates or other underscores and you only want to change the values that start with "iso1_", then you can spell out that text in the pattern. This replaces only that specific prefix.

df$Treatment <- gsub("^iso1_", "iso22_", df$Treatment)

Output

      Treatment
1         dilLB
2         dilLB
3         dilLB
4  iso22_fullLB
5  iso22_fullLB
6  iso22_fullLB
7 iso298_fullLB
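
If several isolates need renaming at once, the explicit pattern above can be applied once per label. The following is a minimal base R sketch, not part of the original answer; the lookup vector isolate_map and the new labels in it are hypothetical and only there for illustration.

```r
# Hypothetical lookup: names are the old isolate labels, values the new ones.
isolate_map <- c(iso1 = "iso22", iso298 = "iso5")

treatment <- c("dilLB", "iso1_fullLB", "iso298_fullLB")

renamed <- treatment
for (old in names(isolate_map)) {
  # Test against the original values so an already renamed label
  # is never picked up again by a later entry in the lookup.
  hit <- startsWith(treatment, paste0(old, "_"))
  renamed[hit] <- sub(paste0("^", old, "_"),
                      paste0(isolate_map[[old]], "_"),
                      treatment[hit])
}
renamed
```

Requiring the trailing underscore keeps "iso1" from matching a longer label such as "iso12_", and the control dilLB is left untouched because it matches none of the prefixes.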

Another option, using the tidyverse, is to split the values into two columns with separate. Here, I split on _ into 2 columns and use fill = "left" so that values without an isolate (such as dilLB) end up in the B column. The mutate step is only needed if you want to keep just the numbers in the Isolate column.

library(tidyverse)

df %>% 
  separate(Treatment, c("Isolate","B"), sep = "_", fill = "left") %>% 
  mutate(Isolate = as.numeric(str_extract(Isolate, "[0-9]+")))

Output

  Isolate      B
1      NA  dilLB
2      NA  dilLB
3      NA  dilLB
4       1 fullLB
5       1 fullLB
6       1 fullLB
7     298 fullLB
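
For comparison, roughly the same split can be sketched in base R without loading the tidyverse. This is an illustrative alternative, not the answer's method; it assumes that isolate labels always have the form "iso" followed by digits and that the control contains no underscore.

```r
treatment <- c("dilLB", "iso1_fullLB", "iso298_fullLB")

# Only values with an underscore carry an isolate label.
has_isolate <- grepl("_", treatment, fixed = TRUE)

# Text before the first underscore, or NA for the control.
isolate_label <- ifelse(has_isolate, sub("_.*$", "", treatment), NA_character_)

# Drop the "iso" prefix and convert what is left to a number.
isolate_num <- as.numeric(sub("^iso", "", isolate_label))

# Text after the last underscore; values without one are kept whole.
medium <- sub("^.*_", "", treatment)

split_df <- data.frame(Isolate = isolate_num, B = medium,
                       stringsAsFactors = FALSE)
split_df
```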

Data

df <-
  structure(list(
    Treatment = c(
      "dilLB",
      "dilLB",
      "dilLB",
      "iso1_fullLB",
      "iso1_fullLB",
      "iso1_fullLB",
      "iso298_fullLB"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-7L))

Solution 2:^[2]

I would use stringr and dplyr. You need to manipulate the text inside the dataframe column. Your code looks for cells that exactly match "iso1"; since no cell does, nothing is replaced.

library(stringr)
library(dplyr)

df <- df %>%
  mutate(
    firstbit = str_extract(Treatment, "[:alnum:]+(?=_)"),
    secondbit = str_extract(Treatment, "[:alnum:]+$")
  )

This uses regular expressions to handle the text. It's explained in the stringr cheatsheet. "[:alnum:]+" means one or more letters or numbers, and "(?=_)" means followed by an underscore. "$" means followed by the end of the string. Rows without an underscore, such as dilLB, get NA in firstbit.
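
To make the difference between an exact match and a pattern match concrete, here is a small base R sketch on a plain character vector. It is an illustration rather than part of the answer; it reuses the lookahead idea in base R, which needs perl = TRUE, and "iso22" is the replacement label from the question.

```r
treatment <- c("dilLB", "iso1_fullLB", "iso298_fullLB")

# `==` compares whole strings, so no element equals "iso1".
exact <- treatment == "iso1"

# grepl() looks for the pattern inside each string instead.
partial <- grepl("^iso1_", treatment)

# Replace "iso1" only when it is followed by an underscore;
# the lookahead leaves the underscore itself in place.
replaced <- sub("^iso1(?=_)", "iso22", treatment, perl = TRUE)
replaced
```

On a data frame, the same call can be assigned back to the column, for example df$Treatment <- sub("^iso1(?=_)", "iso22", df$Treatment, perl = TRUE).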

https://stringr.tidyverse.org/

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	Simon Woodward
