R: Remove parts of column name after certain characters

I have a large data set with thousands of columns. The column names include various unwanted characters as follows:

col1_3x_xxx
col2_3y_xyz
col3_3z_zyx

I would like to remove everything from "_3" onwards in every column name, so that I am left with clean names:

col1
col2
col3

What is the most efficient way to do this for 5000+ columns?

Solution 1:[1]

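We can use sub on the column names. It is vectorized, so a single call handles all of them:

names(df1) <- sub("_3.*", "", names(df1))
names(df1)
#[1] "col1" "col2" "col3"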
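A minimal sketch of the same call applied to the names of a data frame. The toy `df1` is an assumption built from the three names in the question. Because truncating names can make two of them identical, the sketch also checks for duplicates afterwards, which may be worth doing with 5000+ columns.

```r
# Toy data frame standing in for the real data (an assumption for illustration)
df1 <- data.frame(col1_3x_xxx = 1:2, col2_3y_xyz = 3:4, col3_3z_zyx = 5:6)

# One vectorized call rewrites every column name
names(df1) <- sub("_3.*", "", names(df1))
names(df1)
#[1] "col1" "col2" "col3"

# anyDuplicated() returns 0 when all names are still unique
anyDuplicated(names(df1))
#[1] 0
```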

Solution 2:[2]

Certainly late for this answer, but just in case someone is looking for a solution. For a single column at index col:

colnames(df1)[col] <- sub("_3.*", "", colnames(df1)[col])

And if you have multiple columns:

# seq_len() is safe even when the data frame has no columns
for (col in seq_len(ncol(df1))) {
    colnames(df1)[col] <- sub("_3.*", "", colnames(df1)[col])
}
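Since sub is vectorized, the loop and a single call over all names should give the same result. The sketch below checks that on a toy one-row data frame. The names `old`, `df_loop` and `df_vec` are made up for illustration.

```r
old <- c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx")

# Two identical toy data frames with the names from the question
df_loop <- as.data.frame(matrix(0, nrow = 1, ncol = 3, dimnames = list(NULL, old)))
df_vec  <- df_loop

# Loop version: rename one column at a time
for (col in seq_len(ncol(df_loop))) {
    colnames(df_loop)[col] <- sub("_3.*", "", colnames(df_loop)[col])
}

# Vectorized version: rename all columns in one call
colnames(df_vec) <- sub("_3.*", "", colnames(df_vec))

identical(colnames(df_loop), colnames(df_vec))
#[1] TRUE
```

With thousands of columns the single call is likely the simpler and faster choice, since it avoids reassigning the names attribute on every iteration.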

Solution 3:[3]

We can try str_extract from the stringr package with the regular expression pattern "^[^_]+(?=_)":

stringr::str_extract(c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx"), "^[^_]+(?=_)")
[1] "col1" "col2" "col3"

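where in the pattern:

The first ^ matches the beginning of the string; [^_] means any character but _, so [^_]+ matches one or more non-underscore characters. (?=_) is a lookahead: it requires that the next character is _ without including it in the match. Note that this keeps everything before the first underscore, whatever follows it, so it only coincides with removing "_3..." when "_3" is the first underscore in the name.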
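For comparison, a base R sketch of the same idea that avoids the stringr dependency. The vector `x` and its extra element "id" are assumptions added to show what happens to a name with no underscore. In base R the lookahead requires perl = TRUE.

```r
x <- c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx", "id")

# Same lookahead pattern; regmatches() returns only the elements that matched
m <- regexpr("^[^_]+(?=_)", x, perl = TRUE)
regmatches(x, m)
#[1] "col1" "col2" "col3"

# Deleting from the first underscore onwards leaves unmatched names untouched
sub("_.*", "", x)
#[1] "col1" "col2" "col3" "id"
```

The extraction approach drops "id" here (str_extract would return NA for it instead), which would break an assignment back to the names, while the sub approach keeps the vector the same length.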

Solution 4:[4]

You can use gsub. The pattern needs the . before the *, so that "_3" and everything after it is removed:

names(df) = gsub(pattern = "_3.*", replacement = "", x = names(df))
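The quantifier matters in this pattern. A short sketch on the names from the question, comparing "_3*" (an underscore followed by zero or more 3s) with "_3.*" (an underscore, a 3, and everything after it):

```r
x <- c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx")

# "_3*" only deletes the underscores and any 3s directly after them
gsub("_3*", "", x)
#[1] "col1xxxx" "col2yxyz" "col3zzyx"

# "_3.*" deletes from "_3" to the end of the string
gsub("_3.*", "", x)
#[1] "col1" "col2" "col3"
```

Because "_3.*" runs to the end of the string it can match at most once per name, so sub and gsub give the same result with it.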

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 akrun
Solution 2 Rene Chan
Solution 3 Psidom
Solution 4 dare_devils