R: Remove parts of column names after certain characters
I have a large data set with thousands of columns. The column names include various unwanted characters as follows:
col1_3x_xxx
col2_3y_xyz
col3_3z_zyx
I would like to remove everything from "_3" onward in every column name, leaving clean names:
col1
col2
col3
What is the most efficient way to do this for 5000+ columns?
Solution 1:[1]
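We can use sub. It is vectorized, so a single call on names(df1) cleans every column name at once:
names(df1) <- sub("_3.*", "", names(df1))
names(df1)
#[1] "col1" "col2" "col3"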
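A self-contained sketch of the same idea at the scale mentioned in the question. The data frame `wide` and its 5000 generated names are hypothetical, built only to mimic the question's naming pattern; the point is that one sub() call handles all names without any loop.

```r
# Hypothetical one-row data frame with 5000 columns, named like the question's
# examples; the values are irrelevant, only the names matter here
n <- 5000
wide <- as.data.frame(matrix(0, nrow = 1, ncol = n))
names(wide) <- paste0("col", seq_len(n), "_3x_xxx")

# One vectorized sub() call rewrites all 5000 names at once:
# "_3.*" matches the first "_3" and everything after it
names(wide) <- sub("_3.*", "", names(wide))

head(names(wide), 3)
# [1] "col1" "col2" "col3"
```

Because ".*" is greedy and the match starts at the first "_3", a name such as "a_3b_3c" would be cut to "a".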
Solution 2:[2]
Certainly late to this question, but in case someone is looking for a solution, for a single column at index col:
colnames(df1)[col] <- sub("_3.*", "", colnames(df1)[col])
And if you have multiple columns:
for (col in seq_len(ncol(df1))) {
  colnames(df1)[col] <- sub("_3.*", "", colnames(df1)[col])
}
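Since sub() already works element-wise on a character vector, the loop gives the same result as a single call on all the names. A small sketch comparing the two on a hypothetical data frame (`df_loop` and `df_vec` are illustrative names); for thousands of columns the single call is likely faster, as it avoids reassigning the names once per iteration.

```r
# Hypothetical example data; names mimic those in the question
df_loop <- data.frame(col1_3x_xxx = 1, col2_3y_xyz = 2, col3_3z_zyx = 3)
df_vec <- df_loop

# Loop from this answer: renames one column per iteration
for (col in seq_len(ncol(df_loop))) {
  colnames(df_loop)[col] <- sub("_3.*", "", colnames(df_loop)[col])
}

# Vectorized alternative: sub() handles the whole names vector in one call
colnames(df_vec) <- sub("_3.*", "", colnames(df_vec))

identical(colnames(df_loop), colnames(df_vec))
# [1] TRUE
```

seq_len(ncol(df1)) is used instead of 1:ncol(df1) so the loop body is skipped cleanly when a data frame has zero columns.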
Solution 3:[3]
We can try str_extract from stringr with the regular expression pattern "^[^_]+(?=_)":
stringr::str_extract(c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx"), "^[^_]+(?=_)")
[1] "col1" "col2" "col3"
where in the pattern:
the first ^ matches the beginning of the string;
[^_]+ matches one or more non-_ characters ([^_] means any character but _);
(?=...) is a lookahead, so the match must be followed by _, which is not itself included in the result.
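This pattern keeps everything before the first underscore, not everything before "_3"; the two coincide for the question's names only because they contain no earlier underscore. A base-R sketch (no stringr dependency) using regexpr() with perl = TRUE, which enables the (?=_) lookahead, on hypothetical names chosen to show the difference:

```r
# Hypothetical names: the second one has an underscore before "_3"
x <- c("col1_3x_xxx", "my_col_3y_xyz")

# Lookahead pattern from this answer; regmatches() returns the matched parts
# (note: it silently drops elements that do not match at all)
m <- regexpr("^[^_]+(?=_)", x, perl = TRUE)
regmatches(x, m)
# [1] "col1" "my"

# "_3.*" instead cuts at the first "_3"
sub("_3.*", "", x)
# [1] "col1"   "my_col"
```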
Solution 4:[4]
You can use gsub with the pattern "_3.*", i.e. "_3" followed by anything:
names(df) <- gsub(pattern = "_3.*", replacement = "", x = names(df))
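The difference between "_3*" and "_3.*" is easy to miss: in a regular expression * repeats only the preceding character, so "_3*" means "_" followed by zero or more "3"s. A short sketch on the question's example names shows what each pattern removes.

```r
x <- c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx")

# "_3*": deletes every "_" together with any 3s directly after it
gsub("_3*", "", x)
# [1] "col1xxxx" "col2yxyz" "col3zzyx"

# "_3.*": deletes "_3" and the whole rest of the name
gsub("_3.*", "", x)
# [1] "col1" "col2" "col3"
```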
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | akrun |
Solution 2 | Rene Chan |
Solution 3 | Psidom |
Solution 4 | dare_devils |