How to remove a substring that starts with "RT" and ends with ":"

I have a dataset with a column of tweets. Some of the tweets are retweets, which start with "RT @username: ...". I would like to remove this prefix while keeping the text that comes after it.

See the example below:

stringsExample <- c("RT @WhiteHouse: Yesterday, President Biden...",
 "During World War II...")

The result I want is: "Yesterday, President Biden..." and "During World War II..."



Solution 1:[1]

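Match a prefix that starts (regex ^) with "RT", followed by one or more characters matched lazily (regex .+?), up to the first colon and the space after it (": "), and replace the match with the empty string "". Strings that do not start with "RT" are returned unchanged.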

gsub("^RT.+?: ", "", stringsExample)

[1] "Yesterday, President Biden..." "During World War II..."   
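Note that "^RT.+?: " also matches any string that merely begins with the letters "RT" and contains ": " later on. If that is a concern, a stricter pattern can require the "@username" part explicitly. The following is a minimal sketch, assuming the tweets live in a data frame column; the data frame `tweets`, the column names `text` and `clean`, and the third example string are hypothetical and not from the question. It also assumes handles consist only of letters, digits, and underscores, which is what \\w matches.

```r
# Hypothetical data frame; the column names are assumptions for illustration.
tweets <- data.frame(
  text = c("RT @WhiteHouse: Yesterday, President Biden...",
           "During World War II...",
           "RTX cards are back: prices fell"),
  stringsAsFactors = FALSE
)

# Require "RT", a space, "@", a handle made of word characters, a colon,
# and then any amount of whitespace. sub() is sufficient here because a
# pattern anchored with "^" can match at most once per string.
tweets$clean <- sub("^RT @\\w+:\\s*", "", tweets$text)

tweets$clean
# [1] "Yesterday, President Biden..."   "During World War II..."
# [3] "RTX cards are back: prices fell"
```

With the original pattern, the third string would be shortened to "prices fell", because "RT" followed by any characters and then ": " matches there as well.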

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 benson23