How to use gsub() to partially match part of a string and replace all strings containing that partial match?
I am trying to look for the word 'dunk' in a column of NBA shot types, as there are various types of dunks.
I then want to replace all the different dunk types with just the word 'dunk'.
nbaClean2 <- nbaData %>%
  select(x, y, points, type, result, team, player) %>%
  filter(team == 'LAL', y <= 47) %>%
  na_if("") %>%
  drop_na() %>%
  mutate(result = ifelse(result == 'missed', 'FGA', 'FGM')) %>%
  # First attempt
  mutate(dunk = gsub('\bdunk', 'dunk', type))
  # Second attempt
  mutate(dunk = grepl('dunk', type), gsub('TRUE', 'dunk', dunk))
Essentially, I am trying to see if I can use gsub() to partially match the word 'dunk'.
If possible, I would like a solution where I don't have to write out all the different shot types, as there are a lot of them.
Solution 1:[1]
gsub() only replaces the part of the string that matches the pattern. To replace the whole string, make the pattern match the whole string by wrapping it in .*, which matches any sequence of characters.
mutate(dunk = gsub('.*dunk.*', 'dunk', type))
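(Note that I got rid of your \b word boundary. In an R string a single backslash starts an escape sequence, so '\b' is a backspace character rather than a regex word boundary. If you want to use a word boundary, you need a double backslash, ".*\\bdunk.*", but then the pattern only matches when there is a word boundary before dunk, e.g., "slamdunk" would not match.)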
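To make the difference concrete, here is a minimal base R sketch. The shot types in the vector are made up for illustration and are not taken from the asker's data.

```r
# Hypothetical shot types, invented for this example
type <- c("slam dunk", "driving dunk shot", "slamdunk", "jump shot")

# Bare pattern: only the matched text "dunk" is replaced with "dunk",
# so the strings come back unchanged
partial <- gsub("dunk", "dunk", type)

# Pattern wrapped in .*: the whole string is replaced whenever it contains "dunk"
whole <- gsub(".*dunk.*", "dunk", type)

# With a word boundary (double backslash), "slamdunk" no longer matches
bounded <- gsub(".*\\bdunk.*", "dunk", type)

print(partial)
print(whole)
print(bounded)
```

Strings that do not contain the pattern, such as "jump shot" here, are returned unchanged by gsub().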
A potentially more efficient option would be to detect the "dunk" pattern with grepl() and then replace the whole string directly, without a regex substitution, e.g.
mutate(dunk = ifelse(grepl("dunk", type), "dunk", "not dunk"))
It's not clear what value you want in the newly created dunk column when type doesn't include "dunk". I'd consider making it a logical column simply with dunk = grepl("dunk", type). If you post sample input and desired output, it is much easier to help. Perhaps you don't want a dunk column at all, but just to change type to "dunk" if it includes the word "dunk", like this: mutate(type = ifelse(grepl("dunk", type), "dunk", type)).
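The three options can be compared side by side on a plain vector. This is only a sketch: the shot types are invented, and the variable names (is_dunk, dunk_label, type_clean) are hypothetical. The same expressions should work unchanged inside mutate().

```r
# Hypothetical shot types, invented for this example
type <- c("slam dunk", "driving dunk shot", "jump shot")

# Option A: a logical flag marking the rows that contain "dunk"
is_dunk <- grepl("dunk", type)

# Option B: a label column with an explicit value for non-dunks
dunk_label <- ifelse(is_dunk, "dunk", "not dunk")

# Option C: collapse every dunk variant to "dunk", keep other types as they are
type_clean <- ifelse(is_dunk, "dunk", type)

# fixed = TRUE treats "dunk" as a literal string instead of a regex
is_dunk_fixed <- grepl("dunk", type, fixed = TRUE)

print(data.frame(type, is_dunk, dunk_label, type_clean))
```

Option C is the one that avoids listing every shot type by hand, since any value containing "dunk" is collapsed and everything else passes through untouched.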
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
