'Extract substring from left to a specific character for each row in a pandas dataframe?
I have a dataframe that contains a collection of strings. These strings look something like this:
"oop9-hg78-op67_457y"
I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:
df['column'] = df['column'].str[0:'_']
I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!
Solution 1:[1]
df['column'] = df['column'].str.extract('_', expand=False)
could also be used if another option is needed.
Adding to the solution provided above by @Ynjxsjmh
Solution 2:[2]
You can use str.extract:
df['column'] = df['column'df].str.extract(r'(^[^_]+)')
Output (as separate column for clarity):
column column2
0 oop9-hg78-op67_457y oop9-hg78-op67
Regex:
( # start capturing group
^ # match start of string
[^_]+ # one or more non-underscore
) # end capturing group
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Norwegian Salmon |
| Solution 2 |
