Extract substring from left to a specific character for each row in a pandas dataframe?

I have a dataframe that contains a collection of strings. These strings look something like this:

"oop9-hg78-op67_457y"

I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:

df['column'] = df['column'].str[0:'_']

I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!



Solution 1:[1]

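df['column'] = df['column'].str.split('_').str[0]

This could also be used if another option is needed, adding to the solution provided by @Ynjxsjmh. It splits each string on the underscore and keeps the first piece. (Calling str.extract with the bare pattern '_' raises an error, because str.extract requires at least one capture group.)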
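As a minimal sketch of the split-based option, assuming a small made-up frame (only the first value comes from the question; the other two rows are hypothetical edge cases), splitting once on the first underscore and keeping the left piece might look like this:

```python
import pandas as pd

# Hypothetical sample data; only the first value appears in the question.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "no-underscore", "a_b_c"]})

# Split at most once on the first underscore, then take the piece to its left.
df["left"] = df["column"].str.split("_", n=1).str[0]

print(df["left"].tolist())
# ['oop9-hg78-op67', 'no-underscore', 'a']
```

Rows without an underscore come back unchanged, which may or may not be what you want when matching against another data set.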

Solution 2:[2]

You can use str.extract:

df['column'] = df['column'].str.extract(r'(^[^_]+)', expand=False)

Output (as separate column for clarity):

                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67
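
For anyone who wants to reproduce that output, here is a small self-contained sketch. The frame is built from the single example string in the question, and the name column2 simply mirrors the output shown above:

```python
import pandas as pd

df = pd.DataFrame({"column": ["oop9-hg78-op67_457y"]})

# expand=False makes str.extract return a Series for the single capture group
# instead of a one-column DataFrame.
df["column2"] = df["column"].str.extract(r"(^[^_]+)", expand=False)

print(df)
```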

Regex:

(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore
)       # end capturing group
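
To see how the pattern behaves on edge cases outside pandas, the sketch below applies the same regex with Python's built-in re module. The helper name left_of_underscore and the second and third test strings are made up for illustration:

```python
import re

PATTERN = re.compile(r"(^[^_]+)")

def left_of_underscore(text):
    """Return the text before the first underscore, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

print(left_of_underscore("oop9-hg78-op67_457y"))  # oop9-hg78-op67
print(left_of_underscore("no-underscore"))        # no-underscore
print(left_of_underscore("_leading"))             # None
```

Note the last case: a string that starts with an underscore has no match at all, and in pandas str.extract reports a non-matching row as NaN rather than an empty string.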

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Norwegian Salmon
Solution 2