Check if a string value of a column in a Pandas DataFrame starts with the value of another column
I am trying to concatenate two string columns col1 and col2 in a Pandas DataFrame. But I do not want to concatenate them if col2’s value already begins with the value of col1. In this case I want to use col2 without concatenation. This would be the expected behavior:
| col1 | col2 | result |
|---|---|---|
| ABC | ABC | ABC |
| ABC | XYZ | ABCXYZ |
| AbC | abc123 | abc123 |
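For a single pair of strings, the rule can be sketched in plain Python. The helper name `combine` is hypothetical and only illustrates the intended logic; the comparison ignores case, as the third row implies:

```python
def combine(prefix: str, value: str) -> str:
    """Return value unchanged if it already starts with prefix (ignoring case),
    otherwise return prefix + value."""
    if value.lower().startswith(prefix.lower()):
        return value
    return prefix + value


# The three rows from the table above.
for prefix, value in [("ABC", "ABC"), ("ABC", "XYZ"), ("AbC", "abc123")]:
    print(prefix, value, combine(prefix, value))
```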
I tried this code:
import pandas as pd
df = pd.DataFrame({
'col1': ['ABC', 'ABC', 'AbC'],
'col2': ['ABC', 'XYZ', 'abc123'],
})
df['result'] = df['col2'].where(df['col2'].str.lower().str.startswith(df['col1'].str.lower()), df['col1'] + df['col2'])
df
But that results in:
| col1 | col2 | result |
|---|---|---|
| ABC | ABC | ABCABC |
| ABC | XYZ | ABCXYZ |
| AbC | abc123 | AbCabc123 |
For testing purposes I passed a string literal as the parameter to startswith and received the expected results:
df['result'] = df['col2'].where(df['col2'].str.lower().str.startswith('abc'), df['col1'] + df['col2'])
I found out that startswith always returns NaN when its parameter is a Series:
df['result'] = df['col2'].str.lower().str.startswith(df['col1'].str.lower())
| col1 | col2 | result |
|---|---|---|
| ABC | ABC | NaN |
| ABC | XYZ | NaN |
| AbC | abc123 | NaN |
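A likely explanation, stated as an assumption rather than confirmed pandas internals: pandas applies Python's built-in `str.startswith` to each value, and that method only accepts a string or a tuple of strings as its prefix. The sketch below uses a plain list as a stand-in for the Series; depending on the pandas version, the rejected argument may surface as NaN, as seen above, or as an error.

```python
# A list standing in for a column of prefixes (assumption: a Series is
# rejected for the same reason, since it is neither a str nor a tuple).
prefixes = ["abc", "abc", "abc"]

try:
    "abc123".startswith(prefixes)
    outcome = "accepted"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)
```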
If I replace the startswith parameter with a string literal I receive booleans as expected:
df['result'] = df['col2'].str.lower().str.startswith('abc')
| col1 | col2 | result |
|---|---|---|
| ABC | ABC | True |
| ABC | XYZ | False |
| AbC | abc123 | True |
I understand that passing a Series as the argument to startswith seems to be the problem, but I cannot get it to work.
I am very new to Python and Pandas. I searched extensively with search engines and Stack Overflow's search function before creating my first post. What do I have to change in my code to get the desired behavior? Any help is highly appreciated. Thanks!
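One possible approach, offered as a minimal sketch and not as the accepted answer: build the boolean mask row by row in plain Python, so that each call to `startswith` receives a single prefix string. The variable name `mask` is only illustrative, and the sketch assumes neither column contains missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['ABC', 'ABC', 'AbC'],
    'col2': ['ABC', 'XYZ', 'abc123'],
})

# Compare each pair of values individually, ignoring case, because
# str.startswith needs one prefix string per call.
mask = pd.Series(
    [b.lower().startswith(a.lower()) for a, b in zip(df['col1'], df['col2'])],
    index=df.index,
)

# Keep col2 where it already starts with col1, otherwise concatenate.
df['result'] = df['col2'].where(mask, df['col1'] + df['col2'])
print(df)
```

With the sample data, `result` holds `ABC`, `ABCXYZ` and `abc123`. The list comprehension runs at Python speed, so this trades some performance on very large frames for simplicity.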
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow