I want to apply a conditional filter to some rows to extract values
I have a DataFrame whose first columns are:
Brand Price
Samsung 75.5,99.2
Lenovo 50.3
Lg 60.5,88.4
I want to extract the max value if the cell contains 2 values. I tried the following condition:
for x in df['Price']:
    if 0 <= len(x) <= 5:
        df['Price'] = df['Price'].str[0:5]
    else:
        df['Price'] = df['Price'].str[6:12]
But it doesn't work: for the cells that contain a single value, it returns a blank cell.
Solution 1:[1]
We can str.split the column on the comma, then use apply to find the max value per row (each string needs to be converted to a float so that max behaves as expected):
df['Price'] = df['Price'].str.split(',').apply(lambda c: max(map(float, c)))
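The one-liner above can also be written out step by step. This is only a sketch on the sample data from the question; the helper name `max_price` is hypothetical and is there for readability. It also shows why the conversion to float matters: `max` on strings compares them character by character, not numerically.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Samsung', 'Lenovo', 'Lg'],
    'Price': ['75.5,99.2', '50.3', '60.5,88.4']
})

def max_price(cell):
    # Hypothetical helper: '75.5,99.2' -> ['75.5', '99.2'] -> 99.2
    parts = cell.split(',')
    return max(float(p) for p in parts)

result = df['Price'].apply(max_price)
print(result.tolist())  # [99.2, 50.3, 88.4]

# Why floats: string comparison is lexicographic, so '9.5' beats '10.2'
string_max = max(['9.5', '10.2'])             # '9.5'
float_max = max(map(float, ['9.5', '10.2']))  # 10.2
```

A cell with a single value needs no special case here, because splitting `'50.3'` on the comma gives a one-element list.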
Alternatively, expand into a DataFrame, convert the entire DataFrame to float with astype, and get the max per row (axis=1):
df['Price'] = df['Price'].str.split(',', expand=True).astype(float).max(axis=1)
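To see what the expand version does, it may help to look at the intermediate DataFrame. The sketch below uses the sample data from the question; the variable names `wide` and `row_max` are just illustrative. Rows with a single price get a missing value in the second column, and `max` skips missing values by default.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Samsung', 'Lenovo', 'Lg'],
    'Price': ['75.5,99.2', '50.3', '60.5,88.4']
})

# One column per comma-separated value; Lenovo has no second value
wide = df['Price'].str.split(',', expand=True).astype(float)
print(wide)

# Row-wise max; the missing value (NaN) in the Lenovo row is ignored
row_max = wide.max(axis=1)
print(row_max.tolist())  # [99.2, 50.3, 88.4]
```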
Assuming the comma-separated values are sorted in ascending order (the last value will always be the largest), we can also just use the str accessor to get the last value (index -1):
df['Price'] = df['Price'].str.split(',').str[-1]
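Two caveats about the last option, sketched below. The third row is a made-up unsorted value that is not part of the original sample. Taking the last element leaves the values as strings, so an `astype(float)` is needed if numbers are wanted, and the result is only the max when each cell really is sorted.

```python
import pandas as pd

# Third entry is deliberately unsorted (hypothetical data)
s = pd.Series(['75.5,99.2', '50.3', '99.2,75.5'])

last = s.str.split(',').str[-1]
print(last.tolist())  # ['99.2', '50.3', '75.5'] -> still strings

# Convert if numeric values are needed
last_float = last.astype(float)

# The true per-row max differs for the unsorted row
true_max = s.str.split(',').apply(lambda c: max(map(float, c)))
print(true_max.tolist())  # [99.2, 50.3, 99.2]
```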
All options produce the following output df (for the provided sample):
     Brand  Price
0  Samsung   99.2
1   Lenovo   50.3
2       Lg   88.4
Setup:
import pandas as pd
df = pd.DataFrame({
    'Brand': ['Samsung', 'Lenovo', 'Lg'],
    'Price': ['75.5,99.2', '50.3', '60.5,88.4']
})
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
