How to extract numbers, including decimals, from a string column in pandas?
I need to extract numbers from a string column.
df:
Product
tld los 16OZ
HSJ14 OZ
hqk 28.3 OZ
rtk .7 OZ
ahdd .92OZ
aje 0.22 OZ
I need to extract the numbers from the "Product" column, including decimal values.
df_Output:
Product Numbers
tld los 16OZ 16
HSJ14 OZ 14
hqk 28.3 OZ 28.3
rtk .7 OZ 0.7
ahdd .92OZ 0.92
aje 0.22 OZ 0.22
What I tried:
df['Numbers'] = df['Product'].str.extract('([0-9]+[,./]*[0-9]*)')
This misses values such as .7, which have no digit before the decimal point.
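To see why the attempt falls short, here is a minimal sketch that runs the same pattern with the standard-library re module (str.extract also returns the first match, so the behaviour should be comparable). The variable names attempt, samples and attempt_results are only illustrative.

```python
import re

# Pattern from the attempt above: [0-9]+ demands at least one digit
# BEFORE any separator, so a leading "." can never be part of the match.
attempt = re.compile(r'([0-9]+[,./]*[0-9]*)')

samples = ["tld los 16OZ", "hqk 28.3 OZ", "rtk .7 OZ", "ahdd .92OZ"]

# Keep the first match for each sample string.
attempt_results = {s: attempt.search(s).group(1) for s in samples}

for text, match in attempt_results.items():
    print(f"{text!r} -> {match!r}")
```

For "rtk .7 OZ" and "ahdd .92OZ" the match starts at the first digit, so the leading dot is lost and the values come back as 7 and 92.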
Solution 1:[1]
If you want to match the numbers followed by OZ, you could write the pattern as:
(\d*\.?\d+)\s*OZ\b
Explanation
(  Capture group 1 (the value will be picked up by str.extract)
\d*\.?\d+  Match optional digits, an optional dot, and 1+ digits
)  Close group 1
\s*OZ\b  Match optional whitespace chars and then OZ followed by a word boundary
import pandas as pd
data = [
"tld los 16OZ",
"HSJ14 OZ",
"hqk 28.3 OZ",
"rtk .7 OZ",
"ahdd .92OZ",
"aje 0.22 OZ"
]
df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d+)\s*OZ\b')
print(df)
Output
Product Numbers
0 tld los 16OZ 16
1 HSJ14 OZ 14
2 hqk 28.3 OZ 28.3
3 rtk .7 OZ .7
4 ahdd .92OZ .92
5 aje 0.22 OZ 0.22
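The extracted values above are still strings (".7", ".92"), while the desired output shows 0.7 and 0.92. One possible way to close that gap, sketched here under the assumption that numeric values are wanted, is to pass the result through pd.to_numeric. The intermediate name extracted is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Product": [
    "tld los 16OZ", "HSJ14 OZ", "hqk 28.3 OZ",
    "rtk .7 OZ", "ahdd .92OZ", "aje 0.22 OZ",
]})

# expand=False returns a Series; each captured value is still text such as ".7".
extracted = df["Product"].str.extract(r'(\d*\.?\d+)\s*OZ\b', expand=False)

# to_numeric parses ".7" as 0.7; errors="coerce" leaves NaN for rows
# where the pattern did not match instead of raising.
df["Numbers"] = pd.to_numeric(extracted, errors="coerce")
print(df)
```

After the conversion the column holds floats, so 16 is shown as 16.0 and .7 as 0.7.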
Solution 2:[2]
If the data is as simple as presented, you can remove every character except digits and dots, then convert to float:
df['Numbers'] = df['Product'].str.replace(r'[^\d.]', '', regex=True).astype(float)
Product Numbers
0 tld los 16OZ 16.00
1 HSJ14 OZ 14.00
2 hqk 28.3 OZ 28.30
3 rtk .7 OZ 0.70
4 ahdd .92OZ 0.92
5 aje 0.22 OZ 0.22
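The stripping approach relies on the number being the only digits and dots in the string. The sketch below uses re.sub to apply the same character class to two hypothetical product names that are not in the question, to show where it could go wrong. The helper name strip_non_numeric is an assumption for illustration.

```python
import re

def strip_non_numeric(text):
    # Same idea as the replace above: drop everything except digits and dots.
    return re.sub(r'[^\d.]', '', text)

# Sample row from the question: works as intended.
clean = strip_non_numeric("hqk 28.3 OZ")

# Hypothetical name with an extra digit: the digits are glued together.
merged = strip_non_numeric("A1 pack 2.5 OZ")

# Hypothetical name with an extra dot: the result is not a valid float.
broken = strip_non_numeric("v1.2 mix 3.5 OZ")

print(clean, merged, broken)
```

If product names can contain their own digits or dots, a pattern anchored on the unit (as in Solution 1) is likely the safer choice.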
Solution 3:[3]
You can use this regex:
df['Numbers'] = df['Product'].str.extract(r'(\d*\.\d+|\d+)', expand=False)
Product Numbers
0 tld los 16OZ 16
1 HSJ14 OZ 14
2 hqk 28.3 OZ 28.3
3 rtk .7 OZ .7
4 ahdd .92OZ .92
5 aje 0.22 OZ 0.22
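This pattern returns the first number in the string, whatever follows it. As a rough comparison, the sketch below runs it next to the OZ-anchored pattern from Solution 1 using the standard-library re module; the input "A1 pack 2.5 OZ" is a hypothetical product name, not one from the question.

```python
import re

first_number = re.compile(r'(\d*\.\d+|\d+)')      # pattern from Solution 3
before_oz = re.compile(r'(\d*\.?\d+)\s*OZ\b')     # pattern from Solution 1

samples = ["tld los 16OZ", "HSJ14 OZ", "hqk 28.3 OZ",
           "rtk .7 OZ", "ahdd .92OZ", "aje 0.22 OZ"]

# On the question's data both patterns pick the same text.
sample_matches = [first_number.search(s).group(1) for s in samples]
print(sample_matches)

# Hypothetical name with a digit before the quantity: the results differ.
tricky = "A1 pack 2.5 OZ"
first_hit = first_number.search(tricky).group(1)
oz_hit = before_oz.search(tricky).group(1)
print(first_hit, oz_hit)
```

On the hypothetical string the first-number pattern picks up the 1 from the name, while the anchored pattern finds 2.5.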
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | The fourth bird |
| Solution 2 | wwnde |
| Solution 3 | k_krylowicz |
