'Could not extract share price from DataFrame column using regular expression
d=df.key.str.extract(pat=r"\[0-9\]\*\[.\]\[0-9\]\[0-9\]",expand=True)
key value
0 Date 2021-04-05 870.929688 Name: Adj Close,... SBILIFE.NS
1 Date 2021-04-05 1386.009521 Name: Adj Close... SRTRANSFIN.NS
2 Date 2021-04-06 39.354458 Name: Adj Close, ... NACLIND.NS
3 Date 2021-04-06 1397.550781 Name: Adj Close... SHRIRAMCIT.NS
4 Date 2021-04-07 70.805176 Name: Adj Close, ... EDELWEISS.NS
Though the key column is shown above index, its basically the date column(2nd), I want to extract the share price which is there after date in each row and before Name (like 870.929688 in first row) my code is unable to extract data and giving error "pattern contains no capture groups"
please help
d=df4.value.str.extract(pat=r"\[0-9\]\*\[.\]\[0-9\]\[0-9\]",expand=True)
Solution 1:[1]
The following regex will capture the stock price in the first and only capture group:
\d+-\d+-\d+\s+(.*?)\s*Name
Solution 2:[2]
You can use for example
\b(\d+(?:\.\d+)?)\s+Name:
The pattern matches:
\bA word boundary to prevent a partial word match(Capture group 1 (for str.extract)\d+(?:\.\d+)?Match 1+ digits with an optional decimal part
)Close group 1\s+Name:Match 1+ whitespace chars andName:
See a regex101 demo
For example:
import pandas as pd
strings = [
"Date 2021-04-05 870.929688 Name: Adj Close,... SBILIFE.NS",
"Date 2021-04-05 1386.009521 Name: Adj Close... SRTRANSFIN.NS ",
"Date 2021-04-06 39.354458 Name: Adj Close, ... NACLIND.NS ",
"Date 2021-04-06 1397.550781 Name: Adj Close... SHRIRAMCIT.NS ",
"Date 2021-04-07 70.805176 Name: Adj Close, ... EDELWEISS.NS"
]
df = pd.DataFrame(strings, columns=["value"])
df['price'] = df.value.str.extract(pat=r"\b(\d+(?:\.\d+)?)\s+Name:", expand=True)
print(df)
Output
value price
0 Date 2021-04-05 870.929688 Name: Adj Close,...... 870.929688
1 Date 2021-04-05 1386.009521 Name: Adj Close...... 1386.009521
2 Date 2021-04-06 39.354458 Name: Adj Close, ...... 39.354458
3 Date 2021-04-06 1397.550781 Name: Adj Close...... 1397.550781
4 Date 2021-04-07 70.805176 Name: Adj Close, ...... 70.805176
Solution 3:[3]
Use this, mind the parentheses:
df["price"]=df["value"].str.extract(r"([0-9]*\.?[0-9]+)\sName:", expand=False)
See regex proof.
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[0-9]* any character of: '0' to '9' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
Name: 'Name:'
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | djplaistow |
| Solution 2 | The fourth bird |
| Solution 3 | Ryszard Czech |
