'Could not extract share price from DataFrame column using regular expression

d=df.key.str.extract(pat=r"\[0-9\]\*\[.\]\[0-9\]\[0-9\]",expand=True)
key                             value
 0   Date 2021-04-05 870.929688 Name: Adj Close,... SBILIFE.NS
 1   Date 2021-04-05 1386.009521 Name: Adj Close... SRTRANSFIN.NS 
 2   Date 2021-04-06 39.354458 Name: Adj Close, ... NACLIND.NS 
 3   Date 2021-04-06 1397.550781 Name: Adj Close... SHRIRAMCIT.NS 
 4   Date 2021-04-07 70.805176 Name: Adj Close, ... EDELWEISS.NS

Though the key column is shown above index, its basically the date column(2nd), I want to extract the share price which is there after date in each row and before Name (like 870.929688 in first row) my code is unable to extract data and giving error "pattern contains no capture groups"

please help

d=df4.value.str.extract(pat=r"\[0-9\]\*\[.\]\[0-9\]\[0-9\]",expand=True)


Solution 1:[1]

The following regex will capture the stock price in the first and only capture group:

\d+-\d+-\d+\s+(.*?)\s*Name

Solution 2:[2]

You can use for example

\b(\d+(?:\.\d+)?)\s+Name:

The pattern matches:

  • \b A word boundary to prevent a partial word match
  • ( Capture group 1 (for str.extract)
    • \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
  • ) Close group 1
  • \s+Name: Match 1+ whitespace chars and Name:

See a regex101 demo

For example:

import pandas as pd

strings = [
    "Date 2021-04-05 870.929688 Name: Adj Close,... SBILIFE.NS",
    "Date 2021-04-05 1386.009521 Name: Adj Close... SRTRANSFIN.NS ",
    "Date 2021-04-06 39.354458 Name: Adj Close, ... NACLIND.NS ",
    "Date 2021-04-06 1397.550781 Name: Adj Close... SHRIRAMCIT.NS ",
    "Date 2021-04-07 70.805176 Name: Adj Close, ... EDELWEISS.NS"
]

df = pd.DataFrame(strings, columns=["value"])
df['price'] = df.value.str.extract(pat=r"\b(\d+(?:\.\d+)?)\s+Name:", expand=True)
print(df)

Output

                                               value        price
0  Date 2021-04-05 870.929688 Name: Adj Close,......   870.929688
1  Date 2021-04-05 1386.009521 Name: Adj Close......  1386.009521
2  Date 2021-04-06 39.354458 Name: Adj Close, ......    39.354458
3  Date 2021-04-06 1397.550781 Name: Adj Close......  1397.550781
4  Date 2021-04-07 70.805176 Name: Adj Close, ......    70.805176

Solution 3:[3]

Use this, mind the parentheses:

df["price"]=df["value"].str.extract(r"([0-9]*\.?[0-9]+)\sName:", expand=False)

See regex proof.

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]*                   any character of: '0' to '9' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \.?                      '.' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  Name:                    'Name:'

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 djplaistow
Solution 2 The fourth bird
Solution 3 Ryszard Czech