Extract integer in a filename from complete path using split regex in Pandas
Given a DataFrame df:
import pandas as pd

df = pd.DataFrame(['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
                   '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png'])
I would like to extract only the integer just before the file extension.
The following code does this:
import os

df['fname'] = df[0].apply(lambda x: os.path.split(x)[1])                    # file name only
df['f'] = df['fname'].apply(lambda x: x.split('__')[1].split('.png')[0])  # text between '__' and '.png'
df['f'] = df['f'].astype(int)
However, I have the impression this could be achieved more easily with pandas' built-in str.split, for example:
df['f']=df[0].str.split(re.compile(r"__\d.jpg"), expand=True)
But it seems nothing is being split. Which parameter am I not setting correctly?
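The pattern in the attempt above cannot match these paths: `\d` matches exactly one digit, the unescaped `.` is a wildcard, and the files end in `.png`, not `.jpg`. When a split pattern never matches, each string comes back whole. pandas applies a regex split to each element, so this minimal sketch reproduces the per-string behaviour with the standard-library `re.split` on one path (an illustration, not the pandas call itself). It also shows that a capturing group makes `re.split` keep the matched digits in its output.

```python
import re

path = '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png'

# Original pattern: '__', ONE digit, any char, 'jpg' -> never matches a '.png' name,
# so the result is a one-element list holding the unchanged path.
unsplit = re.split(r"__\d.jpg", path)
print(unsplit)

# Corrected pattern: '__', one or more digits (captured), a literal dot, 'png' at the end.
# Captured groups are included in re.split's result, so the number survives the split.
parts = re.split(r"__(\d+)\.png$", path)
print(parts)
print(int(parts[1]))  # 131147
```

Because the match sits at the very end of the string, the last element of `parts` is an empty string.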
Solution 1:[1]
You can use Series.str.extract:
df['num'] = df['f'].str.extract(r'_(\d+)\.[^.]+$', expand=False)
Details:
- `_` - an underscore
- `(\d+)` - Capturing group 1 (this is the value returned by `Series.str.extract`): one or more digits
- `\.` - a `.` char
- `[^.]+` - one or more chars other than a `.` char
- `$` - end of string
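To see each piece of the pattern at work outside pandas, here is a small sketch using the standard-library `re.search` with the same regex. The second and third file names are hypothetical inputs chosen to show that any extension is accepted and that names without digits before the extension produce no match.

```python
import re

# Same pattern as in the answer: underscore, captured digits, dot, extension, end of string
pattern = re.compile(r'_(\d+)\.[^.]+$')

names = [
    'file_2_F__131147.png',
    'file_2_F__160565.jpg',  # hypothetical: a different extension still matches
    'file_2_F__abc.png',     # hypothetical: no digits right before the extension
]
results = {}
for name in names:
    m = pattern.search(name)
    results[name] = m.group(1) if m else None  # group 1 is what str.extract returns
    print(name, '->', results[name])
```

The `_2` inside `file_2_F` is never picked up, because the digits must be followed by a dot and then only non-dot characters up to the end of the string.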
Python test:
import pandas as pd
df = pd.DataFrame({'f':['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
'/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png']})
df['num'] = df['f'].str.extract(r'_(\d+)\.[^.]+$', expand=False)
print(df.to_string())
Output:
f num
0 /home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png 131147
1 /home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png 160565
Solution 2:[2]
Assuming `0` is the name of your column (as in your example), you can use str.extract:
df[0].str.extract(r'(\d+)\.[^.]+$', expand=False)
Output:
0 131147
1 160565
Name: 0, dtype: object
To assign to a new column:
df['f'] = df[0].str.extract(r'(\d+)\.[^.]+$', expand=False)
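The extracted values are strings (note `dtype: object` in the output), while the question converts the result to `int`. A minimal sketch chaining `astype(int)` onto the extraction, assuming every path ends in digits before the extension; if one does not, `str.extract` yields NaN for that row and `astype(int)` raises.

```python
import pandas as pd

df = pd.DataFrame(['/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png',
                   '/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__160565.png'])

# One capture group + expand=False -> a Series of strings; astype(int) makes it numeric
df['f'] = df[0].str.extract(r'(\d+)\.[^.]+$', expand=False).astype(int)
print(df['f'].tolist())  # [131147, 160565]
```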
Solution 3:[3]
def extract(values):
    values = values.split('__')                  # split at '__'
    return int(values[-1].replace('.png', ''))   # take the last part and remove '.png'

df[0].apply(extract)
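The function above only strips `.png`. A possible variant, sketched here with a hypothetical name `extract_number`, removes whatever extension is present with `os.path.splitext` and then takes the text after the last `'__'` via `rsplit`. It assumes every file name contains `'__'` followed only by digits.

```python
import os

def extract_number(path):
    """Hypothetical variant of `extract`: drop any extension, keep the digits after the last '__'."""
    stem, _ext = os.path.splitext(path)    # e.g. ('/.../file_2_F__131147', '.png')
    return int(stem.rsplit('__', 1)[-1])   # '131147' -> 131147

print(extract_number('/home/dtest/Documents/user/exp/S1/test1/test3/sub5/file_2_F__131147.png'))  # 131147
```

It is used the same way: `df[0].apply(extract_number)`.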
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
| Solution 2 | mozway |
| Solution 3 | rocheteau |
