Check if a column contains /, -, _, * or ~ and split it into other columns - Pandas
I have a column of numbers with one of these characters between them: -, /, *, ~, _. I need to check whether values contain any of these characters and, if they do, split the value into another column. Is there a different solution than the one shown below? In the end, columns subnumber1, subnumber2 ... subnumber5 will be merged into one column, and column "number5" will be left without the separator characters. I need those two columns for further processing. I'm a newbie in Python, so any advice is welcome.
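# str.contains treats its pattern as a regex by default; '^' is a regex anchor
# that matches every value, so search for the characters literally
if gdf['column_name'].str.contains('~', regex=False).any():
    # n=1 caps the split at two pieces, matching the two target columns
    gdf[['number1', 'subnumber1']] = gdf['column_name'].str.split('~', n=1, expand=True)
gdf
if gdf['column_name'].str.contains('^', regex=False).any():
    gdf[['number2', 'subnumber2']] = gdf['column_name'].str.split('^', n=1, expand=True)
gdf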
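Why the caret needs regex=False: Series.str.contains interprets its pattern as a regular expression unless regex=False is passed, and "^" on its own is the start-of-string anchor. So a check like str.contains('^').any() is True for any non-empty column, whether or not a caret is present. A small demonstration on the sample values:

```python
import pandas as pd

# Sample values from the question's input column
df = pd.DataFrame({"column_name": ["152/6*3", "163/1-6", "145/1", "163/6^3"]})

# As a regex, "^" is the start-of-string anchor, so it "matches" every value
as_regex = df["column_name"].str.contains("^")
# regex=False searches for the literal caret character instead
as_literal = df["column_name"].str.contains("^", regex=False)

print(as_regex.tolist())    # [True, True, True, True]
print(as_literal.tolist())  # [False, False, False, True]
```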
Input column:
column_name
152/6*3
163/1-6
145/1
163/6^3
Expected output:
number5 | subnumber1 | subnumber2
152 | 6 | 3
163 | 1 | 6
145 | 1 |
163 | 6 | 3
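The check described in the question can be done for all separators at once instead of one if-block per character. A minimal sketch, assuming the separators are the five listed above plus ^ (which appears in the sample data). The extra value "170" is hypothetical, added only to show a row without a separator:

```python
import re

import pandas as pd

# Sample values from the question, plus a hypothetical "170" with no separator
df = pd.DataFrame({"column_name": ["152/6*3", "163/1-6", "145/1", "163/6^3", "170"]})

# Separators named in the question, plus "^", which appears in the sample data.
# re.escape keeps "*", "^" and "-" literal inside the character class.
separators = ["-", "/", "*", "~", "_", "^"]
pattern = "[" + "".join(re.escape(s) for s in separators) + "]"

# One vectorized check for all separators at once
has_separator = df["column_name"].str.contains(pattern, regex=True)
print(has_separator.tolist())  # [True, True, True, True, False]
```

The resulting boolean Series can also be used as a mask (df.loc[has_separator, ...]) to split only the rows that actually contain a separator.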
Solution 1:[1]
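Use str.split with a regex character class listing every separator (- comes first so it is read as a literal character, not as a range):
df['column_name'].str.split(r'[-/*~^_]', expand=True)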
output:
0 1 2
0 152 6 3
1 163 1 6
2 145 1 None
3 163 6 3
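To get from this unnamed split to the two columns the question ends up with, the first piece can become number5 and the remaining pieces can be merged. A minimal sketch, assuming "merged" means joining the sub-numbers into one string; the column name subnumbers and the "," separator are hypothetical choices, not taken from the question:

```python
import pandas as pd

# Sample values from the question's input column
df = pd.DataFrame({"column_name": ["152/6*3", "163/1-6", "145/1", "163/6^3"]})

# Split on any separator; "-" comes first so it is not read as a range
parts = df["column_name"].str.split(r"[-/*~^_]", expand=True)

# The first piece is the number without separator characters
df["number5"] = parts[0]

# Hypothetical merge: join the remaining pieces with "," and skip missing ones
df["subnumbers"] = parts.iloc[:, 1:].apply(lambda row: ",".join(row.dropna()), axis=1)
print(df[["number5", "subnumbers"]])
```

Because the merge works on every column after the first, it does not depend on knowing the maximum number of pieces in advance.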
Or, if you know in advance that each value has at most 3 numbers, use str.extract with named capturing groups:
regex = r'(?P<number5>\d+)\D*(?P<subnumber1>\d*)\D*(?P<subnumber2>\d*)'
df['column_name'].str.extract(regex)
output:
number5 subnumber1 subnumber2
0 152 6 3
1 163 1 6
2 145 1
3 163 6 3
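With this regex, a missing piece comes back as an empty string rather than NaN, because \d* is allowed to match nothing (that is the blank subnumber2 for 145). A minimal sketch of turning the extracted text into numbers, using pd.to_numeric with errors="coerce" so the empty strings become NaN:

```python
import pandas as pd

# Sample values from the question's input column
df = pd.DataFrame({"column_name": ["152/6*3", "163/1-6", "145/1", "163/6^3"]})

regex = r"(?P<number5>\d+)\D*(?P<subnumber1>\d*)\D*(?P<subnumber2>\d*)"
extracted = df["column_name"].str.extract(regex)

# Optional groups (\d*) match "" when a piece is missing;
# errors="coerce" turns anything non-numeric, including "", into NaN
numeric = extracted.apply(pd.to_numeric, errors="coerce")
print(numeric)
```

Note that a column containing NaN (subnumber2 here) is stored as float, so 3 becomes 3.0.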
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
