Find specific values in a column and take the next few characters that follow them
I have a problem with one task. I have a list of values that looks like this:
values = ["a","b","c"]
and my DF looks like this:
column_1 column_2
1 sffasdef
2 bsrewsaf
3 tyhvbsrc
4 ertyui1c
5 qwertyyu
I have to check whether any of the values in the list occurs in column_2. If one does, a new column should contain the matched value and the next 3 characters after it (or fewer, if the string ends first), so the DF should look like this:
column_1 column_2 column_3
1 sffasdef asde
2 bsrewsaf bsre
3 tyhvbsrc bsrc
4 ertyui1c c
5 qwertyyu NaN
Do you have any idea how to solve this? Regards
Solution 1:[1]
Use .str.extract with a capture group that matches one of the values followed by up to three more characters:
df['column_3'] = df['column_2'].str.extract(f'((?:{"|".join(values)})(?:.?){{3}})')
# OR, possibly more readable
values_re = '|'.join(values)
df['column_3'] = df['column_2'].str.extract(r'((?:' + values_re + ')(?:.?){3})')
Output:
>>> df
column_1 column_2 column_3
0 1 sffasdef asde
1 2 bsrewsaf bsre
2 3 tyhvbsrc bsrc
3 4 ertyui1c c
4 5 qwertyyu NaN
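For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same idea. Two things are additions not in the answer above: `re.escape`, a precaution in case `values` ever contains regex metacharacters, and `expand=False`, which makes `.str.extract` return a Series instead of a one-column DataFrame. The quantifier is written as `.{0,3}`, which should behave the same as `(?:.?){3}` here.

```python
import re

import pandas as pd

values = ["a", "b", "c"]
df = pd.DataFrame({
    "column_1": [1, 2, 3, 4, 5],
    "column_2": ["sffasdef", "bsrewsaf", "tyhvbsrc", "ertyui1c", "qwertyyu"],
})

# Escape each value so characters like "." or "+" are matched literally
# (an added precaution, not part of the original answer).
alternatives = "|".join(re.escape(v) for v in values)

# One capture group: a listed value followed by up to 3 more characters.
pattern = rf"((?:{alternatives}).{{0,3}})"

# expand=False returns a Series; rows without a match become NaN.
df["column_3"] = df["column_2"].str.extract(pattern, expand=False)
print(df)
```

Only the first match in each string is extracted, which is why `tyhvbsrc` yields `bsrc` rather than the trailing `c`.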
Solution 2:[2]
Assuming every entry in values is a single character, a character class is enough:
df['column_3'] = df['column_2'].str.extract(fr'([{"".join(values)}].{{,3}})')
Output:
column_1 column_2 column_3
0 1 sffasdef asde
1 2 bsrewsaf bsre
2 3 tyhvbsrc bsrc
3 4 ertyui1c c
4 5 qwertyyu NaN
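As a quick sanity check of the single-character assumption, the sketch below compares an alternation pattern with a character-class pattern on the sample strings, using only the standard `re` module. The helper `first_match` and the variable names are hypothetical, introduced just for this illustration.

```python
import re

values = ["a", "b", "c"]
samples = ["sffasdef", "bsrewsaf", "tyhvbsrc", "ertyui1c", "qwertyyu"]

# Alternation: works for values of any length.
alternation = re.compile("(?:" + "|".join(map(re.escape, values)) + ").{0,3}")

# Character class: only valid when every value is a single character.
char_class = re.compile("[" + "".join(map(re.escape, values)) + "].{0,3}")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


results = [(first_match(alternation, s), first_match(char_class, s)) for s in samples]
for sample, pair in zip(samples, results):
    print(sample, pair)
```

With single-character values both patterns should agree on every sample. If `values` contained a multi-character entry such as "ab", the character class would treat it as "a or b", so the alternation form is the safer choice in that case.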
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
| Solution 2 | mozway |
