Take row pairs in one pandas DataFrame and iterate through another DataFrame
I would like to iterate through the row pairs of df_a, comparing firstname1 with lastname1 and firstname2 with lastname2. For each pair (e.g. firstname1 and lastname1), I would then like to look up both names in df_b and identify the coordinate ranges (coordinate_start, coordinate_end) of each name that do not overlap any range of the other name in the pair. For example, a range assigned to firstname1 is kept only if it does not overlap any of the ranges assigned to lastname1, and vice versa.
Starting with df_a & df_b:
import pandas as pd

# Each row of df_a names one pair to compare
a = {'ID_a': ['firstname1', 'firstname2'], 'ID_b': ['lastname1', 'lastname2']}
df_a = pd.DataFrame(a)

# Coordinate ranges assigned to each name
b = {'coordinate_start': [1, 6, 20, 35, 51, 1, 7, 15, 40, 51, 70, 85, 91, 70, 80, 94],
     'coordinate_end': [5, 15, 27, 50, 55, 5, 14, 19, 47, 55, 78, 90, 93, 78, 84, 100],
     'name': ['firstname1', 'firstname1', 'firstname1', 'firstname1', 'firstname1',
              'lastname1', 'lastname1', 'lastname1', 'lastname1', 'lastname1',
              'firstname2', 'firstname2', 'firstname2',
              'lastname2', 'lastname2', 'lastname2']}
df_b = pd.DataFrame(b)
I would like to return df_c, which contains the non-overlapping coordinate ranges and the name each range is associated with:
c = {'unique_coordinate_start': [20,85,80,91,94],
'unique_coordinate_end': [27,90,84,93,100],
'name': ['firstname1','firstname2', 'lastname2','firstname2','lastname2']}
df_c = pd.DataFrame(c)
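One possible way to get there is sketched below. It is not taken from an answer to this question. It assumes that range endpoints are inclusive, which is inferred from the expected df_c: the lastname1 range (15, 19) touches the firstname1 range (6, 15) only at 15 and is still left out. The helper name `unique_ranges` is hypothetical, and the pairwise comparison relies on NumPy broadcasting over the `.to_numpy()` arrays.

```python
import pandas as pd

df_a = pd.DataFrame({'ID_a': ['firstname1', 'firstname2'],
                     'ID_b': ['lastname1', 'lastname2']})
df_b = pd.DataFrame({
    'coordinate_start': [1, 6, 20, 35, 51, 1, 7, 15, 40, 51, 70, 85, 91, 70, 80, 94],
    'coordinate_end': [5, 15, 27, 50, 55, 5, 14, 19, 47, 55, 78, 90, 93, 78, 84, 100],
    'name': ['firstname1'] * 5 + ['lastname1'] * 5
            + ['firstname2'] * 3 + ['lastname2'] * 3,
})


def unique_ranges(own, other):
    """Return the rows of `own` whose range overlaps no range in `other`.

    Hypothetical helper; endpoints are treated as inclusive (an assumption).
    """
    s1 = own['coordinate_start'].to_numpy()[:, None]    # column vector
    e1 = own['coordinate_end'].to_numpy()[:, None]
    s2 = other['coordinate_start'].to_numpy()[None, :]  # row vector
    e2 = other['coordinate_end'].to_numpy()[None, :]
    # Closed ranges [s1, e1] and [s2, e2] overlap when s1 <= e2 and s2 <= e1.
    overlap = (s1 <= e2) & (s2 <= e1)  # shape: (len(own), len(other))
    return own[~overlap.any(axis=1)]


pieces = []
for id_a, id_b in zip(df_a['ID_a'], df_a['ID_b']):
    rows_a = df_b[df_b['name'] == id_a]
    rows_b = df_b[df_b['name'] == id_b]
    # Check each name of the pair against the other name.
    pieces.append(unique_ranges(rows_a, rows_b))
    pieces.append(unique_ranges(rows_b, rows_a))

df_c = (pd.concat(pieces, ignore_index=True)
          .rename(columns={'coordinate_start': 'unique_coordinate_start',
                           'coordinate_end': 'unique_coordinate_end'}))
print(df_c)
```

On the sample data this yields the same five rows as the desired df_c, though grouped by name in pair order rather than in the order listed above. The broadcasted comparison builds a len(own) by len(other) boolean matrix per pair, which is fine for small groups; for very large groups a sort-based or interval-index approach may be a better fit.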
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
