Pandas dataframe conditional inner join with itself
I am looking for a way to inner join a dataframe with itself, based on a condition. I have a large dataframe with two columns, 'Group' and 'Person'. I would like to create a second dataframe that has one entry for every pair of persons who have been in the same group. First dataframe:
Group | Person
a1 | p1
a1 | p2
a1 | p3
a1 | p4
a2 | p1
Output:
Person1 | Person2 | Weight
p1 | p2 | 1
p1 | p3 | 1
p1 | p4 | 1
p2 | p3 | 1
p2 | p4 | 1
p3 | p4 | 1
The weight is increased if a pair of persons is part of multiple groups. So far, I have a naive implementation based on a sub-dataframe and two for loops. Is there a more elegant and, more importantly, a faster or built-in way to do this?
My implementation so far:
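# i is the index of the current row; it is set by code not shown here
group = principals.iloc[i, 0]
# all rows that belong to the same group
sub = principals.loc[principals['Group'] == group]
# visit every unordered pair (j, k) with j < k inside that group
for j in range(len(sub) - 1):
    for k in range(j + 1, len(sub)):
        # check if tuple exists -> update or create new entry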
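For comparison, the nested loops above could probably be replaced by `itertools.combinations` and a `collections.Counter`. This is only a minimal sketch: the example data is copied from the tables above, and the names `pair_counts` and `result` are made up for illustration. It assumes a person should count at most once per group.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Example data from the question.
principals = pd.DataFrame({
    "Group": ["a1", "a1", "a1", "a1", "a2"],
    "Person": ["p1", "p2", "p3", "p4", "p1"],
})

pair_counts = Counter()
for _, persons in principals.groupby("Group")["Person"]:
    # sorted(set(...)) drops repeats within a group and gives every
    # unordered pair one canonical form, so (p1, p2) and (p2, p1) match.
    pair_counts.update(combinations(sorted(set(persons)), 2))

# Turn the counter into the requested Person1 | Person2 | Weight layout.
result = pd.DataFrame(
    [(p1, p2, weight) for (p1, p2), weight in pair_counts.items()],
    columns=["Person1", "Person2", "Weight"],
)
print(result)
```

This still loops in Python, but only once per group rather than once per row, and the counter takes care of the "update or create new entry" step.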
I was wondering whether there is functionality similar to an SQL inner join, joining on the condition that the group is the same and then matching person against person. I could take care of the duplicate p1|p1 entries in that case...
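The SQL-style self-join described here might look roughly like the following with `DataFrame.merge`. It is a sketch under assumptions, not a confirmed solution: the example data is copied from the tables above, the names `pairs` and `weights` are made up for illustration, and each (Group, Person) row is assumed to occur only once.

```python
import pandas as pd

# Example data from the question.
principals = pd.DataFrame({
    "Group": ["a1", "a1", "a1", "a1", "a2"],
    "Person": ["p1", "p2", "p3", "p4", "p1"],
})

# Self-join on Group: every person is paired with every person of the
# same group. The suffixes rename the two Person columns to
# Person1 and Person2.
pairs = principals.merge(principals, on="Group", suffixes=("1", "2"))

# Keep each unordered pair once; this also drops self-pairs like p1|p1.
pairs = pairs[pairs["Person1"] < pairs["Person2"]]

# Weight = number of groups in which the pair occurs together.
weights = (
    pairs.groupby(["Person1", "Person2"])
    .size()
    .reset_index(name="Weight")
)
print(weights)
```

One caveat: the join temporarily creates a row for every ordered pair within a group, so memory use grows quadratically with group size. If the data can contain repeated (Group, Person) rows, calling `principals.drop_duplicates()` first would keep them from inflating the weights.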
Many thanks in advance
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
