'Values from two columns appear as a single column in another dataframe
One dataframe looks like this:
Thing Number1 Number2
43 STK
64 BOX
32 STK BOX
46 THG
34 BOX THG
...
And another one like this:
Thing Package Value1 Value2
43 STK 1 1
64 BOX 2 1
64 STK 1 1
32 STK 1 1
32 BOX 2 3
42 THG 1 1
34 BOX 1 1
34 THG 1 3
34 STK 2 1
...
I want to eliminate the rows in the second dataframe that do not appear in the first (like for 64 I only have BOX). I merged the two and eliminated the rows where the value in Package could not be found in either of Number1 or Number2, but is there any more efficient way of doing it?
Solution 1:[1]
I think your solution is efficient.
df = (df2.merge(df1, on='Thing')
.query("Package == Number1 | Package == Number2")
.drop(['Number1','Number2'], axis=1))
print (df)
Thing Package Value1 Value2
0 43 STK 1 1
1 64 BOX 2 1
3 32 STK 1 1
4 32 BOX 2 3
5 34 BOX 1 1
6 34 THG 1 3
Added alternative for multiple Number columns in df1.
Use DataFrame.merge with DataFrame.melt:
df = df2.merge(df1.melt('Thing', value_name='Package')[['Thing','Package']])
print (df)
Thing Package Value1 Value2
0 43 STK 1 1
1 64 BOX 2 1
2 32 STK 1 1
3 32 BOX 2 3
4 34 BOX 1 1
5 34 THG 1 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
