How can I multiply two dataframes with different column labels in pandas?
I'm trying to multiply (add/divide/etc.) two dataframes that have different column labels.
I'm sure this is possible, but what's the best way to do it? I've tried using rename to change the columns on one df first, but (1) I'd rather not do that and (2) my real data has a multiindex on the columns (where only one layer of the multiindex is differently labeled), and rename seems tricky for that case...
So to try and generalize my question, how can I get df1 * df2 using a dict (mapping below) to define which columns to multiply together?
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]] * 3, index=['1', '2', '3'], columns=['a', 'b', 'c'])
df2 = pd.DataFrame([[4, 5, 6]] * 3, index=['1', '2', '3'], columns=['d', 'e', 'f'])
mapping = {'a': 'e', 'b': 'd', 'c': 'f'}
# df1 * df2 = ?  (pairing columns via mapping)
Solution 1:[1]
I was also troubled by this problem. It seems that pandas requires both dataframes to have the same column names for element-wise multiplication, because it aligns on labels first.
I searched a lot, and the only example I found (in the "Setting with enlargement" part of the docs) adds a single column to the dataframe.
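For your question:
import numpy as np
rs = np.multiply(df2, df1.to_numpy())  # one side is a plain array, so columns are paired by position instead of by label
rs will have the same column names as df2.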
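A minimal sketch, reusing the question's example values, of why label alignment gets in the way and how handing np.multiply a plain NumPy array for one side avoids it (assumes pandas and NumPy are installed):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]] * 3, index=["1", "2", "3"], columns=["a", "b", "c"])
df2 = pd.DataFrame([[4, 5, 6]] * 3, index=["1", "2", "3"], columns=["d", "e", "f"])

# Arithmetic between two DataFrames aligns on labels: the result's columns are
# the union a..f, and since no label matches, every cell is NaN.
aligned = df1 * df2

# With a plain NumPy array on one side there are no labels to align, so values
# are paired by position and the result keeps df2's column labels.
rs = np.multiply(df2, df1.to_numpy())
```

Here rs holds d = 1*4, e = 2*5 and f = 3*6 in every row.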
Suppose we want to multiply several columns by several other columns in the same dataframe and append the results to the original dataframe.
For example, if the columns of df1 and df2 are both in the same dataframe ds, we can do:
ds[['r1', 'r2', 'r3']] = np.multiply(ds[['a', 'b', 'c']], ds[['d', 'e', 'f']].to_numpy())
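A runnable sketch of the same-dataframe case, using a small hypothetical ds that holds both groups of columns:

```python
import numpy as np
import pandas as pd

# Hypothetical single frame holding both groups of columns.
ds = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 10], "f": [11, 12]}
)

# Converting the right-hand side to an array makes the product positional
# (a*d, b*e, c*f) instead of label-aligned; the result keeps labels a, b, c.
products = np.multiply(ds[["a", "b", "c"]], ds[["d", "e", "f"]].to_numpy())

# Assigning a DataFrame to a list of column names pairs them by position,
# so r1 receives a*d, r2 receives b*e and r3 receives c*f.
ds[["r1", "r2", "r3"]] = products
```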
I hope these will help.
Solution 2:[2]
Updated solution now that pd.np is deprecated: df1.multiply(np.array(df2))
It keeps the column names of df1 and multiplies its columns by the columns of df2 in positional order.
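A short sketch of this approach on the question's example values:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]] * 3, index=["1", "2", "3"], columns=["a", "b", "c"])
df2 = pd.DataFrame([[4, 5, 6]] * 3, index=["1", "2", "3"], columns=["d", "e", "f"])

# np.array(df2) drops df2's labels, so multiply() pairs columns by position
# (a with d, b with e, c with f) and the result keeps df1's labels.
result = df1.multiply(np.array(df2))
```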
Solution 3:[3]
I just stumbled onto the same problem. It seems like pandas aligns both the row index and the column labels before doing the element-wise multiplication, so you can just rename with your mapping during the multiplication:
>>> df1 = pd.DataFrame([[1, 2, 3]] * 3, index=['1', '2', '3'], columns=['a', 'b', 'c'])
>>> df2 = pd.DataFrame([[4, 5, 6]] * 3, index=['1', '2', '3'], columns=['d', 'e', 'f'])
>>> df1
   a  b  c
1  1  2  3
2  1  2  3
3  1  2  3
>>> df2
   d  e  f
1  4  5  6
2  4  5  6
3  4  5  6
>>> mapping = {'a' : 'e', 'b' : 'd', 'c' : 'f'}
>>> df1.rename(columns=mapping) * df2
   d  e   f
1  8  5  18
2  8  5  18
3  8  5  18
If you want to pair the columns by position and keep df1's column labels (the 'natural' order), you can build the mapping on the fly:
>>> df1 * df2.rename(columns=dict(zip(df2.columns, df1.columns)))
For example, to compute the "Frobenius inner product" of the two matrices, you could do:
>>> (df1 * df2.rename(columns=dict(zip(df2.columns, df1.columns)))).sum().sum()
96
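The question also mentions a column MultiIndex where only one level differs. A minimal sketch, assuming hypothetical frames whose inner level holds the question's labels, uses the level parameter of DataFrame.rename so the mapping touches only that level:

```python
import pandas as pd

# Hypothetical frames whose column MultiIndex differs only in level 1.
index = ["1", "2", "3"]
cols1 = pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("x", "c")])
cols2 = pd.MultiIndex.from_tuples([("x", "d"), ("x", "e"), ("x", "f")])
df1 = pd.DataFrame([[1, 2, 3]] * 3, index=index, columns=cols1)
df2 = pd.DataFrame([[4, 5, 6]] * 3, index=index, columns=cols2)

mapping = {"a": "e", "b": "d", "c": "f"}

# level=1 applies the mapping only to the inner level and leaves "x" alone,
# so the renamed labels line up with df2's labels before multiplying.
result = df1.rename(columns=mapping, level=1) * df2
```

The values match the flat example: ("x", "d") is 8, ("x", "e") is 5 and ("x", "f") is 18 in every row.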
Solution 4:[4]
This is a pretty old question, and as nnsk said, pd.np is being deprecated.
A clean solution is df1 * df2.values (or df1 * df2.to_numpy(), which newer pandas versions recommend over .values). This produces the element-wise product of the two dataframes, pairing columns by position, and keeps the column names of df1.
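Because dropping df2's labels pairs columns purely by position, a sketch that honors the question's mapping could first reorder df2's columns, assuming every column of df1 appears as a key in mapping:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]] * 3, index=["1", "2", "3"], columns=["a", "b", "c"])
df2 = pd.DataFrame([[4, 5, 6]] * 3, index=["1", "2", "3"], columns=["d", "e", "f"])
mapping = {"a": "e", "b": "d", "c": "f"}

# Reorder df2's columns so position i holds the partner of df1.columns[i],
# then drop the labels with to_numpy() so the product is positional.
partners = df2[[mapping[col] for col in df1.columns]]
result = df1 * partners.to_numpy()
```

This gives a = 1*5, b = 2*4 and c = 3*6 while keeping df1's labels, the mirror image of the rename-based answer above.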
Solution 5:[5]
Another solution, assuming the indexes and columns are already in matching positions:
df_mul = pd.DataFrame(df1.values * df2.values, columns=df1.columns, index=df1.index)
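The "matching positions" assumption matters because .values discards the row labels as well as the column labels. A sketch, using hypothetical frames whose rows are stored in different orders, that realigns the rows with reindex before dropping the labels:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["r1", "r2"], columns=["a", "b", "c"])
# Hypothetical df2 whose rows are stored in the opposite order.
df2 = pd.DataFrame([[20, 20, 20], [10, 10, 10]], index=["r2", "r1"], columns=["d", "e", "f"])

# Put df2's rows in df1's order first so .values pairs matching rows;
# columns are still paired by position (a with d, b with e, c with f).
aligned_rows = df2.reindex(index=df1.index)
df_mul = pd.DataFrame(df1.values * aligned_rows.values, columns=df1.columns, index=df1.index)
```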
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BearPy |
| Solution 2 | nnsk |
| Solution 3 | patricksurry |
| Solution 4 | confusedkingbread |
| Solution 5 | Wenceslas Sanchez |
