Python: Multiplying a dataframe with another dataframe
Hi, I currently have two dataframes with different shapes:
df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
df12 = pd.DataFrame(np.array([[7, 8, 9]]),
                    columns=['a', 'b', 'c'])
   a  b  c
0  7  8  9
I would like to multiply each row in df11 by df12. The resulting dataframe should be:
df13 = pd.DataFrame(np.array([[7, 16, 27], [28, 40, 54], [49, 64, 81]]),
                    columns=['a', 'b', 'c'])
    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
Solution 1:[1]
I recommend using NumPy multiplication:
df13 = pd.DataFrame(df11.to_numpy()*df12.to_numpy(), columns=df11.columns)
Or you can use the pandas mul method, like this:
df11.mul({'a': 7, 'b': 8, 'c': 9})
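For reference, here is a minimal, self-contained sketch of both variants. It assumes the df11/df12 definitions from the question and uses df12.iloc[0] (a Series) as the equivalent of the dict spelled out above; the name df14 is just a local helper for the comparison.

import numpy as np
import pandas as pd

df11 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                    columns=['a', 'b', 'c'])
df12 = pd.DataFrame(np.array([[7, 8, 9]]), columns=['a', 'b', 'c'])

# NumPy broadcasting: shapes (3, 3) * (1, 3) multiply every row of df11 by df12's row
df13 = pd.DataFrame(df11.to_numpy() * df12.to_numpy(), columns=df11.columns)

# pandas mul aligns the Series index with df11's columns and broadcasts over the rows
df14 = df11.mul(df12.iloc[0])

print(df13.equals(df14))  # True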
Solution 2:[2]
One-liner
df_3 = df_1 * df_2.iloc[0]
Code
import pandas as pd
data_1 = {'a': [1, 4, 7],
          'b': [2, 5, 8],
          'c': [3, 6, 9]}
data_2 = {'a': [7], 'b': [8], 'c': [9]}
df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)
df_3 = df_1 * df_2.iloc[0]
print(df_3)
Output
    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
Timings
A few timings for this input:
# Paul_O's numpy approach
25.9 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# iloc approach
172 µs ± 962 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# mozway's approach
194 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# Paul_O's mul approach
308 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Making data_1 a 10000 x 3 DataFrame of random integers between 1 and 10000, we get very similar results:
# Paul_O's numpy approach
39 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# iloc approach
188 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# mozway's approach
206 µs ± 2.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# Paul_O's mul approach
312 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Of course, these are only two sets of timings for two very specific inputs on one system, so I would not draw hard conclusions from them. Still, if your problem closely resembles this one, the NumPy approach seems to be the fastest; the best choice may differ in other circumstances, e.g. if the form of your input differs.
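If you want to repeat the comparison on your own data, here is a rough sketch using the standard timeit module (the array size, seed, and labels are arbitrary choices for illustration; the figures above look like IPython %timeit output, which is an equivalent way to measure):

import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_1 = pd.DataFrame(rng.integers(1, 10_000, size=(10_000, 3)), columns=['a', 'b', 'c'])
df_2 = pd.DataFrame([[7, 8, 9]], columns=['a', 'b', 'c'])

candidates = {
    'numpy': lambda: pd.DataFrame(df_1.to_numpy() * df_2.to_numpy(), columns=df_1.columns),
    'iloc': lambda: df_1 * df_2.iloc[0],
    'squeeze': lambda: df_1 * df_2.squeeze(),
    'mul': lambda: df_1.mul(df_2.iloc[0]),
}
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=1_000)
    print(f'{name}: {seconds / 1_000 * 1e6:.1f} µs per loop')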
Solution 3:[3]
You can use squeeze:
df13 = df11*df12.squeeze()
The potential advantage is that if df12 has more than one row, squeeze() leaves it as a DataFrame, so you get an element-wise 2D multiplication aligned on index and columns (illustrated after the output below).
Output:
    a   b   c
0   7  16  27
1  28  40  54
2  49  64  81
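To see the alignment behaviour mentioned above, a small sketch (the multi-row frame df_multi is made up for the example): with a single-row df12, squeeze() yields a Series and every row of df11 is scaled; with more than one row, squeeze() changes nothing and the product is element-wise, aligned on index and column labels.

# Single row: squeeze() turns df12 into a Series, so each row of df11 is scaled by [7, 8, 9]
print(df11 * df12.squeeze())

# More than one row (hypothetical df_multi): squeeze() leaves the DataFrame as it is,
# so the product is element-wise and aligned on index and columns
df_multi = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], columns=['a', 'b', 'c'])
print(df11 * df_multi.squeeze())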
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | farch |
| Solution 2 | |
| Solution 3 | |
