Faster way to perform a function on each row with every other row in a DataFrame?
I want to apply an operation to each row paired with every other row in a DataFrame. The obvious way is to use nested for loops, which is, as expected, very slow.
Is there a faster and better way to achieve the same thing?
This is the DataFrame, where each row is a user vector and the index is set to the usernames. In practice there can be hundreds of usernames.
import pandas as pd
df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])
Nested Loop Method
import numpy as np
def some_func(u1_vec, u2_vec):
    # this could be any function using the above 2 user vectors
    return np.minimum(u1_vec, u2_vec).sum() / np.maximum(u1_vec, u2_vec).sum()
index_list = list(df1.index) # contains usernames
vector_cols = list(df1.columns) # contains colnames
min_max_all = {} # will be used to store the vector interaction
for index_u1 in index_list:
    u1_vec = df1.loc[index_u1, vector_cols]
    min_max_all[index_u1] = {}
    for index_u2 in index_list:
        u2_vec = df1.loc[index_u2, vector_cols]
        min_max_all[index_u1][index_u2] = some_func(u1_vec, u2_vec)
Result - min_max_all
{
'U1': {'U1': 1.0, 'U2': 0.5416666666666666, 'U3': 0.5384615384615384},
'U2': {'U1': 0.5416666666666666, 'U2': 1.0, 'U3': 0.8333333333333334},
'U3': {'U1': 0.5384615384615384, 'U2': 0.8333333333333334, 'U3': 1.0}
}
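For this particular min/max ratio, one possible way to avoid the Python loops is NumPy broadcasting. The following is only a sketch under that assumption: it does not carry over to an arbitrary some_func, and the names vals, pair_min, pair_max, result and min_max_fast are illustrative, not part of the original code. It builds an intermediate array of shape (n_users, n_users, n_columns), which should be manageable for hundreds of users but grows quickly beyond that.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])

# Plain float array of shape (n_users, n_columns)
vals = df1.to_numpy(dtype=float)

# Broadcasting pairs every row with every other row:
# vals[:, None, :] has shape (n, 1, k), vals[None, :, :] has shape (1, n, k),
# so the element-wise min/max has shape (n, n, k); summing over the last
# axis leaves one value per pair of users.
pair_min = np.minimum(vals[:, None, :], vals[None, :, :]).sum(axis=2)
pair_max = np.maximum(vals[:, None, :], vals[None, :, :]).sum(axis=2)

# Label the (n, n) ratio matrix with usernames on both axes
result = pd.DataFrame(pair_min / pair_max, index=df1.index, columns=df1.index)

# Optional: same nested-dict layout as min_max_all
min_max_fast = result.to_dict(orient="index")
```

Here result.loc["U1", "U2"] corresponds to min_max_all["U1"]["U2"] from the nested loop version. Because the ratio is symmetric, only half of the pairs are strictly needed, but computing the full matrix keeps the sketch simple.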
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
