Pandas Correlation Matrix of Euclidean Distance
Given a DataFrame with 8000000 rows (the row ID is the index) and 4 numerical features, and also a distance function:
def dist(row1, row2):
    # n is the number of numerical features (4 here)
    return (row1.x1 - row2.x2 + row1.xn - row2.xn) / n
The expected output is an 8000000 * 8000000 correlation matrix, holding the distance between each pair of rows.
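To make the scale concrete, here is a rough back-of-the-envelope calculation of what a dense matrix over all pairs of 8000000 rows would need. The 8 bytes per entry is an assumption (one float64 per distance); the names are only illustrative.

```python
# Rough size estimate for a dense all-pairs matrix.
n_rows = 8_000_000
bytes_per_entry = 8  # assumption: each distance is stored as a float64

n_entries = n_rows * n_rows                # one entry per ordered pair of rows
total_bytes = n_entries * bytes_per_entry  # storage for the dense matrix

print(f"{n_entries:.2e} entries, about {total_bytes / 1e12:.0f} TB")
```

Under that assumption the full matrix has about 6.4e13 entries and would take roughly 512 TB, so it cannot be held in memory as a dense structure.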
Alternatively, the output can be a dict mapping each row ID to the set of its neighbours and their distances, where a neighbour is any row with dist < threshold.
The obvious way is a nested loop, one outer and one inner, that iterates over every pair of rows. But I am looking for a greedy solution, because the data size is huge.
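For illustration, here is a minimal sketch of one way the neighbour dict could be built without comparing every pair of rows. It assumes plain Euclidean distance (as in the title, not the exact `dist` formula posted above) and uses a simple grid: each row is bucketed into a cell whose side equals the threshold, so any neighbour must lie in the same or an adjacent cell. The function name `neighbours_within` and the input layout (a dict of row ID to feature tuple) are hypothetical, and this is pure Python for clarity rather than speed.

```python
import math
from collections import defaultdict
from itertools import product


def neighbours_within(points, threshold):
    """Return {row_id: {neighbour_id: distance}} for Euclidean distance < threshold.

    `points` maps row_id -> tuple of floats (hypothetical input layout).
    """
    if not points:
        return {}

    # Bucket each row into a grid cell with side length equal to the threshold.
    grid = defaultdict(list)
    for row_id, p in points.items():
        cell = tuple(math.floor(v / threshold) for v in p)
        grid[cell].append(row_id)

    dims = len(next(iter(points.values())))
    # Offsets to the cell itself and every adjacent cell (3**dims in total).
    offsets = list(product((-1, 0, 1), repeat=dims))

    result = {row_id: {} for row_id in points}
    for cell, ids in grid.items():
        for off in offsets:
            other = tuple(c + o for c, o in zip(cell, off))
            if other not in grid:
                continue  # no rows in that cell
            for i in ids:
                for j in grid[other]:
                    if i == j:
                        continue
                    d = math.dist(points[i], points[j])
                    if d < threshold:
                        result[i][j] = d
    return result


# Tiny usage example with 4 features per row.
rows = {
    0: (0.0, 0.0, 0.0, 0.0),
    1: (0.5, 0.0, 0.0, 0.0),
    2: (5.0, 5.0, 5.0, 5.0),
    3: (0.0, 0.9, 0.0, 0.0),
    4: (1.2, 0.0, 0.0, 0.0),
}
nbrs = neighbours_within(rows, threshold=1.0)
```

With 4 features each row is only compared against rows in at most 81 cells, which helps when the threshold is small relative to the spread of the data. If most rows fall within the threshold of each other, the output itself is close to the full matrix and no indexing scheme avoids that cost.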
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
