Pandas Correlation Matrix of Euclidean Distance

Given a DataFrame with 8000000 rows (identified by its index) and 4 numerical features, and also a distance function:

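def dist(row1, row2):
    # row1, row2: pandas Series holding the n numerical features (x1 ... xn)
    n = len(row1)
    return (row1 - row2).sum() / n  # mean of the per-feature differences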
  • The expected output is an 8000000 x 8000000 correlation matrix, i.e. some representation of the distance between each pair of rows.

  • Or, alternatively, a dict that maps each row ID to a set of (neighbour, distance) pairs, where a neighbour is any row with dist < threshold.
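For scale, here is a rough back-of-the-envelope estimate for the first option. It assumes the matrix is stored densely with one 8-byte float per entry; that storage format is an assumption for illustration, not something fixed by the question.

```python
# Rough memory estimate for a dense pairwise matrix.
# Assumption: one 8-byte float64 per entry, no sparsity or symmetry exploited.
n_rows = 8_000_000
bytes_per_entry = 8

n_entries = n_rows * n_rows
total_bytes = n_entries * bytes_per_entry

print(f"entries: {n_entries:.2e}")
print(f"dense size: {total_bytes / 1e12:.0f} TB")
```

Under that assumption the dense matrix comes to roughly 512 TB, which suggests the thresholded dict of neighbours is the more practical of the two output forms.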

The obvious approach is a nested loop (one outer, one inner) that iterates over every pair of rows. But I am looking for a faster, greedy solution, because the data size is huge.
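One possible alternative to the nested loop is a k-d tree, which can return all pairs of rows within a threshold without comparing every pair. The sketch below assumes the distance is the plain Euclidean distance over the feature columns (as the title suggests, which differs from the `dist` pseudocode above). The column names `x1` to `x4`, the random sample data, and the `threshold` value are placeholders, not taken from the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Small random stand-in for the real frame; column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 4)), columns=["x1", "x2", "x3", "x4"])

threshold = 0.1  # hypothetical cut-off for "neighbour"

points = df.to_numpy()
ids = df.index.to_numpy()

# Build the tree once, then ask for every pair (i < j) within the threshold.
tree = cKDTree(points)
pairs = tree.query_pairs(r=threshold, output_type="ndarray")

# Euclidean distance for each returned pair, computed in one vectorised step.
dists = np.linalg.norm(points[pairs[:, 0]] - points[pairs[:, 1]], axis=1)

# Row ID -> {neighbour ID: distance}; each pair is stored in both directions.
neighbours = {row_id: {} for row_id in ids}
for i, j, d in zip(pairs[:, 0], pairs[:, 1], dists):
    neighbours[ids[i]][ids[j]] = float(d)
    neighbours[ids[j]][ids[i]] = float(d)
```

The number of pairs returned depends heavily on the threshold. If the pairs themselves do not fit in memory at full scale, `tree.query_ball_point` could be called on batches of rows instead, writing each batch out before moving on.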



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
