Python - Intra similarity
I'm trying to compute the intra-class similarity of the Iris data set in Python, i.e. the distance between elements of the same class. For example, on this set (the value after "|" is the class):
1 2 3 4 |0
5 6 7 8 |0
1 3 5 6 |1
11 12 13 14 |0
10 2 4 6 |1
distance1 = (1-5)^2 + (2-6)^2 + (3 - 7)^2 + (4-8)^2
distance1 = sqrt(distance1)
distance2 = (1- 11)^2 + (2-12)^2 + (3 - 13)^2 + (4-14)^2
distance2 = sqrt(distance2)
similarityClass0 = (distance1 + distance2) / 2
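The arithmetic of the example can be checked with a few lines of NumPy. This is a minimal sketch: `X_toy`, `y_toy`, `class0` and `distances` are hypothetical names holding the five rows shown above, and, like the example, it only measures distances from the first class-0 row.

```python
import numpy as np

# Toy data from the example: four features per row; labels are the values after "|"
X_toy = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [1, 3, 5, 6],
                  [11, 12, 13, 14],
                  [10, 2, 4, 6]], dtype=float)
y_toy = np.array([0, 0, 1, 0, 1])

class0 = X_toy[y_toy == 0]      # the three class-0 rows
first = class0[0]               # the row [1, 2, 3, 4]

# Euclidean distances from the first class-0 row to the other class-0 rows
distances = np.linalg.norm(class0[1:] - first, axis=1)   # distance1 = 8.0, distance2 = 20.0
similarity_class0 = distances.mean()                     # (8 + 20) / 2 = 14.0
print(distances, similarity_class0)
```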
Then I have to do the same for class 1, 2, 3, and so on.
For now, my code is, I think, functional but pretty ugly.
As input I have X and y. When I finish computing for tab0, I do the same for tab1, tab2, etc.
My question is: how can I write this code for n classes? My goal is also to have an intra-similarity measure for each row.
from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
print(iris.data.shape, iris.target.shape)
X = iris.data
#0 = Setosa // 1 = Versicolor // 2 = Virginica
y = iris.target
# First, retrieve the indexes of the rows of each class
# For example, if class 0 is on lines 1, 2, 6, tab0 will store 1, 2, 6
tab0 = list()
tab1 = list()
tab2 = list()
j = 0
for output in y:
    if output == 0:
        tab0.append(j)
    if output == 1:
        tab1.append(j)
    if output == 2:
        tab2.append(j)
    j = j + 1
########################################################################
# Computation of the intra similarity
sim0_intra = list()
sim1_intra = list()
sim2_intra = list()
# classes stores 0, 1, 2 (the 3 classes), count stores the number of elements in each class
classes, count = np.unique(y, return_counts=True)
for i in tab0:
    temp = 0
    for j in tab0:
        # Euclidean distance between rows i and j (0 when i == j)
        dist = 0
        for k in range(len(X[0])):
            dist = dist + np.square(X[i][k] - X[j][k])
        temp = temp + np.sqrt(dist)
    # Mean distance from row i to the other count[0] - 1 rows of class 0
    sim0_intra.append(temp / (count[0] - 1))
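One way to generalize the per-class code (tab0, tab1, ... and the sim0_intra loop) to n classes is to loop over np.unique(y) and compute each class's distance matrix with NumPy broadcasting. The sketch below assumes the measure is the mean Euclidean distance from a row to the other rows of its class, as in the example at the top; `intra_similarity_per_row`, `per_row` and `per_class` are hypothetical names, not part of NumPy or scikit-learn.

```python
import numpy as np


def intra_similarity_per_row(X, y):
    """Mean Euclidean distance from each row to the other rows of its class.

    Loops over np.unique(y), so it works for any number of classes.
    Assumes every class has at least two rows (a single-row class gives nan).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    result = np.empty(len(X))
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)          # row indexes of this class
        Xc = X[idx]
        # Pairwise distance matrix via broadcasting, shape (n_c, n_c)
        diff = Xc[:, np.newaxis, :] - Xc[np.newaxis, :, :]
        D = np.sqrt((diff ** 2).sum(axis=-1))
        # The diagonal is 0 (distance of a row to itself), so dividing the
        # row sums by n_c - 1 averages over the other rows of the class
        result[idx] = D.sum(axis=1) / (len(idx) - 1)
    return result


# Toy data from the example at the top
X_toy = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [1, 3, 5, 6],
                  [11, 12, 13, 14], [10, 2, 4, 6]], dtype=float)
y_toy = np.array([0, 0, 1, 0, 1])

per_row = intra_similarity_per_row(X_toy, y_toy)
# One summary value per class: the mean of its rows' values
per_class = {int(c): per_row[y_toy == c].mean() for c in np.unique(y_toy)}
print(per_row)
print(per_class)
```

With the Iris data from the question, `intra_similarity_per_row(X, y)` should return 150 values, one per row. Averaging a class's per-row values gives the mean over all distinct pairs of that class (each pair is counted twice in the row sums), which is the quantity Solution 1 computes; for the toy class 0 that is (8 + 20 + 12) / 3. Broadcasting builds an n_c x n_c x d array per class, which is fine for Iris; for large classes, `scipy.spatial.distance.pdist` or `sklearn.metrics.pairwise_distances` avoid that intermediate array.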
Solution 1:[1]
The answer from Guiem is indeed straightforward. I just had to add these lines after computing the distance matrix D:
D_triu = np.triu(D)
intra_class_dist = np.mean(D_triu[D_triu > 0])
since np.triu() returns a copy of the matrix with the elements below the k-th diagonal zeroed (https://numpy.org/doc/stable/reference/generated/numpy.triu.html).
Hence, those zeros, together with the zero diagonal of the distance matrix (each point's distance to itself), must be discarded from the calculation of the mean.
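Guiem's answer and the distance matrix D are not reproduced here, so the sketch below builds D for one class with NumPy broadcasting (an assumption about how D was obtained). It also shows a swapped-in alternative, np.triu_indices with k=1, which selects the strict upper triangle directly instead of filtering on > 0. The two agree unless two rows of the class are identical, in which case the > 0 filter also drops their genuine zero distance. `mean_pairwise_distance`, `mean_pairwise_distance_filtered` and `X_dup` are hypothetical names.

```python
import numpy as np


def distance_matrix(Xc):
    """Euclidean distance matrix of the rows of Xc, via broadcasting."""
    Xc = np.asarray(Xc, dtype=float)
    diff = Xc[:, np.newaxis, :] - Xc[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_pairwise_distance(Xc):
    """Mean distance over all distinct pairs, using the strict upper triangle."""
    D = distance_matrix(Xc)
    # Row/column indexes of the strict upper triangle (k=1 excludes the diagonal)
    i, j = np.triu_indices(len(D), k=1)
    return D[i, j].mean()


def mean_pairwise_distance_filtered(Xc):
    """The np.triu + '> 0' filter from the answer, for comparison."""
    D_triu = np.triu(distance_matrix(Xc))
    return np.mean(D_triu[D_triu > 0])


# Two identical rows: their true distance (0) is dropped by the "> 0" filter
X_dup = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
print(mean_pairwise_distance(X_dup))           # pairs 0, 5, 5
print(mean_pairwise_distance_filtered(X_dup))  # pairs 5, 5 only
```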
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Marko Panic |
