Python - Intra similarity

I'm trying to compute the intra-class similarity on the Iris data set in Python, that is, the distance between elements of the same class. For example, on this set (the last column is the class label):

 1  2  3  4  |0
 5  6  7  8  |0 
 1  3  5  6  |1
11 12 13 14  |0 
10  2  4  6  |1

distance1 = (1-5)^2 + (2-6)^2 + (3 - 7)^2 + (4-8)^2
distance1 = sqrt(distance1)
distance2 = (1- 11)^2 + (2-12)^2 + (3 - 13)^2 + (4-14)^2
distance2 = sqrt(distance2)
similarityClass0 = (distance1 + distance2) / 2

I then have to do the same for class 1, class 2, and so on.
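As a quick check of the arithmetic above, here is a minimal NumPy sketch on the same five rows. The names rows, labels and mean_distance_to_first are illustrative only. Like the hand calculation, it measures every class member against the first row of its class.

```python
import numpy as np

# The five toy rows from the example, with their class labels.
rows = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [1, 3, 5, 6],
    [11, 12, 13, 14],
    [10, 2, 4, 6],
], dtype=float)
labels = np.array([0, 0, 1, 0, 1])


def mean_distance_to_first(points):
    """Mean Euclidean distance from the first row to every other row."""
    diffs = points[1:] - points[0]
    return np.linalg.norm(diffs, axis=1).mean()


similarity_class0 = mean_distance_to_first(rows[labels == 0])
similarity_class1 = mean_distance_to_first(rows[labels == 1])
print(similarity_class0)  # (8 + 20) / 2 = 14.0
print(similarity_class1)  # sqrt(83), the single pair in class 1
```

For class 0 this gives distance1 = 8 and distance2 = 20, so similarityClass0 = 14.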

For now, my code is, I think, functional but pretty ugly.
As input I have X and y. Once I finish computing for tab0, I do the same for tab1, tab2, etc.

My question is: how can I write this code for n classes? My goal is also to have a measure of intra similarity for each row.

from sklearn import datasets
import numpy as np


iris = datasets.load_iris()

iris.data.shape, iris.target.shape

X = iris.data
#0 = Setosa // 1 = Versicolor // 2 = Virginica
y = iris.target

#First, we retrieve the row indexes of each class
#For example, if class 0 occurs on rows 1, 2, 6, tab0 will store 1, 2, 6
tab0 = list() 
tab1 = list() 
tab2 = list() 
j = 0

for output in y:
    if output == 0 :
        tab0.append(j)
    if output == 1 :
        tab1.append(j)
    if output == 2 :
        tab2.append(j)
    j = j + 1

########################################################################
#Computation intra similarity#
import math

sim0_intra = list()
sim1_intra = list()
sim2_intra = list()

#classes stores the labels 0, 1, 2 (the 3 classes); count stores the number of elements in each class
classes, count = np.unique(y, return_counts=True)

temp = 0

for i in tab0:
    temp = 0
    for j in tab0:
        for k in range(len(X[0])):
            temp = temp + np.square(X[i][k] - X[j][k])

    sim0_intra.append(np.sqrt(temp / ( count[0] - 1)) )


Solution 1:[1]

The answer from Guiem is indeed straightforward. I would just add these lines after computing the distance matrix D:

    D_triu = np.triu(D)
    intra_class_dist = np.mean(D_triu[D_triu > 0])

np.triu() returns the upper triangle of D, with the elements below the k-th diagonal set to zero:

https://numpy.org/doc/stable/reference/generated/numpy.triu.html

Hence, those zeros have to be excluded from the mean, which is what the D_triu > 0 mask does.
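To extend this to n classes and get one value per row, something like the following sketch may work. It assumes D is the pairwise Euclidean distance matrix of one class (the referenced answer is not reproduced here, so this is an assumption) and builds it with plain NumPy broadcasting. The function name intra_class_distances is hypothetical. It picks the upper triangle by index with np.triu_indices(n, k=1) rather than filtering on > 0, so that pairs of identical rows, whose distance is genuinely 0, would still count toward the mean.

```python
import numpy as np


def intra_class_distances(X, y):
    """Return ({label: mean pairwise distance}, per-row mean distance to classmates)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    class_mean = {}
    per_row = np.zeros(len(X))
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # row indexes of this class
        pts = X[idx]
        n = len(pts)
        if n < 2:
            # A single-element class has no pairs; report 0 by convention.
            class_mean[label] = 0.0
            continue
        # D[i, j] = Euclidean distance between rows i and j of this class.
        D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        # Each unordered pair exactly once: strictly above the diagonal.
        upper = np.triu_indices(n, k=1)
        class_mean[label] = D[upper].mean()
        # Mean distance from each row to the other n - 1 rows of its class.
        per_row[idx] = D.sum(axis=1) / (n - 1)
    return class_mean, per_row


# Toy data from the question: 5 rows, labels 0/0/1/0/1.
X_toy = [[1, 2, 3, 4], [5, 6, 7, 8], [1, 3, 5, 6], [11, 12, 13, 14], [10, 2, 4, 6]]
y_toy = [0, 0, 1, 0, 1]
class_mean, per_row = intra_class_distances(X_toy, y_toy)
print(class_mean)
print(per_row)
```

With the Iris arrays from the question, the call would be intra_class_distances(X, y). Note that this averages over all pairs in a class, so on the toy data class 0 comes out as (8 + 20 + 12) / 3 rather than the 14 obtained when measuring only against the first row.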

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Marko Panic