How to select the most informative columns in a DataFrame?

I want to select the most representative columns in a DataFrame. I read somewhere that: "The eigenvectors represent the principal components that contain most of the information (variance) represented using features (independent variables)." So I did the following:

import numpy as np
from numpy.linalg import eigh

# Covariance matrix of the columns of d (rows are observations)
cov_matrix = np.cov(d, rowvar=False)

# Determine eigenvalues and eigenvectors
egnvalues, egnvectors = eigh(cov_matrix)

# Determine explained variance (share of the total per component, largest first)
total_egnvalues = sum(egnvalues)
var_exp = [(i/total_egnvalues) for i in sorted(egnvalues, reverse=True)]

print(total_egnvalues)
print(var_exp)

Output:

0.11556921475095604
[0.3839056474139497, 0.22463482813643307, 0.1159777232847966, ...]

Please correct me if I am wrong: according to these results, which columns are the top 2 or top 3 to consider?
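
For context, the values in var_exp describe principal components, which are linear combinations of all columns, so on their own they do not name a "top" column. Below is a minimal sketch of one possible way to relate them back to the original columns by looking at the eigenvector entries (loadings). The synthetic DataFrame, the column names f0..f3, the choice k = 2, and the variance-weighted scoring rule are all illustrative assumptions and not part of the original post; the score is a heuristic, not a standard definition of column importance.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame `d` from the question.
rng = np.random.default_rng(0)
d = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
d["f1"] = 0.9 * d["f0"] + rng.normal(scale=0.1, size=200)  # make f1 track f0

cov_matrix = np.cov(d.to_numpy(), rowvar=False)
egnvalues, egnvectors = np.linalg.eigh(cov_matrix)

# eigh returns eigenvalues in ascending order, so reorder both outputs
# together to keep each eigenvector paired with its eigenvalue.
order = np.argsort(egnvalues)[::-1]
egnvalues = egnvalues[order]
egnvectors = egnvectors[:, order]

# Fraction of total variance carried by each principal component.
var_exp = egnvalues / egnvalues.sum()

# Row i of egnvectors holds the weights of column i in every component.
k = 2  # assumed number of components to keep
# Heuristic score: squared loadings on the first k components,
# weighted by the variance each of those components explains.
scores = (egnvectors[:, :k] ** 2 * var_exp[:k]).sum(axis=1)
top_columns = pd.Series(scores, index=d.columns).sort_values(ascending=False)

print(var_exp)
print(top_columns)
```

Because this works on the covariance matrix, columns with larger numeric scales will dominate the result; standardizing the columns first is a common way to avoid that.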

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
