How to find the entropy of each column of a dataset in Python?

I have a dataset that I quantized to 10 levels in Python. It looks like this:

9 9 1 8 9 1
1 9 3 6 1 0
8 3 8 4 4 1
0 2 1 9 9 0

The last value in each row is the class label, so the features (9 9 1 8 9) in the first row belong to class 1. I want to find the entropy of each feature (column). I wrote the following code, but it has many errors:

import pandas as pd
import math

f = open ( 'data1.txt' , 'r')

# Finding the probability
df = pd.DataFrame(pd.read_csv(f, sep='\t', header=None, names=['val1', 
    'val2', 'val3', 'val4','val5', 'val6', 'val7', 'val8']))
df.loc[:,"val1":"val5"] = df.loc[:,"val1":"val5"].div(df.sum(axis=0), 
    axis=1)

# Calculating Entropy
def shannon(col):
    entropy = - sum([ p * math.log(p) / math.log(2.0) for p in col])
    return entropy

sh_df = df.loc[:,'val1':'val5'].apply(shannon,axis=0)

Can you correct my code, or do you know of a function for finding the entropy of each column of a dataset in Python?



Solution 1:[1]

You can compute a column's entropy in pandas with the following function:

import numpy as np
from math import e
import pandas as pd

# Usage: pandas_entropy(df['column1'])

def pandas_entropy(column, base=None):
    vc = pd.Series(column).value_counts(normalize=True, sort=False)
    base = e if base is None else base
    return -(vc * np.log(vc)/np.log(base)).sum()

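Call this function on each column to get that column's entropy. It counts how often each distinct value occurs in the column and computes the Shannon entropy of that empirical distribution. The default base is e (nats); pass base=2 to get the result in bits.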
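As a minimal sketch of running the function over every column at once, the example below rebuilds the four sample rows from the question as a DataFrame and passes the function to DataFrame.apply. The column names (including `label` for the class column) are assumptions made for illustration, and `np.e` stands in for `math.e` to keep the imports short.

```python
import numpy as np
import pandas as pd


def pandas_entropy(column, base=None):
    """Shannon entropy of the empirical value distribution of one column."""
    # Relative frequency of each distinct value in the column.
    vc = pd.Series(column).value_counts(normalize=True, sort=False)
    base = np.e if base is None else base
    return -(vc * np.log(vc) / np.log(base)).sum()


# The four sample rows from the question: five features plus a class label.
# Column names are hypothetical; the question does not name them this way.
df = pd.DataFrame(
    [[9, 9, 1, 8, 9, 1],
     [1, 9, 3, 6, 1, 0],
     [8, 3, 8, 4, 4, 1],
     [0, 2, 1, 9, 9, 0]],
    columns=["val1", "val2", "val3", "val4", "val5", "label"],
)

# Drop the class column, then compute the entropy of each feature in bits.
features = df.drop(columns="label")
entropies = features.apply(pandas_entropy, base=2)
print(entropies)
```

With only four rows, a column whose values are all distinct reaches the maximum of 2 bits, while a column with one repeated value gives 1.5 bits. On the full dataset the same call returns one entropy per feature column.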

This answer was inspired by this one

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 marianoju