How to write a method that checks independence and returns a dictionary of length 3

I am having some difficulty understanding the question, and I am not sure how to write a method that returns a dictionary of length 3.

This is the sample table:

X Y pr
0 1 0.30
0 2 0.25
1 1 0.15
1 2 0.30

Three elements are needed in total:

• The first element (the key named are_independent) is a boolean that states whether X and Y are independent (True) or not (False). Two random variables are independent if P(X = x, Y = y) = P(X = x) · P(Y = y) for each possible value x of X and each possible value y of Y. I already have a solution for this (attached below).

• The second element (the key named cov) is the covariance between X and Y, taken over the n possible pairs (xi, yi) of (X, Y), where i indexes the pairs.

• The third element (the key named corr) is the correlation coefficient between X and Y.
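
For reference, these are the textbook definitions for a discrete joint distribution. The exercise's own formulas are not reproduced in the post, so treating these as the intended ones is an assumption. Here p_i stands for the pr value of the i-th row of the table, which is notation introduced for this sketch.

```latex
\begin{aligned}
E[X] &= \sum_{i=1}^{n} p_i x_i, \qquad E[Y] = \sum_{i=1}^{n} p_i y_i \\
\operatorname{Var}(X) &= \sum_{i=1}^{n} p_i (x_i - E[X])^2, \qquad
\operatorname{Var}(Y) = \sum_{i=1}^{n} p_i (y_i - E[Y])^2 \\
\operatorname{cov}(X, Y) &= \sum_{i=1}^{n} p_i (x_i - E[X])(y_i - E[Y]) \\
\operatorname{corr}(X, Y) &= \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}
\end{aligned}
```

Applied to the sample table, this gives E[X] = 0.45 and E[Y] = 1.55, so cov(X, Y) = 0.0525, Var(X) = Var(Y) = 0.2475, and corr(X, Y) = 0.0525 / 0.2475 ≈ 0.212. The table is also not independent, since P(X = 0) · P(Y = 1) = 0.55 · 0.45 = 0.2475, while the table lists 0.30 for that pair.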

I have some idea about the first element, but I am really not sure about the other two.

import pandas as pd
import numpy as np

# you can use this table as an example
distr_table = pd.DataFrame({
    'X': [0, 0, 1, 1],
    'Y': [1, 2, 1, 2],
    'pr': [0.3, 0.25, 0.15, 0.3]
})


class CheckIndependence:

    def __init__(self):
        self.version = 1

    def check_independence(self, distr_table: pd.DataFrame):
        # write your solution here
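        # marginal distributions of X and Y
        pr_x = distr_table.groupby('X', as_index=False)['pr'].sum()
        pr_y = distr_table.groupby('Y', as_index=False)['pr'].sum()

        # product of the marginals for every (x, y) pair
        cmp = pd.merge(pr_x, pr_y, how='cross')
        cmp['indep_pr'] = cmp['pr_x'] * cmp['pr_y']
        # align with the joint table on (X, Y); pairs missing from the table have probability 0
        cmp = cmp[['X', 'Y', 'indep_pr']].merge(distr_table, on=['X', 'Y'], how='left').fillna({'pr': 0})
        are_independent = bool(np.allclose(cmp['indep_pr'], cmp['pr']))
        # cov and corr still need to be added here
        return {'are_independent': are_independent}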
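
One possible way to produce all three keys is sketched below. It is not taken from the exercise or from an accepted answer: the function name describe_joint_distribution is hypothetical, and it assumes the standard probability-weighted definitions of covariance and correlation, with (x, y) pairs that are absent from the table treated as having probability 0.

```python
import numpy as np
import pandas as pd


def describe_joint_distribution(distr_table: pd.DataFrame) -> dict:
    """Hypothetical helper: return are_independent, cov and corr for a joint table."""
    # Marginal distributions P(X = x) and P(Y = y).
    px = distr_table.groupby('X')['pr'].sum()
    py = distr_table.groupby('Y')['pr'].sum()

    # Joint probabilities as an X-by-Y grid; missing pairs count as probability 0.
    joint = distr_table.pivot_table(index='X', columns='Y', values='pr',
                                    aggfunc='sum', fill_value=0.0)
    joint = joint.reindex(index=px.index, columns=py.index, fill_value=0.0)

    # Independence: every joint cell equals the product of its marginals.
    are_independent = bool(np.allclose(joint.to_numpy(), np.outer(px, py)))

    # Probability-weighted moments computed row by row.
    x, y, p = distr_table['X'], distr_table['Y'], distr_table['pr']
    ex, ey = (x * p).sum(), (y * p).sum()
    cov = ((x - ex) * (y - ey) * p).sum()
    var_x = ((x - ex) ** 2 * p).sum()
    var_y = ((y - ey) ** 2 * p).sum()

    # Correlation is undefined when either variable is constant.
    denom = np.sqrt(var_x * var_y)
    corr = cov / denom if denom > 0 else float('nan')

    return {'are_independent': are_independent,
            'cov': float(cov),
            'corr': float(corr)}


distr_table = pd.DataFrame({
    'X': [0, 0, 1, 1],
    'Y': [1, 2, 1, 2],
    'pr': [0.3, 0.25, 0.15, 0.3]
})
result = describe_joint_distribution(distr_table)
print(result)
```

For the sample table this should report are_independent as False, cov of about 0.0525 and corr of about 0.212. The body could be moved into CheckIndependence.check_independence unchanged if the class interface is required.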


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
