'Matrix from dataframe in R/Python
I have a dataframe like this: If the person with that id has that fruit the value is 1, else it is 0. ID column is a primary key.
| ID | Apple | Orange | Pear | Grapes |
|---|---|---|---|---|
| E1 | 1 | 0 | 1 | 1 |
| E2 | 0 | 0 | 1 | 0 |
| E3 | 0 | 1 | 1 | 0 |
| E4 | 1 | 1 | 0 | 0 |
| E5 | 1 | 0 | 0 | 1 |
I want a output as a matrix like this. This is a count of people who has both i and j fruit. Where i and j both are same, eg: cell apple x apple, we are looking at the count of all the people who has at least an apple apple. Here, 3 people have apple, so the value is 3. Similarly, we have only one person who has both apple and orange(E4) so the count is 1 in both (orange x apple) and (apple x orange) cells. We have 2 people in the above dataframe who has both grapes and apple(E1 and E5), so value in that cell is 2.
Edits: I want output like this
| Apple | Orange | Pear | Grapes | |
|---|---|---|---|---|
| Apple | 3 | 1 | 1 | 2 |
| Orange | 1 | 2 | 1 | 0 |
| Pear | 1 | 1 | 3 | 1 |
| Grapes | 2 | 0 | 1 | 2 |
I am new to R and Python, and not sure how to achieve this. Any help in either R or Python would be really appreciated! Please feel free to ask questions if anything is not clear or anything is missing. Thanks!
Solution 1:[1]
Here is the example with Python3 and itertools.permutations:
import itertools
import pandas as pd
# Create original dataframe
columns = ['ID', 'Apple', 'Orange', 'Pear', 'Grapes']
rows = [
['E1', 1, 0, 1, 1],
['E2', 0, 0, 1, 0],
['E3', 0, 1, 1, 0],
['E4', 1, 1, 0, 0],
['E5', 1, 0, 0, 1],
]
df = pd.DataFrame(rows, columns=columns)
# Count values of resulting matrix
def get_ones_indexes(els):
for i, el in enumerate(els):
if el == 1:
yield i
res_n = len(df.columns) - 1
res = [[0] * res_n for _ in range(res_n)]
for _, row in df.drop('ID', axis=1).iterrows():
indexes = list(get_ones_indexes(row.to_list()))
if len(indexes) == 1:
idx = indexes[0]
res[idx][idx] += 1
else:
for i, j in itertools.permutations(indexes, 2):
res[i][j] += 1
# Convert resultinng matrix to dataframe
_, *fruit_cols = df.columns
res_df = pd.DataFrame(res, index=fruit_cols, columns=fruit_cols)
print(res_df)
# Apple Orange Pear Grapes
# Apple 0 1 1 2
# Orange 1 0 1 0
# Pear 1 1 1 1
# Grapes 2 0 1 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alex Kosh |
