How do I find common rows and columns in multiple (more than 2) dataframes using pandas?
I want to find the common rows and columns in all three dataframes: mrna, meth, and cna. I concatenated the three tables on the Hugo_Symbol column. I expect 343 common columns but am getting 347.
import pandas as pd
from functools import reduce
dfs = [mrna, meth, cna]
common = pd.concat(dfs, join='inner', ignore_index=True)
common = common.drop_duplicates()
common = common.loc[:, ~common.columns.duplicated()]
len(set(common.columns))
Current output:
347
Expected output:
343
Solution 1:[1]
I think for the common column names you need set.intersection. Note that the ordering may change, because sets have no defined order:
out = set.intersection(*map(set, [x.columns for x in dfs]))
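As a minimal sketch, the toy frames below are small stand-ins for mrna, meth, and cna (not the asker's data); they share only the columns A and B:
import pandas as pd

# toy stand-ins: only columns 'A' and 'B' appear in all three frames
mrna = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
meth = pd.DataFrame({'B': [4], 'A': [5], 'D': [6]})
cna  = pd.DataFrame({'A': [7], 'B': [8], 'E': [9]})

dfs = [mrna, meth, cna]
out = set.intersection(*map(set, [x.columns for x in dfs]))
print(out)       # {'A', 'B'} - a set, so the order is arbitrary
print(len(out))  # 2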
Or, if ordering is important, use reduce with numpy.intersect1d (the result comes back sorted):
out = reduce(np.intersect1d, [x.columns for x in dfs])
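Continuing with the same dfs list of toy frames, a sketch of this variant (note that numpy must be imported, and np.intersect1d returns a sorted array):
import numpy as np
from functools import reduce

out = reduce(np.intersect1d, [x.columns for x in dfs])
print(out)       # ['A' 'B'] - sorted array of the shared column names
print(len(out))  # 2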
In your solution, you would get the common columns (and the row labels) like this:
common = pd.concat(dfs, join='inner')
out = common.columns.tolist()
out1 = common.index.tolist()
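A sketch with the same toy frames as above; with the default axis=0, join='inner' keeps only the columns shared by all frames, while the index simply stacks the row labels of the three frames:
common = pd.concat(dfs, join='inner')  # rows stacked, only shared columns kept
out = common.columns.tolist()          # ['A', 'B'] - the common column names
out1 = common.index.tolist()           # row labels of all three frames, stacked (here [0, 0, 0])
print(out, out1)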
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
