'how to drop duplicated columns data based on column name in pandas
Assume I have a table like below
A B C B
0 0 1 2 3
1 4 5 6 7
I'd like to drop column B. I tried to use drop_duplicates, but it seems that it only works based on duplicated data not header.
Hope anyone know how to do this.
Solution 1:[1]
Use Index.duplicated with loc or iloc and boolean indexing:
print (~df.columns.duplicated())
[ True True True False]
df = df.loc[:, ~df.columns.duplicated()]
print (df)
A B C
0 0 1 2
1 4 5 6
df = df.iloc[:, ~df.columns.duplicated()]
print (df)
A B C
0 0 1 2
1 4 5 6
Timings:
np.random.seed(123)
cols = ['A','B','C','B']
#[1000 rows x 30 columns]
df = pd.DataFrame(np.random.randint(10, size=(1000,30)),columns = np.random.choice(cols, 30))
print (df)
In [115]: %timeit (df.groupby(level=0, axis=1).first())
1000 loops, best of 3: 1.48 ms per loop
In [116]: %timeit (df.groupby(level=0, axis=1).mean())
1000 loops, best of 3: 1.58 ms per loop
In [117]: %timeit (df.iloc[:, ~df.columns.duplicated()])
1000 loops, best of 3: 338 µs per loop
In [118]: %timeit (df.loc[:, ~df.columns.duplicated()])
1000 loops, best of 3: 346 µs per loop
Solution 2:[2]
You can groupby
We use the axis=1 and level=0 parameters to specify that we are grouping by columns. Then use the first method to grab the first column within each group defined by unique column names.
df.groupby(level=0, axis=1).first()
A B C
0 0 1 2
1 4 5 6
We could have also used last
df.groupby(level=0, axis=1).last()
A B C
0 0 3 2
1 4 7 6
Or mean
df.groupby(level=0, axis=1).mean()
A B C
0 0 2 2
1 4 6 6
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Anton vBR |
| Solution 2 |


