Remove duplicate data based on the same unix time
I have multiple rows with the same Unix time. I am trying to remove the duplicates so that only one row remains for each Unix time value. I tried removing duplicates, but it's not working.
time x y
0 1648598400000 233 6758
1 1648598400000 234 6758
2 1648598403000 553 8678
3 1648598404000 987 8778
4 1648598405000 732 4535
5 1648598406000 234 7656
6 1648598406000 234 8977
7 1648598406000 465 7656
8 1648598406000 465 8977
Solution 1:[1]
df[ ~df['time'].duplicated() ] (note the ~) works for me.
I use io only to simulate a file, so that everyone can copy and run the example.
data = ''' time x y
0 1648598400000 233 6758
1 1648598400000 234 6758
2 1648598403000 553 8678
3 1648598404000 987 8778
4 1648598405000 732 4535
5 1648598406000 234 7656
6 1648598406000 234 8977
7 1648598406000 465 7656
8 1648598406000 465 8977
'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(data), sep=r'\s+')  # raw string avoids an invalid-escape warning for \s
print('\n--- before ---\n')
print(df)
print('\n--- after ---\n')
print( df[ ~df['time'].duplicated() ] )
Result:
--- before ---
time x y
0 1648598400000 233 6758
1 1648598400000 234 6758
2 1648598403000 553 8678
3 1648598404000 987 8778
4 1648598405000 732 4535
5 1648598406000 234 7656
6 1648598406000 234 8977
7 1648598406000 465 7656
8 1648598406000 465 8977
--- after ---
time x y
0 1648598400000 233 6758
2 1648598403000 553 8678
3 1648598404000 987 8778
4 1648598405000 732 4535
5 1648598406000 234 7656
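To see why the ~ matters, it may help to look at the boolean mask on its own. The sketch below rebuilds the same data from a dictionary (an assumption made only to keep the snippet self-contained; the names mask and first_rows are illustrative) and shows that duplicated() flags every repeat of a time value after its first occurrence, so inverting the mask keeps the first row for each time.

```python
import pandas as pd

# Same values as the sample data above, built directly instead of parsed from text
df = pd.DataFrame({
    'time': [1648598400000, 1648598400000, 1648598403000, 1648598404000,
             1648598405000, 1648598406000, 1648598406000, 1648598406000,
             1648598406000],
    'x': [233, 234, 553, 987, 732, 234, 234, 465, 465],
    'y': [6758, 6758, 8678, 8778, 4535, 7656, 8977, 7656, 8977],
})

# True marks a row whose 'time' value already appeared in an earlier row
mask = df['time'].duplicated()
print(mask.tolist())

# ~ inverts the mask, so only the first row for each 'time' value is selected
first_rows = df[~mask]
print(first_rows.index.tolist())
```

Without the ~, the same indexing would select only the repeated rows, which is the opposite of what the question asks for.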
If I use duplicated(keep='last') instead, the last row for each time value is kept:
--- after ---
time x y
1 1648598400000 234 6758
2 1648598403000 553 8678
3 1648598404000 987 8778
4 1648598405000 732 4535
8 1648598406000 465 8977
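The question does not show the call that failed, so the following is only a guess at a common cause. By default, drop_duplicates() compares whole rows, and no two rows in this data are identical in every column, so nothing is removed. Passing subset='time' restricts the comparison to that column and should give the same rows as the duplicated() approach. The variable names below are illustrative.

```python
import pandas as pd

# Same values as the sample data above, built directly instead of parsed from text
df = pd.DataFrame({
    'time': [1648598400000, 1648598400000, 1648598403000, 1648598404000,
             1648598405000, 1648598406000, 1648598406000, 1648598406000,
             1648598406000],
    'x': [233, 234, 553, 987, 732, 234, 234, 465, 465],
    'y': [6758, 6758, 8678, 8778, 4535, 7656, 8977, 7656, 8977],
})

# Default: all columns are compared, and every row here differs in x or y
whole_rows = df.drop_duplicates()
print(len(whole_rows))

# subset='time' compares only the 'time' column
keep_first = df.drop_duplicates(subset='time', keep='first')
keep_last = df.drop_duplicates(subset='time', keep='last')

print(keep_first)
print(keep_last)
```

Note that drop_duplicates() returns a new DataFrame rather than changing df, so the result has to be assigned to a variable (or back to df) to have any effect.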
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | furas |
