Remove duplicate data based on the same Unix time

My data has multiple rows for the same timestamp. I am trying to remove the extra rows so the data is aligned with one row per Unix time given. I tried removing duplicates, but it's not working:

            time    x     y
0  1648598400000  233  6758
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
6  1648598406000  234  8977
7  1648598406000  465  7656
8  1648598406000  465  8977


Solution 1:[1]

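df[~df['time'].duplicated()] (note the ~) works for me. duplicated() marks every repeated time after its first occurrence as True, and ~ inverts that mask so only the first row for each time is kept.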
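To make the mask visible, here is a minimal sketch that rebuilds the question's sample data and prints the boolean values before they are used to index df (the names mask and result are only illustrative):

```python
import io
import pandas as pd

# Sample data from the question; the unnamed first column becomes the index.
data = '''            time    x     y
0  1648598400000  233  6758
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
6  1648598406000  234  8977
7  1648598406000  465  7656
8  1648598406000  465  8977
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# duplicated() is True for every repeat of a value after its first
# occurrence (keep='first' is the default).
mask = df['time'].duplicated()
print(mask.tolist())
# [False, True, False, False, False, False, True, True, True]

# ~ flips the mask, so only the first row of each timestamp survives.
result = df[~mask]
print(result.index.tolist())  # [0, 2, 3, 4, 5]
```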

I use io only to simulate a file, so everyone can copy and run the example.

data = '''            time    x     y
0  1648598400000  233  6758
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
6  1648598406000  234  8977
7  1648598406000  465  7656
8  1648598406000  465  8977
'''

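import pandas as pd
import io

# sep=r'\s+' splits on any run of whitespace; the raw string avoids an
# invalid-escape warning in recent Python versions. The unnamed first
# column becomes the index because the header has one fewer field.
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

print('\n--- before ---\n')
print(df)

# keep only the first row for each time value
print('\n--- after ---\n')
print(df[~df['time'].duplicated()])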

Result:

--- before ---

            time    x     y
0  1648598400000  233  6758
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
6  1648598406000  234  8977
7  1648598406000  465  7656
8  1648598406000  465  8977

--- after ---

            time    x     y
0  1648598400000  233  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
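The same rows can also be selected with DataFrame.drop_duplicates, as long as subset='time' is passed. A possible reason the original attempt did not work (an assumption, since the question does not show that code) is calling drop_duplicates() with no arguments: it then compares all columns, and no two rows here match on time, x and y together. It also returns a new DataFrame, so the result has to be assigned (or inplace=True used). A minimal sketch on the same sample data (deduped is an illustrative name):

```python
import io
import pandas as pd

data = '''            time    x     y
0  1648598400000  233  6758
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
6  1648598406000  234  8977
7  1648598406000  465  7656
8  1648598406000  465  8977
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Without subset, every column is compared; no row repeats time, x AND y,
# so nothing is removed.
print(len(df.drop_duplicates()))  # 9

# With subset='time', only the timestamp is compared.
deduped = df.drop_duplicates(subset='time')
print(deduped.index.tolist())  # [0, 2, 3, 4, 5]

# Same rows as the boolean-mask version from the answer.
print(deduped.equals(df[~df['time'].duplicated()]))  # True
```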

If I use duplicated(keep='last') instead, the last row for each time is kept:

--- after ---

            time    x     y
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
8  1648598406000  465  8977
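If every timestamp that occurs more than once should be discarded entirely, rather than keeping one of its rows, duplicated() also accepts keep=False, which marks all copies. A short sketch on the same sample data (unique_only is an illustrative name):

```python
import io
import pandas as pd

data = '''            time    x     y
0  1648598400000  233  6758
1  1648598400000  234  6758
2  1648598403000  553  8678
3  1648598404000  987  8778
4  1648598405000  732  4535
5  1648598406000  234  7656
6  1648598406000  234  8977
7  1648598406000  465  7656
8  1648598406000  465  8977
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# keep=False flags EVERY row whose time appears more than once,
# so inverting it keeps only timestamps that were unique to begin with.
unique_only = df[~df['time'].duplicated(keep=False)]
print(unique_only.index.tolist())  # [2, 3, 4]

# Optional: renumber the remaining rows 0..n-1.
print(unique_only.reset_index(drop=True))
```

reset_index(drop=True) only renumbers the rows; it works the same after the keep='first' or keep='last' filters.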

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: furas