Trying to get the minimum date and getting TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'
I'm reading from an Excel file:
GA = pd.read_excel("file.xlsx", sheet_name=0, engine="openpyxl")
The data types are:
- Email object
- Date datetime64[ns]
- Name object
I want to keep only the row with the earliest date for each email.
For example (the first two rows share the same email):
- [email protected] 1/1/2022 a
- [email protected] 2/1/2022 b
- [email protected] 3/1/2022 c
The result I want is:
- [email protected] 1/1/2022 a
- [email protected] 3/1/2022 c
I tried GA.groupby('email')['date'].min()
but I get TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'.
I tried changing the date type to object, adding reset_index(), using agg('min') instead of min(), and GA.sort_values('date').groupby('email').tail(1),
but I keep getting the same error. Please help.
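Separately from the TypeError: after an ascending sort_values('date'), groupby('email').tail(1) keeps the latest row of each email, while head(1) keeps the earliest one, which is what the question asks for. A minimal sketch with hypothetical sample data (the lowercase column names email, date and name and the example addresses are assumptions):

```python
import pandas as pd

# Hypothetical sample mirroring the question: rows a and b share an email
GA = pd.DataFrame({
    'email': ['x@example.com', 'x@example.com', 'y@example.com'],
    'date': pd.to_datetime(['2022-01-01', '2022-02-01', '2022-03-01']),
    'name': ['a', 'b', 'c'],
})

# After an ascending sort, the first row of each group has the earliest date
earliest = GA.sort_values('date').groupby('email').head(1)
# tail(1) would instead keep the latest row per email
latest = GA.sort_values('date').groupby('email').tail(1)
print(earliest)
```

This only changes which row is kept; it does not by itself fix the TypeError if the email column holds mixed types.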
Solution 1:[1]
I believe your code was only missing df['date'] = pd.to_datetime(df['date']). With that line added, it works:
import pandas as pd

data = {
    'email': ['[email protected]', '[email protected]', '[email protected]'],
    'date': ['01/01/2022', '02/01/2022', '03/01/2022'],
}
df = pd.DataFrame(data)
# Parse the date strings into datetime64[ns] so min() compares real dates
df['date'] = pd.to_datetime(df['date'])
print(df.groupby('email')['date'].min())
Output is:
email
[email protected] 2022-01-01
[email protected] 2022-03-01
Name: date, dtype: datetime64[ns]
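Note that groupby('email')['date'].min() returns only the earliest date per email, not the full row the question asks for. One way to keep whole rows is to use idxmin, which gives the index label of each group's earliest date, and then select those rows with .loc. A minimal sketch, assuming hypothetical sample data, the column names from the question's code, and month-first date strings (as in the output above):

```python
import pandas as pd

# Hypothetical data: the first two rows share an email address
df = pd.DataFrame({
    'email': ['x@example.com', 'x@example.com', 'y@example.com'],
    'date': ['01/01/2022', '02/01/2022', '03/01/2022'],
    'name': ['a', 'b', 'c'],
})
# Assumed month-first format, matching the interpretation above
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Index label of the earliest date within each email group
first_idx = df.groupby('email')['date'].idxmin()
# Select those full rows (email, date and name together)
first_rows = df.loc[first_idx]
print(first_rows)
```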
Solution 2:[2]
The problem was that the email column contained integers, not the date column. Thank you for your time.
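Following up on that diagnosis, a minimal sketch of how one might check which Python types are actually stored in an object column, and cast the grouping keys to strings so they can be sorted consistently. The sample values, including the stray integer 12345, are hypothetical:

```python
import pandas as pd

# Hypothetical sample: one "email" cell was read from Excel as an integer
GA = pd.DataFrame({
    'email': ['x@example.com', 'x@example.com', 12345],
    'date': pd.to_datetime(['2022-01-01', '2022-02-01', '2022-03-01']),
    'name': ['a', 'b', 'c'],
})

# Count the Python type names present in the object column
type_counts = GA['email'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Cast every key to str so all group keys are comparable
GA['email'] = GA['email'].astype(str)
first_dates = GA.groupby('email')['date'].min()
print(first_dates)
```

If the stray values come from the spreadsheet itself, pd.read_excel also accepts a dtype argument (for example a mapping from the email column's header to str), so the column can be read as strings from the start.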
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pedro de Sá |
| Solution 2 | Shany H. |
