Trying to get the minimum date and getting TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'

I'm reading from an Excel file:

GA = pd.read_excel("file.xlsx", sheet_name=0, engine="openpyxl")

The column dtypes are:

  • Email object
  • Date datetime64[ns]
  • Name object

I want to get only the row with the earliest date for each email.

For example:

I'm trying to get only

I tried GA.groupby('email')['date'].min()

But I'm getting the TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'

I tried changing the date column to object dtype, adding reset_index(), using agg('min') instead of min(), and GA.sort_values('date').groupby('email').tail(1), but I keep getting this error. Please help.



Solution 1:[1]

I believe your solution was only missing df['date'] = pd.to_datetime(df['date']) for it to work, so:

import pandas as pd

data = {'email':  ['[email protected]', '[email protected]', '[email protected]'],
        'date': ['01/01/2022', '02/01/2022', '03/01/2022'],
        }
df = pd.DataFrame(data)
# Parse the date strings into datetime64[ns] (month-first by default)
df['date'] = pd.to_datetime(df['date'])
# Earliest date per email
df.groupby('email')['date'].min()

Output is:

email
[email protected]   2022-01-01
[email protected]   2022-03-01
Name: date, dtype: datetime64[ns]
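
Since the question asks for the whole row and not just the earliest date, one option is idxmin(). This is a minimal sketch, not necessarily how the asker solved it; the example.com addresses and the name column are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical sample data (addresses and names are made up for illustration)
df = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", "b@example.com"],
    "date": pd.to_datetime(["2022-02-01", "2022-01-01", "2022-03-01"]),
    "name": ["Ann", "Ann", "Bob"],
})

# idxmin() returns, per email, the index label of the row with the earliest date
first_idx = df.groupby("email")["date"].idxmin()
first_rows = df.loc[first_idx]

# Alternative: sort ascending by date and keep the first row of each group
first_rows_alt = df.sort_values("date").groupby("email").head(1)

print(first_rows)
```

Note that after an ascending sort, tail(1) would return the latest row per email, so head(1) is the one that matches "first date".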

Solution 2:[2]

The problem was that the email column contained integers; the date column was fine. Thank you for your time.
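
A rough way to spot this kind of problem is to look at the Python types actually stored in the grouping column. The sketch below uses made-up data (the example.com address and the stray values are assumptions). Whether a given mix of types raises an error can depend on the pandas version, so it only inspects and cleans the column and does not try to reproduce the error.

```python
import datetime as dt
import pandas as pd

# Hypothetical data: the key column mixes strings with an int and a datetime,
# which can happen when Excel cells are typed inconsistently
df = pd.DataFrame({
    "email": ["a@example.com", 12345, "a@example.com", dt.datetime(2022, 1, 5)],
    "date": pd.to_datetime(["2022-01-03", "2022-01-02", "2022-01-01", "2022-01-04"]),
})

# Count which Python types occur in the key column
type_counts = df["email"].map(type).value_counts()
print(type_counts)

# Rows whose key is not a string are the suspicious ones
bad_rows = df[~df["email"].map(lambda v: isinstance(v, str))]
print(bad_rows)

# One possible fix: cast every key to str before grouping
# (dropping the bad rows instead is another option)
clean = df.assign(email=df["email"].astype(str))
first_dates = clean.groupby("email")["date"].min()
print(first_dates)
```

Casting to str keeps every row but turns the stray values into their own groups, so dropping or correcting them in the spreadsheet may be the better fix.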

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pedro de Sá
Solution 2 Shany H.