'pd.read_excel parses dates automatically and parses it wrong

In pd.read_excel pandas automatically parses the columns names as date. And parses it wrong. The date is dd/mm/yy and it parses it as mm/dd/yy.

The column names are date.

code used

df = pd.read_excel('check.xlsx')
print(df)

The df printed has dates parsed in wrong format

Here's the excel file https://docs.google.com/spreadsheets/d/1rgl0Je5EyxpBunk7FWPHcpZxXFdUZUni/edit?usp=drivesdk&ouid=109057655084381529864&rtpof=true&sd=true . The column names are in dd/mm/Y format.



Solution 1:[1]

Use '%Y-%m-%d' for formatting like you wish.

e.g.

import pandas as pd

df = pd.DataFrame({"Date": ["26-12-2007", "27-12-2007", "28-12-2007"]})
df["Date"] = pd.to_datetime(df["Date"]).dt.strftime('%Y-%m-%d')
print(df)

Output:

     Date
0  2007-12-26
1  2007-12-27
2  2007-12-28

You can also set the column labels to equal the values in the first row with e.g.

df.columns = df.iloc[0]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1