How to manually enter a Python dataframe with daily dates in the correct format
I would like to manually create a dataframe in Python with daily dates in the 'date' column, as in the code below. However, the code does not produce the dates in the correct format: each date comes out as a single integer (the desired representation is shown below). Could you please advise how I can correct the code so that the 'date' column is entered in the desired format? Thanks in advance!
------------------------------------------------------
desired format for date column
2021-03-22 3
2021-04-07 3
2021-04-18 3
2021-05-12 0
------------------------------------------------------
df1 = pd.DataFrame({"date": [2021-3-22, 2021-4-7, 2021-4-18, 2021-5-12],
"x": [3, 3, 3, 0 ]})
df1
date x
0 1996 3
1 2010 3
2 1999 3
3 2004 0
Solution 1:[1]
Python interprets the unquoted expression 2021-3-22 as arithmetic: 2021 minus 3 minus 22, which evaluates to 1996.
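A quick way to confirm this is to evaluate the expressions directly. The sketch below is only an illustration (the variable name is made up for it); it reproduces the four integers shown in the question's output.

```python
# Unquoted "dates" are evaluated as plain integer subtraction.
values = [2021-3-22, 2021-4-7, 2021-4-18, 2021-5-12]
print(values)  # [1996, 2010, 1999, 2004]
```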
If you want each item stored as a string that resembles a date, write it as a string literal (str) by enclosing it in quotes, as shown below.
import pandas as pd
df1 = pd.DataFrame({"date": ['2021-3-22', '2021-4-7', '2021-4-18', '2021-5-12'],
"x": [3, 3, 3, 0 ]})
The results for the date column, shown here, indicate that it contains elements of the object dtype, which is how pandas stores str values here. Notice that the strings were stored exactly as written (2021-3-22 instead of 2021-03-22).
0 2021-3-22
1 2021-4-7
2 2021-4-18
3 2021-5-12
Name: date, dtype: object
If, however, you want them stored as datetime objects so that you can do datetime manipulations on them (e.g., determine the number of days between two dates, or filter by a specific month or year), then you need to convert the values to datetime objects.
This technique will do that:
df1['date'] = pd.to_datetime(df1['date'])
The results of this conversion are pandas datetime objects, which support nanosecond precision (as distinct from Python datetime objects, which are limited to microsecond precision).
0 2021-03-22
1 2021-04-07
2 2021-04-18
3 2021-05-12
Name: date, dtype: datetime64[ns]
Notice the displayed results are now formatted just as you would expect of datetimes (2021-03-22 instead of 2021-3-22).
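As a rough illustration of the datetime manipulations mentioned above, the sketch below computes the number of days between the first two dates and filters the rows that fall in April. The names days_between and april_rows are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"],
                    "x": [3, 3, 3, 0]})
df1["date"] = pd.to_datetime(df1["date"])

# Subtracting two timestamps gives a Timedelta; .days is its whole-day part.
days_between = (df1["date"].iloc[1] - df1["date"].iloc[0]).days
print(days_between)  # 16

# The .dt accessor exposes datetime components such as month and year.
april_rows = df1[df1["date"].dt.month == 4]
print(april_rows)
```

Neither operation is possible while the column still holds plain strings, which is the practical reason to convert.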
Solution 2:[2]
You can create the series as datetimes from the start by passing the date strings to pd.to_datetime when building the DataFrame (more info in the pandas.to_datetime documentation):
import pandas as pd
df1 = pd.DataFrame({"date": pd.to_datetime(["2021-3-22", "2021-4-7", "2021-4-18", "2021-5-12"]),
                    "x": [3, 3, 3, 0]})
Solution 3:[3]
FWIW, I often use pd.read_csv(io.StringIO(text)) to copy/paste tabular-looking data into a DataFrame (for example, from SO questions).
Example:
import io
import re
import pandas as pd
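def df_read(txt, **kwargs):
    # Strip leading/trailing whitespace from every line
    txt = '\n'.join([s.strip() for s in txt.splitlines()])
    # Turn each run of spaces into a tab, then parse as tab-separated data
    return pd.read_csv(io.StringIO(re.sub(r' +', '\t', txt)), sep='\t', **kwargs)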
txt = """
date value
2021-03-22 3
2021-04-07 3
2021-04-18 3
2021-05-12 0
"""
df = df_read(txt, parse_dates=['date'])
>>> df
date value
0 2021-03-22 3
1 2021-04-07 3
2 2021-04-18 3
3 2021-05-12 0
>>> df.dtypes
date datetime64[ns]
value int64
dtype: object
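For simple cases like this one, a shorter variant may be enough. The sketch below is not the helper above; it assumes that no field contains embedded whitespace, so pd.read_csv can split on any run of whitespace via sep=r"\s+" without the regex preprocessing.

```python
import io
import pandas as pd

txt = """
date value
2021-03-22 3
2021-04-07 3
2021-04-18 3
2021-05-12 0
"""

# Assumption: fields never contain spaces, so whitespace is a safe separator.
# Blank lines (such as the first line of txt) are skipped by default.
df = pd.read_csv(io.StringIO(txt), sep=r"\s+", parse_dates=["date"])
print(df)
print(df.dtypes)
```

The df_read helper keeps the advantage that extra keyword arguments are forwarded to pd.read_csv in one place.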
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | tylerjames |
| Solution 3 | Pierre D |
