Pandas: convert datetime64[ns] columns to datetime64[ns, UTC] for multiple columns at once

I have a dataframe called query_df, and some of its columns have the datetime64[ns] dtype.

I want to convert all the datetime64[ns] columns to datetime64[ns, UTC] at once.

This is what I've done so far to retrieve the datetime64[ns] columns:

dt_columns = [col for col in query_df.columns if query_df[col].dtype == 'datetime64[ns]']
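A possibly simpler way to collect these columns is DataFrame.select_dtypes(). The sketch below assumes a small hypothetical frame (the column names created and updated are made up). include="datetime64" should match tz-naive datetime columns of any resolution, while tz-aware columns are not selected by it.

```python
import pandas as pd

# Hypothetical sample frame: two tz-naive datetime columns and one int column.
query_df = pd.DataFrame({
    "created": pd.Series(["2021-05-27 06:54:07", "2021-05-28 07:00:00"], dtype="datetime64[ns]"),
    "updated": pd.Series(["2021-05-27 07:00:00", "2021-05-29 08:30:00"], dtype="datetime64[ns]"),
    "a": [1, 2],
})

# "datetime64" selects tz-naive datetime columns; the int column is skipped.
dt_columns = query_df.select_dtypes(include="datetime64").columns.tolist()
print(dt_columns)  # ['created', 'updated']
```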

To convert a single column, I can use pd.to_datetime(query_df["column_name"], utc=True).
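A quick check of what utc=True does on a single tz-naive column, using hypothetical sample timestamps: the wall-clock values are kept and labelled as UTC rather than shifted.

```python
import pandas as pd

# Hypothetical tz-naive datetime column.
s = pd.Series(["2021-05-27 06:54:07", "2021-05-28 07:00:00"], dtype="datetime64[ns]")

# utc=True localizes tz-naive values as UTC; the clock times stay the same.
s_utc = pd.to_datetime(s, utc=True)
print(s.dtype, "->", s_utc.dtype)
print(s_utc.iloc[0])
```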

How can I convert all the columns in dt_columns at once?

Attempt:

query_df[dt_columns] = pd.to_datetime(query_df[dt_columns], utc=True)

Error:

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing



Solution 1:[1]

Use DataFrame.apply() to call pd.to_datetime on each column; apply() passes the utc=True keyword on to pd.to_datetime, so no lambda is needed:

query_df[dt_columns] = query_df[dt_columns].apply(pd.to_datetime, utc=True)
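A self-contained version of this answer, with a hypothetical frame (columns created, updated and a are made up) standing in for query_df:

```python
import pandas as pd

# Hypothetical stand-in for query_df: two tz-naive datetime columns, one int column.
query_df = pd.DataFrame({
    "created": pd.Series(["2021-05-27 06:54:07", "2021-05-28 07:00:00"], dtype="datetime64[ns]"),
    "updated": pd.Series(["2021-05-27 07:00:00", "2021-05-29 08:30:00"], dtype="datetime64[ns]"),
    "a": [1, 2],
})
dt_columns = [col for col in query_df.columns if query_df[col].dtype == "datetime64[ns]"]

# apply() calls pd.to_datetime once per column and forwards utc=True to it.
query_df[dt_columns] = query_df[dt_columns].apply(pd.to_datetime, utc=True)
print(query_df.dtypes)
```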

Solution 2:[2]

You have already done the first part of the process, i.e. collecting the names of the columns whose dtype should be converted, using:

dt_columns = [col for col in query_df.columns if query_df[col].dtype == 'datetime64[ns]']

Now all you have to do is convert these columns at once with pandas' apply(), passing utc=True through to pd.to_datetime:

query_df[dt_columns] = query_df[dt_columns].apply(pd.to_datetime, utc=True)

This calls pd.to_datetime on every column in dt_columns and localizes the tz-naive values as UTC, so the columns become datetime64[ns, UTC].
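To see why utc=True matters here, the sketch below (hypothetical data) compares apply(pd.to_datetime) with and without it. Without the keyword, columns that are already datetime64 come back unchanged, i.e. still tz-naive.

```python
import pandas as pd

# Hypothetical frame with two tz-naive datetime columns.
query_df = pd.DataFrame({
    "created": pd.Series(["2021-05-27 06:54:07", "2021-05-28 07:00:00"], dtype="datetime64[ns]"),
    "updated": pd.Series(["2021-05-27 07:00:00", "2021-05-29 08:30:00"], dtype="datetime64[ns]"),
})
dt_columns = ["created", "updated"]

# Without utc=True the columns stay tz-naive.
naive = query_df[dt_columns].apply(pd.to_datetime)
# With utc=True each column is localized to UTC.
aware = query_df[dt_columns].apply(pd.to_datetime, utc=True)

print(naive.dtypes.tolist())
print(aware.dtypes.tolist())
```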

EDIT:

Without using apply():

Step 1: Create an empty dictionary that will map each column to be changed to its target dtype:

convert_dict = {}

Step 2: Iterate over the column names you extracted and store each one as a key, with the target dtype as its value:

for col in dt_columns:
    convert_dict[col] = "datetime64[ns, UTC]"

Step 3: Convert the dtypes by passing the dictionary to astype():

query_df = query_df.astype(convert_dict)

astype() converts each column named by a key to the dtype given as its value. Note that newer pandas versions (2.0 and later) refuse an astype() from a tz-naive to a tz-aware dtype and raise an error; there, localize the columns with Series.dt.tz_localize("UTC") instead.
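A dictionary-based variant that avoids astype() altogether maps each column name to its localized version with Series.dt.tz_localize("UTC") and passes the dictionary to DataFrame.assign(). This swaps astype() for tz_localize(); the sample data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame: two tz-naive datetime columns and one int column.
query_df = pd.DataFrame({
    "created": pd.Series(["2021-05-27 06:54:07", "2021-05-28 07:00:00"], dtype="datetime64[ns]"),
    "updated": pd.Series(["2021-05-27 07:00:00", "2021-05-29 08:30:00"], dtype="datetime64[ns]"),
    "a": [1, 2],
})
dt_columns = ["created", "updated"]

# Steps 1-2: map each column name to its UTC-localized Series.
convert_dict = {col: query_df[col].dt.tz_localize("UTC") for col in dt_columns}

# Step 3: assign() replaces the matching columns (keeping their position)
# and returns a new DataFrame.
query_df = query_df.assign(**convert_dict)
print(query_df.dtypes)
```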

Solution 3:[3]

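In your attempt, query_df[dt_columns] = pd.to_datetime(query_df[dt_columns], utc=True), the argument query_df[dt_columns] is a DataFrame. When pd.to_datetime() receives a DataFrame, it tries to assemble one datetime per row from columns named year, month and day (plus optional smaller units); none of your columns has those names, hence the ValueError. Below is the relevant example from the help of to_datetime():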

Assembling a datetime from multiple columns of a DataFrame. The keys can be
common abbreviations like ['year', 'month', 'day', 'minute', 'second',
'ms', 'us', 'ns']) or plurals of the same

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
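A minimal reproduction of the error, assuming a hypothetical frame whose datetime columns are not named year/month/day: pd.to_datetime() takes the assembly path for any DataFrame, so it raises a ValueError.

```python
import pandas as pd

# Hypothetical frame whose columns are not named year/month/day.
df = pd.DataFrame({
    "created": pd.Series(["2021-05-27 06:54:07"], dtype="datetime64[ns]"),
    "updated": pd.Series(["2021-05-29 08:30:00"], dtype="datetime64[ns]"),
})

error_message = None
try:
    # A DataFrame argument makes to_datetime try to assemble dates from its columns.
    pd.to_datetime(df, utc=True)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```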

Below is a code snippet with a small example. Bear in mind that, depending on your data format or application, the result might not be the right date: astype() treats numeric values as nanoseconds since the epoch, whereas the sample values below look like epoch seconds.

import pandas as pd

query_df = pd.DataFrame({"ts1": [1622098447.2419431, 1622098447], "ts2": [1622098427.370945, 1622098427], "a": [1, 2], "b": [0.0, 0.1]})
query_df.info()

# convert to datetime in nanoseconds
query_df[["ts1", "ts2"]] = query_df[["ts1", "ts2"]].astype("datetime64[ns]")
query_df.info()

# convert to datetime with UTC
# (pandas 2.0+ rejects this tz-naive -> tz-aware astype; use .dt.tz_localize("UTC") per column there)
query_df[["ts1", "ts2"]] = query_df[["ts1", "ts2"]].astype("datetime64[ns, UTC]")
query_df.info()

which outputs:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 4 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   ts1     2 non-null      float64
     1   ts2     2 non-null      float64
     2   a       2 non-null      int64  
     3   b       2 non-null      float64
    dtypes: float64(3), int64(1)
    memory usage: 192.0 bytes
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 4 columns):
     #   Column  Non-Null Count  Dtype         
    ---  ------  --------------  -----         
     0   ts1     2 non-null      datetime64[ns]
     1   ts2     2 non-null      datetime64[ns]
     2   a       2 non-null      int64         
     3   b       2 non-null      float64       
    dtypes: datetime64[ns](2), float64(1), int64(1)
    memory usage: 192.0 bytes
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 4 columns):
     #   Column  Non-Null Count  Dtype              
    ---  ------  --------------  -----              
     0   ts1     2 non-null      datetime64[ns, UTC]
     1   ts2     2 non-null      datetime64[ns, UTC]
     2   a       2 non-null      int64              
     3   b       2 non-null      float64            
    dtypes: datetime64[ns, UTC](2), float64(1), int64(1)
    memory usage: 192.0 bytes
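If the floats are Unix epoch seconds (which the sample values appear to be, but that is an assumption about your data), a sketch that avoids the nanosecond interpretation is to pass unit="s" together with utc=True; apply() forwards both keywords to pd.to_datetime.

```python
import pandas as pd

query_df = pd.DataFrame({
    "ts1": [1622098447.2419431, 1622098447],
    "ts2": [1622098427.370945, 1622098427],
    "a": [1, 2],
    "b": [0.0, 0.1],
})

# unit="s": read the floats as seconds since the Unix epoch (assumed format).
# utc=True: return tz-aware UTC datetimes.
query_df[["ts1", "ts2"]] = query_df[["ts1", "ts2"]].apply(pd.to_datetime, unit="s", utc=True)
print(query_df[["ts1", "ts2"]])
```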

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 eshirvana
Solution 2
Solution 3