How to convert a duration formatted as mm.ss to seconds in pandas/Python?

I have a CallDuration column formatted as mm.ss (minutes.seconds), and I would like to convert it to total seconds.

It looks like this:

CallDuration
25         29.02
183         5.40
213         3.02
290        10.27
304         2.00
           ...  
4649990    13.02
4650067     5.33
4650192    19.47
4650197     3.44
4650204    14.15

In Excel, I would split the column at the ".", multiply the minutes column by 60, and then add it to the seconds column to get the total seconds. I feel like this should be much easier with pandas/Python, but I cannot figure it out.
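The Excel steps translate fairly directly into vectorized pandas arithmetic. A minimal sketch, assuming the column holds numbers (or numeric strings) where the two digits after the "." are seconds and always below 60; the helper name `mmss_to_seconds` is hypothetical:

```python
import numpy as np
import pandas as pd


def mmss_to_seconds(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert mm.ss values (29.02 = 29 min 2 s) to total seconds."""
    # Coerce strings such as "5.4" or "2" to floats; unparseable entries become NaN
    values = pd.to_numeric(col, errors="coerce")
    minutes = np.floor(values)
    # The digits after the point are seconds, so scale the fraction by 100.
    # Rounding absorbs binary floating-point error (29.02 - 29 is 0.01999...).
    seconds = ((values - minutes) * 100).round()
    return minutes * 60 + seconds


df = pd.DataFrame({"CallDuration": [29.02, 5.40, 3.02, 10.27, 2.00]})
df["CallSeconds"] = mmss_to_seconds(df["CallDuration"])
print(df)
```

Because the seconds are recovered numerically rather than by splitting text, a value stored as "5.4" or "2" still comes out as 5 min 40 s or 2 min 0 s, and missing values stay NaN.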

I tried using pd.to_timedelta, but that did not give me what I need, and I can't figure out how to tell it how the time is formatted. When I pass 'm' as the unit, the digits after the "." are not read as seconds:

pd.to_timedelta(post_group['CallDuration'],'m')
25        0 days 00:29:01.200000
183              0 days 00:05:24
213       0 days 00:03:01.200000
290       0 days 00:10:16.200000
304              0 days 00:02:00
                   ...          
4649990   0 days 00:13:01.200000
4650067   0 days 00:05:19.800000
4650192   0 days 00:19:28.200000
4650197   0 days 00:03:26.400000
4650204          0 days 00:14:09
Name: CallDuration, Length: 52394, dtype: timedelta64[ns]
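The output above follows from how the unit 'm' works: pandas treats 29.02 as 29.02 decimal minutes, and 0.02 min × 60 = 1.2 s, hence 00:29:01.2. A sketch that still goes through pd.to_timedelta, assuming the same mm.ss layout, passes whole minutes and seconds as separate parts (the sample values are taken from the question):

```python
import numpy as np
import pandas as pd

call_duration = pd.Series([29.02, 5.40, 2.00], name="CallDuration")

# Passing the raw value with unit="m" reads the fraction as decimal minutes
print(pd.to_timedelta(29.02, unit="m"))  # 29 min 1.2 s, not 29 min 2 s

# Instead, split into whole minutes and the two "seconds" digits
minutes = np.floor(call_duration)
seconds = ((call_duration - minutes) * 100).round()
durations = pd.to_timedelta(minutes, unit="m") + pd.to_timedelta(seconds, unit="s")

print(durations)
print(durations.dt.total_seconds())
```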

I tried doing it this way instead, but now I can't get the 'sec' column to convert to an integer because there are blanks, and it won't fill them:


post_duration = post_group['CallDuration'].str.split(".",expand=True)
post_duration.columns = ["min","sec"]
post_duration['min'] = post_duration['min'].astype(int)
post_duration['min'] = 60*post_duration['min']
post_duration.loc['Total', 'min'] = post_duration['min'].sum()
post_duration

    min    sec
25  1740.0  02
183 300.0   4
213 180.0   02
290 600.0   27
304 120.0   None
... ... ...
4650067 300.0   33
4650192 1140.0  47
4650197 180.0   44
4650204 840.0   15
Total   24902700.0  NaN


post_duration2 = post_group['CallDuration'].str.split(".",expand=True)
post_duration2.columns = ["min","sec"]
post_duration2['sec'].astype(float).astype('Int64')
post_duration2.fillna(0)
post_duration2.loc['Total', 'sec'] = post_duration2['sec'].sum()
post_duration2

TypeError: object cannot be converted to an IntegerDtype
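The split approach can work if the seconds part is normalized first. The "4" and None in the output above look like they come from strings such as "5.4" and "2" that lost their trailing zeros, so they need right-padding, not just filling with 0. A sketch under that assumption (the helper name is hypothetical); it uses str.partition, which always returns three columns even when a value contains no ".":

```python
import pandas as pd


def mmss_strings_to_seconds(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert mm.ss strings such as "29.02", "5.4" or "2"."""
    # partition always yields three columns (before, separator, after),
    # so "2" becomes ("2", "", "") instead of producing a None column
    parts = col.astype(str).str.partition(".")
    minutes = pd.to_numeric(parts[0], errors="coerce")
    # Right-pad the seconds: "4" -> "40", "" -> "00", "02" stays "02"
    seconds = pd.to_numeric(parts[2].str.ljust(2, "0"), errors="coerce")
    return minutes * 60 + seconds


sample = pd.Series(["29.02", "5.4", "3.02", "10.27", "2"])
print(mmss_strings_to_seconds(sample).tolist())
```

Separately, astype and fillna return new objects, so in the second attempt their results have to be assigned back (for example post_duration2 = post_duration2.fillna(0)) to have any effect.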


Solution 1:[1]

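Perhaps there's a more efficient way, but I would still convert to a timedelta and then call total_seconds() to get the column in seconds. Note that with the unit 'm', pandas reads the part after the "." as a decimal fraction of a minute, so 29.02 becomes 1741.2 seconds rather than the 1742 seconds that 29 min 2 s would give; for true mm.ss values, split minutes and seconds first.

import pandas as pd
pd.to_timedelta(post_group['CallDuration'], 'm').dt.total_seconds()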

You can find more information on the attributes and methods available on timedeltas in the pandas Timedelta documentation.

Solution 2:[2]

import pandas as pd
import numpy as np
import datetime


def convert_to_seconds(col_data):
    col_data = pd.to_datetime(col_data, format="%M:%S")
    
    # The line above attaches the default date 1900-01-01 to each time, so subtract it to keep only the duration
    col_data = col_data - datetime.datetime(1900,1,1)
    
    return col_data.dt.total_seconds()


df = pd.DataFrame({'CallDuration':['2:02', 
                                   '5:50',
                                   np.nan,
                                   '3:02']})

df['CallDuration'] = convert_to_seconds(df['CallDuration'])

Here's the result:

   CallDuration
0         122.0
1         350.0
2           NaN
3         182.0
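Solution 2 expects strings with a ":" separator, while the question's data uses "." and may have lost trailing zeros (5.4 meaning 5:40). One way to bridge that, assuming the column is numeric (or numeric strings) with no missing values and seconds below 60, is to format each value with two decimals and swap the separator before calling the same convert_to_seconds function, reproduced here so the block runs on its own:

```python
import datetime

import pandas as pd


def convert_to_seconds(col_data):
    # Same function as in Solution 2
    col_data = pd.to_datetime(col_data, format="%M:%S")
    col_data = col_data - datetime.datetime(1900, 1, 1)
    return col_data.dt.total_seconds()


call_duration = pd.Series([29.02, 5.4, 2.0])
# 29.02 -> "29.02" -> "29:02"; 5.4 -> "5.40" -> "5:40"; 2.0 -> "2.00" -> "2:00"
as_mmss = (
    pd.to_numeric(call_duration)
    .map("{:.2f}".format)
    .str.replace(".", ":", regex=False)
)
print(convert_to_seconds(as_mmss).tolist())
```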

With the format changed to "%H:%M", the same code converts HH:MM strings to total seconds (as floats), but only if the number of hours is less than 24.
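For the HH:MM case, a sketch assuming the format string is made a parameter (the fmt argument is a hypothetical addition to Solution 2's function; the sample times are made up):

```python
import datetime

import pandas as pd


def convert_to_seconds(col_data, fmt="%M:%S"):
    # Parse with the given format; pandas fills in the date 1900-01-01
    col_data = pd.to_datetime(col_data, format=fmt)
    # Subtract that date to keep only the time of day as a timedelta
    col_data = col_data - datetime.datetime(1900, 1, 1)
    return col_data.dt.total_seconds()


times = pd.Series(["01:30", "13:45", "23:59"])
print(convert_to_seconds(times, fmt="%H:%M").tolist())
```

The %H directive only accepts hours 00 to 23, so a value such as "25:00" fails to parse, which is where the under-24-hours limit comes from.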

To convert several columns of your DataFrame at once, replace

df['CallDuration'] = convert_to_seconds(df['CallDuration'])

with

new_df = df.apply(lambda col: convert_to_seconds(col) if col.name in colnames_list else col)

where colnames_list is your list of the names of the columns to convert.
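A self-contained sketch of the multi-column version; the HoldDuration and Agent columns and the contents of colnames_list are hypothetical, chosen only to show that unlisted columns pass through unchanged:

```python
import datetime

import numpy as np
import pandas as pd


def convert_to_seconds(col_data):
    # Same function as in Solution 2
    col_data = pd.to_datetime(col_data, format="%M:%S")
    col_data = col_data - datetime.datetime(1900, 1, 1)
    return col_data.dt.total_seconds()


df = pd.DataFrame({
    "CallDuration": ["2:02", "5:50", np.nan],
    "HoldDuration": ["0:30", "1:05", "0:00"],  # hypothetical second duration column
    "Agent": ["A", "B", "C"],                   # hypothetical non-duration column
})
colnames_list = ["CallDuration", "HoldDuration"]

# Convert only the listed columns; return the others as they are
new_df = df.apply(lambda col: convert_to_seconds(col) if col.name in colnames_list else col)
print(new_df)
```

Writing df[colnames_list] = df[colnames_list].apply(convert_to_seconds) is an alternative that updates df in place instead of building new_df.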

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Stefan
Solution 2: sansh