How to convert duration formatted as mm.ss to ss in pandas/python?
I have a Call Duration column formatted as mm.ss and I would like to convert it to total seconds.
It looks like this:
CallDuration
25 29.02
183 5.40
213 3.02
290 10.27
304 2.00
...
4649990 13.02
4650067 5.33
4650192 19.47
4650197 3.44
4650204 14.15
In Excel I would split the column at the ".", multiply the minutes column by 60, and then add it to the seconds column to get my total seconds. I feel like this should be much easier with pandas/python, but I cannot figure it out.
I tried using pd.to_timedelta, but that did not give me what I need, and I can't figure out how to tell it how the time is formatted. When I pass 'm', the result is wrong because the digits after the "." are treated as a fraction of a minute rather than as seconds.
pd.to_timedelta(post_group['CallDuration'],'m')
25 0 days 00:29:01.200000
183 0 days 00:05:24
213 0 days 00:03:01.200000
290 0 days 00:10:16.200000
304 0 days 00:02:00
...
4649990 0 days 00:13:01.200000
4650067 0 days 00:05:19.800000
4650192 0 days 00:19:28.200000
4650197 0 days 00:03:26.400000
4650204 0 days 00:14:09
Name: CallDuration, Length: 52394, dtype: timedelta64[ns]
Tried doing it this way, but now can't get the 'sec' column to convert to an integer because there are blanks, and it won't fill the blanks...
post_duration = post_group['CallDuration'].str.split(".",expand=True)
post_duration.columns = ["min","sec"]
post_duration['min'] = post_duration['min'].astype(int)
post_duration['min'] = 60*post_duration['min']
post_duration.loc['Total', 'min'] = post_duration['min'].sum()
post_duration
min sec
25 1740.0 02
183 300.0 4
213 180.0 02
290 600.0 27
304 120.0 None
... ... ...
4650067 300.0 33
4650192 1140.0 47
4650197 180.0 44
4650204 840.0 15
Total 24902700.0 NaN
post_duration2 = post_group['CallDuration'].str.split(".",expand=True)
post_duration2.columns = ["min","sec"]
post_duration2['sec'].astype(float).astype('Int64')
post_duration2.fillna(0)
post_duration2.loc['Total', 'sec'] = post_duration2['sec'].sum()
post_duration2
TypeError: object cannot be converted to an IntegerDtype
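For reference, the split-and-multiply idea described in the question can be sketched as follows. This is a minimal sketch, not the asker's final code: it assumes the column holds strings such as "29.02", "5.4" and "2" (which is what the split output above suggests, where 5.40 yields "4" and 2.00 yields None), and the sample Series `raw` is made up for illustration. Note also that in the attempt above the results of astype and fillna are never assigned back, so they have no effect on the DataFrame.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the question's data,
# stored as strings the way the split output suggests.
raw = pd.Series(
    ["29.02", "5.4", "3.02", "10.27", "2"],
    index=[25, 183, 213, 290, 304],
    name="CallDuration",
)

parts = raw.str.split(".", expand=True)
parts.columns = ["min", "sec"]

# A missing seconds part means zero seconds; a single digit such as "4"
# came from 5.40, so pad on the right to get "40" rather than 4.
sec = parts["sec"].fillna("0").str.ljust(2, "0").astype(int)
minutes = parts["min"].astype(int)

total_seconds = minutes * 60 + sec
print(total_seconds)
```

Assigning each intermediate result to a name (or back to the column) is what makes the conversion stick; the right-padding step only applies if the strings really were produced from floats.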
Solution 1:[1]
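Perhaps there's a more efficient way, but I would still convert to a timedelta format and then use apply with the Timedelta.total_seconds() method to get the column in seconds. Note that passing 'm' treats each value as decimal minutes, so 29.02 becomes 1741.2 seconds rather than 29 minutes 2 seconds (1742 seconds).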
import pandas as pd
pd.to_timedelta(post_group['CallDuration'], 'm').apply(pd.Timedelta.total_seconds)
You can find more information on the attributes and methods you can call on timedeltas in the pandas documentation.
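A variation on the timedelta idea that reads the digits after the "." as seconds might look like the sketch below. It assumes the column is numeric (floats such as 29.02) and that the fractional part always encodes whole seconds with two digits; the sample Series `durations` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: floats in mm.ss notation.
durations = pd.Series(
    [29.02, 5.40, 3.02, 10.27, 2.00],
    index=[25, 183, 213, 290, 304],
    name="CallDuration",
)

# Whole minutes are the integer part; the two digits after the point are seconds.
minutes = np.floor(durations)
# round() absorbs floating-point noise such as 0.0199999 * 100.
seconds = ((durations - minutes) * 100).round()

# Build one timedelta per row from the two parts, then read it back in seconds.
as_timedelta = pd.to_timedelta(minutes, unit="m") + pd.to_timedelta(seconds, unit="s")
total_seconds = as_timedelta.dt.total_seconds()
print(total_seconds)
```

The `.dt.total_seconds()` accessor does the same job as applying `pd.Timedelta.total_seconds` element by element, but operates on the whole column at once.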
Solution 2:[2]
import pandas as pd
import numpy as np
import datetime
def convert_to_seconds(col_data):
    col_data = pd.to_datetime(col_data, format="%M:%S")
    # The line above attaches the default date 1900-01-01 to each time, so subtract it to leave only the duration
    col_data = col_data - datetime.datetime(1900,1,1)
    return col_data.dt.total_seconds()

df = pd.DataFrame({'CallDuration':['2:02',
                                   '5:50',
                                   np.nan,
                                   '3:02']})
df['CallDuration'] = convert_to_seconds(df['CallDuration'])
Here's the result:
CallDuration
0 122.0
1 350.0
2 NaN
3 182.0
You can also use the code above to convert HH:MM strings to total seconds as floats by changing the format to "%H:%M", but only if the number of hours is less than 24.
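As an illustration of the HH:MM case and its 24-hour limit, here is a small sketch. The function name `hhmm_to_seconds` and the sample values are hypothetical, and `errors="coerce"` is an addition not present in the answer's code, used here only to show what happens to an out-of-range hour.

```python
import pandas as pd

def hhmm_to_seconds(col_data):
    # errors="coerce" turns values that do not match the format
    # (for example an hour of 24 or more) into NaT instead of raising.
    parsed = pd.to_datetime(col_data, format="%H:%M", errors="coerce")
    # Parsing without a date attaches 1900-01-01, so subtract it again.
    return (parsed - pd.Timestamp("1900-01-01")).dt.total_seconds()

times = pd.Series(["01:30", "23:59", "25:00"])
seconds = hhmm_to_seconds(times)
print(seconds)
```

The first two values convert normally, while "25:00" cannot be parsed as a clock time and comes back as NaN, which is why this approach only suits durations under 24 hours.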
And if you want to convert multiple columns in your dataframe replace
df['CallDuration'] = convert_to_seconds(df['CallDuration'])
with
new_df = df.apply(lambda col: convert_to_seconds(col) if col.name in colnames_list else col)
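A self-contained sketch of the multi-column variant is shown below. The column names `HoldDuration` and `Agent`, the sample values, and the contents of `colnames_list` are made up for illustration.

```python
import pandas as pd

def convert_to_seconds(col_data):
    parsed = pd.to_datetime(col_data, format="%M:%S")
    # Remove the default 1900-01-01 date that parsing attaches.
    return (parsed - pd.Timestamp("1900-01-01")).dt.total_seconds()

# Hypothetical frame with two duration columns and one unrelated column.
df = pd.DataFrame({
    "CallDuration": ["2:02", "5:50"],
    "HoldDuration": ["0:30", "1:15"],
    "Agent": ["a", "b"],
})
colnames_list = ["CallDuration", "HoldDuration"]

# apply passes each column as a Series; only the listed ones are converted.
new_df = df.apply(
    lambda col: convert_to_seconds(col) if col.name in colnames_list else col
)
print(new_df)
```

Columns not named in `colnames_list` pass through unchanged, so the same call can be used on a frame that mixes duration columns with other data.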
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stefan |
| Solution 2 | sansh |
