How to turn time from string form into something pandas can recognize as time?

The Data

1 hour 29 mins

46 secs

1 min 47 secs

2 mins 19 secs

6 days 18 hours ...

How do I turn data like this, in string form, into something pandas can recognise? I've been thinking of something like regular expressions, but that seems a bit far-fetched. I would appreciate any help. Stay safe.
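Regular expressions are not far-fetched here, because every component is just a number followed by a unit word. As a minimal sketch (not from the original question or answer), `re.findall` can pick out each (number, unit) pair in any order and pass them to the standard library's `datetime.timedelta`. The names `TOKEN`, `KWARG` and `parse_duration` are hypothetical, and the sketch assumes only days, hours, minutes and seconds appear, since months and years have no fixed length. Unrecognized words are silently skipped.

```python
import re
from datetime import timedelta

# A component is a number, optional whitespace, then a unit word with an optional plural "s".
# The trailing \b keeps "min" from matching the start of "minute".
# Assumption: only these four fixed-length units occur (months/years are ambiguous).
TOKEN = re.compile(r"(\d+)\s*(day|hour|min|sec)s?\b")

# Map each unit word to the matching datetime.timedelta keyword argument.
KWARG = {"day": "days", "hour": "hours", "min": "minutes", "sec": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Parse e.g. '1 min 47 secs' into timedelta(minutes=1, seconds=47)."""
    parts = {KWARG[unit]: int(amount) for amount, unit in TOKEN.findall(text)}
    return timedelta(**parts)


print(parse_duration("1 hour 29 mins"))   # 1:29:00
print(parse_duration("6 days 18 hours"))  # 6 days, 18:00:00
```

A column of such objects can be handed to `pd.to_timedelta`, e.g. `pd.to_timedelta(df["raw_time"].apply(parse_duration))`. Note that a repeated unit (as in "1 min 2 mins") keeps only the last value.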



Solution 1:[1]

Regex is actually a good fit for this. Using your test data, slightly generalized to include months and years, like so...

import re

import pandas as pd

df = pd.DataFrame(
    columns=["raw_time"],
    data=[
        "1 hour 29 mins",
        "46 secs",
        "1 min 47 secs",
        "2 mins 19 secs",
        "6 days 18 hours",
        "2 years 2 months 3 hours",
        "1 year 1 month 1 day 1 hours 1 min 1 sec",
        "3 years 4 months 2 days 7 hours 38 mins 42 secs",
    ],
)

...we can use the snippet below to parse each string and convert it to seconds. From there, converting to whatever time type you need is straightforward.

# Approximation: a month is taken as 30 days and a year as 12 * 30 = 360 days
N_SECONDS = {
    "years": 12 * 30 * 24 * 3600,
    "months": 30 * 24 * 3600,
    "days": 24 * 3600,
    "hours": 3600,
    "minutes": 60,
    "seconds": 1,
}

pattern = (
    r"((?P<years>\d+)(\syear[s]?))? ?((?P<months>\d+)(\smonth[s]?))? "
    r"?((?P<days>\d+)(\sday[s]?))? ?((?P<hours>\d+)(\shour[s]?))? "
    r"?((?P<minutes>\d+)(\smin[s]?))? ?((?P<seconds>\d+)(\ssec[s]?))?"
)

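def parse_string_to_seconds(time_str: str) -> "int | None":
    # fullmatch rejects strings with leftover text (e.g. "1 minute"), which
    # re.match would silently parse as "1 min" while ignoring the rest
    match = re.fullmatch(pattern, time_str.strip())
    if not match:
        return None
    # Groups for absent units are None; treat them as 0
    times_match = {k: int(v) if v else 0 for k, v in match.groupdict().items()}
    return sum(times_match[k] * N_SECONDS[k] for k in N_SECONDS)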

df["time_seconds"] = df["raw_time"].apply(parse_string_to_seconds)

df
>>>                                           raw_time  time_seconds
>>> 0                                   1 hour 29 mins          5340
>>> 1                                          46 secs            46
>>> 2                                    1 min 47 secs           107
>>> 3                                   2 mins 19 secs           139
>>> 4                                  6 days 18 hours        583200
>>> 5                         2 years 2 months 3 hours      67402800
>>> 6         1 year 1 month 1 day 1 hours 1 min 1 sec      33786061
>>> 7  3 years 4 months 2 days 7 hours 38 mins 42 secs     103880322
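To get a column that pandas treats as time, the integer seconds can be converted with `pd.to_timedelta(..., unit="s")`. A minimal sketch, assuming the seconds values for the first five rows of the table above are typed in by hand so the block runs on its own:

```python
import pandas as pd

# Seconds for the first five rows above, copied by hand (stand-in for df["time_seconds"]).
seconds = pd.Series([5340, 46, 107, 139, 583200], name="time_seconds")

# unit="s" tells pandas the integers are seconds; the result has a timedelta64 dtype.
durations = pd.to_timedelta(seconds, unit="s")
print(durations)

# Timedelta columns support the .dt accessor, e.g. converting back to seconds.
print(durations.dt.total_seconds().tolist())
```

On the answer's frame this would be `df["duration"] = pd.to_timedelta(df["time_seconds"], unit="s")`. Rows where `parse_string_to_seconds` returned None would become NaT.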

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 swimmer