AWS Glue: AttributeError: 'DataFrame' object has no attribute 'to_datetime' (custom transform)

I am pretty new to AWS Glue. I am using Glue Studio to create a job that reads from S3 and then removes duplicates.

In plain Python/pandas, my script to solve this would look like this:

import pandas as pd

account_json = [{"Id":"0016F00002YLLTBQA5","name":"Burlington Business Corp of America","partition_0":"3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},{"Id":"0016F00002YLLTKQA5","name":"GenePoint Sydney","partition_0":"3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},{"Id":"0016F00002YLLTLQA5","name":"sForce manzano","partition_0":"3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},{"Id":"0016F00002YLLTLQA5","name":"sForce mania","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTJQA5","name":"United Manzano Oil & Gas, Singapore","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTBQA5","name":"Burlington Corp of America","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTAQA5","name":"Edge Communications","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTIQA5","name":"United Oil & Gas, UK","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTKQA5","name":"GenePoint","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTGQA5","name":"Express Logistics and Transport","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTDQA5","name":"Dickenson Hey hey plc","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTEQA5","name":"Grand Hotels & Resorts Ltd","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTCQA5","name":"Pyramid Construction Inc.","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTFQA5","name":"United Oil & Gas Corp.","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTHQA5","name":"University of Arizona","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"}]

df = pd.DataFrame(account_json)
# The last 19 characters of partition_0 are the submission timestamp, e.g. 2022-02-03T05:47:41
df['submissionTime'] = pd.to_datetime(df.partition_0.str[-19:])
# Sort by time and keep only the most recent record for each Id
df = df.sort_values('submissionTime').drop_duplicates('Id', keep='last')
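To make the intended deduplication rule explicit independently of pandas or Spark, here is a minimal, dependency-free Python sketch of "keep the latest record per Id". It assumes, as in the sample data, that the last 19 characters of partition_0 are an ISO-8601 timestamp; the function name keep_latest_per_id is hypothetical and not part of pandas or AWS Glue.

```python
from datetime import datetime


def keep_latest_per_id(records):
    """Return one record per Id: the one whose partition_0 timestamp is latest.

    Assumption: the last 19 characters of partition_0 look like
    "2022-02-03T05:47:41", as in the sample data.
    """
    latest = {}  # Id -> (timestamp, record)
    for rec in records:
        ts = datetime.fromisoformat(rec["partition_0"][-19:])
        current = latest.get(rec["Id"])
        # On equal timestamps the record appearing later in the input wins.
        if current is None or ts >= current[0]:
            latest[rec["Id"]] = (ts, rec)
    # Output order follows the first appearance of each Id in the input.
    return [rec for _, rec in latest.values()]
```

For example, given the two 0016F00002YLLTLQA5 rows from account_json, it keeps "sForce manzano" (05:47:41) and drops "sForce mania" (05:15:00). Note that inside a Glue custom transform, toDF() returns a Spark DataFrame rather than a pandas one, so the same rule would have to be expressed with Spark column functions (for example a window over Id ordered by the timestamp) or after converting to pandas.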

In Glue Studio, I created a "Custom Transform" node with this code:

def MyTransform (glueContext, dfc) -> DynamicFrameCollection:
    import datetime
    df = dfc.select(list(dfc.keys())[0]).toDF()
    df['submissionTime'] = df.to_datetime(df.partition_0.str[-19:])
    df.sort_values('submissionTime').drop_duplicates('Id',keep='last')
    return(DynamicFrameCollection({"CustomTransform0": dyf_filtered}, glueContext))     

But when I try to run it, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'to_datetime'

What am I missing?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
