AWS Glue: AttributeError: 'DataFrame' object has no attribute 'to_datetime' (custom transform)
I am pretty new to AWS Glue. I am using Glue Studio to create a job that reads from S3 and then removes duplicates.
In regular Python/pandas, my script to solve this looks like this:
import pandas as pd
account_json = [{"Id":"0016F00002YLLTBQA5","name":"Burlington Business Corp of America","partition_0":"3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},{"Id":"0016F00002YLLTKQA5","name":"GenePoint Sydney","partition_0":"3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},{"Id":"0016F00002YLLTLQA5","name":"sForce manzano","partition_0":"3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},{"Id":"0016F00002YLLTLQA5","name":"sForce mania","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTJQA5","name":"United Manzano Oil & Gas, Singapore","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTBQA5","name":"Burlington Corp of America","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTAQA5","name":"Edge Communications","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTIQA5","name":"United Oil & Gas, UK","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTKQA5","name":"GenePoint","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTGQA5","name":"Express Logistics and Transport","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTDQA5","name":"Dickenson Hey hey plc","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTEQA5","name":"Grand Hotels & Resorts Ltd","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTCQA5","name":"Pyramid Construction Inc.","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTFQA5","name":"United Oil & Gas Corp.","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},{"Id":"0016F00002YLLTHQA5","name":"University of Arizona","partition_0":"c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"}]
df = pd.DataFrame(account_json)
df['submissionTime'] = pd.to_datetime(df.partition_0.str[-19:])
df = df.sort_values('submissionTime').drop_duplicates('Id', keep='last')
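The pandas snippet keeps, for every `Id`, the row whose `partition_0` ends with the latest timestamp. As a hedged, dependency-free sketch of that same rule (the helper `keep_latest_per_id` and the three-record `sample` are illustrative and not part of the original question), plain Python can express it with a dictionary keyed by `Id`:

```python
from datetime import datetime


def keep_latest_per_id(records):
    """Return one record per Id: the one with the latest partition_0 timestamp."""
    latest = {}  # Id -> (timestamp, record)
    for rec in records:
        # The last 19 characters of partition_0 are an ISO 8601 timestamp,
        # e.g. "2022-02-03T05:47:41".
        ts = datetime.fromisoformat(rec["partition_0"][-19:])
        current = latest.get(rec["Id"])
        # On equal timestamps the record that appears later in the input wins.
        if current is None or ts >= current[0]:
            latest[rec["Id"]] = (ts, rec)
    return [rec for _, rec in latest.values()]


# A small subset of account_json from the question (hypothetical test data).
sample = [
    {"Id": "0016F00002YLLTLQA5", "name": "sForce manzano",
     "partition_0": "3da86b76-38fb-3de2-90d0-e27ee7146ffe-2022-02-03T05:47:41"},
    {"Id": "0016F00002YLLTLQA5", "name": "sForce mania",
     "partition_0": "c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},
    {"Id": "0016F00002YLLTAQA5", "name": "Edge Communications",
     "partition_0": "c3d6ada0-2927-4472-8190-8becde39416c-2022-02-03T05:15:00"},
]

for rec in keep_latest_per_id(sample):
    print(rec["Id"], rec["name"])
# prints:
# 0016F00002YLLTLQA5 sForce manzano
# 0016F00002YLLTAQA5 Edge Communications
```

This only makes the expected result easy to check before porting the logic to Glue; it is not meant to replace pandas or Spark for large datasets.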
In Glue Studio, I created a "Custom Transform" node with this code:
def MyTransform (glueContext, dfc) -> DynamicFrameCollection:
    import datetime
    df = dfc.select(list(dfc.keys())[0]).toDF()
    df['submissionTime'] = df.to_datetime(df.partition_0.str[-19:])
    df.sort_values('submissionTime').drop_duplicates('Id',keep='last')
    return(DynamicFrameCollection({"CustomTransform0": dyf_filtered}, glueContext))
But when I try to run this, I get the following error:
AttributeError: 'DataFrame' object has no attribute 'to_datetime'
What am I missing?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
