How to write a pandas dataframe to_json() to S3 in JSON format
I have an AWS Lambda function that creates a DataFrame; I need to write this file to an S3 bucket.
```python
import datetime
import io

import boto3
import pandas as pd

# code to get the df
destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'
df.to_json(destination)  # this file should be written to the S3 bucket instead of local disk
```
Solution 1:[1]
You can use the following code as well.
```python
import io

import boto3
from pyspark.sql.functions import lit

# Create a session using Boto3 (in a Lambda, prefer the execution role
# over hard-coded credentials)
session = boto3.Session(
    aws_access_key_id='<key ID>',
    aws_secret_access_key='<secret_key>'
)

# Create an S3 resource from the session
s3 = session.resource('s3')

json_buffer = io.StringIO()

# Create a Spark dataframe and convert it to pandas
df = spark.range(4).withColumn("organisation", lit("stackoverflow"))
df_p = df.toPandas()
df_p.to_json(json_buffer, orient='records')

# Create the S3 object and upload the buffered JSON
s3_object = s3.Object('<bucket-name>', '<JSON file name>')
result = s3_object.put(Body=json_buffer.getvalue())
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aman Sehgal |
