How to write a pandas dataframe with to_json() to S3 in JSON format

I have an AWS Lambda function which creates a data frame; I need to write this file to an S3 bucket.

import datetime
import io

import boto3
import pandas as pd

# code to get the df

destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'

df.to_json(destination)  # this writes to the local filesystem; it should go to an S3 bucket instead



Solution 1:[1]

You can use the following code as well.

#Creating a session with Boto3 (prefer an IAM role over hard-coded keys in Lambda)
session = boto3.Session(
    aws_access_key_id='<key ID>',
    aws_secret_access_key='<secret_key>'
)

#Create an S3 resource from the session
s3 = session.resource('s3')

json_buffer = io.StringIO()

#Create a Spark dataframe and convert it to pandas
#(assumes an active SparkSession named `spark`, e.g. on Databricks)
from pyspark.sql.functions import lit
df = spark.range(4).withColumn("organisation", lit("stackoverflow"))
df_p = df.toPandas()
df_p.to_json(json_buffer, orient='records')

#Create the S3 object (named s3_object to avoid shadowing the built-in `object`)
s3_object = s3.Object('<bucket-name>', '<JSON file name>')

#Put the object into the bucket
result = s3_object.put(Body=json_buffer.getvalue())
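If the data frame is already a pandas one (as in the question), Spark is not needed at all. A minimal sketch of the same buffer-and-upload pattern, using only pandas and the standard library; the bucket name and key in the commented upload step are placeholders, and inside a Lambda the execution role's credentials are picked up automatically, so no explicit keys are required:

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"organisation": ["stackoverflow"] * 4})

# Serialise the frame to an in-memory buffer rather than a local file
json_buffer = io.StringIO()
df.to_json(json_buffer, orient="records")

# Inside the Lambda, upload the buffer contents (bucket/key are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(Bucket="<bucket-name>", Key="<key>.json",
#                 Body=json_buffer.getvalue())

records = json.loads(json_buffer.getvalue())
print(records[0]["organisation"])  # stackoverflow
```

Alternatively, if the `s3fs` package is installed in the Lambda deployment package, pandas can write to S3 directly with an S3 URL, e.g. `df.to_json("s3://<bucket-name>/<key>.json")`, skipping the buffer entirely.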

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aman Sehgal