Send data from S3 to Postgres RDS with Glue
I'm trying to create an automated pipeline with AWS. I'm able to get my CSV file into my S3 bucket, and that automatically triggers a Lambda function that starts my Glue job. The Glue job then turns the CSV into a DataFrame with PySpark. You cannot use psycopg2, pandas, or sqlalchemy, or else Glue will give an error saying the module doesn't exist. I have a Postgres RDS instance set up in AWS RDS. This is what I have so far:
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Job arguments passed in by the Lambda that starts this job
args = getResolvedOptions(sys.argv, ["VAL1", "VAL2"])
file_name = args["VAL1"]
bucket_name = args["VAL2"]

# Read the CSV from S3 into a Spark DataFrame
file_path = "s3a://{}/{}".format(bucket_name, file_name)
df = spark.read.csv(file_path, sep=",", inferSchema=True, header=True)
df = df.drop("index")  # drop() returns a new DataFrame, so reassign it

url = "my rds endpoint link"
I have tried almost a dozen solutions before asking on Stack Overflow, so any help would be amazing.
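For context, the Lambda that kicks off the Glue job looks roughly like this. The job name and the way I pull the bucket/key out of the S3 event are just placeholders, not my exact code:

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Pull the bucket and object key out of the S3 put event
    record = event["Records"][0]["s3"]
    bucket_name = record["bucket"]["name"]
    file_name = record["object"]["key"]

    # Start the Glue job and pass the file location as job arguments
    glue.start_job_run(
        JobName="my-glue-job",  # placeholder job name
        Arguments={
            "--VAL1": file_name,
            "--VAL2": bucket_name,
        },
    )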
Solution 1:[1]
I have used this df.write approach before. Starting where you left off with your PySpark DataFrame:
jdbc_url = 'jdbc:postgresql://<instance_name>.xxxxxxxxx.us-west-2.rds.amazonaws.com:5432/<db_name>'

(df.write.format('jdbc')
    .option('url', jdbc_url)  # pass the variable, not the string 'jdbc_url'
    .option('user', 'myUsername')
    .option('password', 'myPassword')
    .option('dbtable', 'myTable')
    .option('driver', 'org.postgresql.Driver')
    .mode('append')
    .save())
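As far as I know, Glue's Spark environment bundles JDBC drivers for the databases it supports natively, PostgreSQL included, so you shouldn't need to attach extra jars for this. The usual gotcha is networking: the Glue job needs a Glue connection (or an equivalent VPC, subnet, and security-group setup) that can reach the RDS instance, otherwise the write will time out.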
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bob Haffner |