PySpark best practices
We are newly adopting PySpark at our company, and I'd like to know the best practice for managing the code. Our architecture is as follows:
S3 --> EMR (PySpark) --> Snowflake. Data flow is orchestrated through Airflow, and infrastructure is provisioned through Terraform.
Currently, I store the Terraform code on GitHub and deploy it through CI/CD. When data is pushed to the dev S3 bucket, it gets copied to the prod AWS account. My question is: do I need to store the PySpark code on GitHub as well? What is the best practice for this?
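For context, a common pattern alongside a Terraform-in-GitHub setup like the one described above is to keep the PySpark scripts in version control too, and have a CI/CD step sync them to an S3 artifacts location that EMR reads from. A minimal sketch of such a deploy step; the bucket name, repo path, and layout are hypothetical placeholders, not details from the question:

```shell
#!/usr/bin/env sh
# Hypothetical CI/CD deploy step: sync PySpark job scripts from the repo
# to an S3 artifacts bucket that EMR steps reference. Bucket and path
# names below are placeholders, not from the original question.
set -eu

ARTIFACT_BUCKET="s3://my-company-emr-artifacts"   # hypothetical bucket
JOB_DIR="jobs/pyspark"                            # hypothetical repo path

# Upload job files; --delete keeps the S3 prefix in sync with the repo,
# so removed scripts do not linger as stale artifacts.
aws s3 sync "$JOB_DIR" "$ARTIFACT_BUCKET/pyspark/" --delete
```

An EMR step (or an Airflow task submitting one) would then point `spark-submit` at the script under that S3 prefix, so the code that runs is always the code tracked in GitHub.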
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow