PySpark best practices
We are newly adopting PySpark at our company, and I'd like to know the best practice for managing the code. Our architecture is as follows:
S3 --> EMR (PySpark) --> Snowflake. Data flow is orchestrated through Airflow, and infrastructure is provisioned through Terraform.
Currently, I store the Terraform code on GitHub and deploy it through CI/CD. When data is pushed to the dev S3 bucket, it gets copied to the prod AWS account. My question is: do I need to store the PySpark code on GitHub as well? What is the best practice for this?
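For context, a common pattern alongside a Terraform-in-GitHub setup like the one described above is to keep the PySpark scripts in version control too, and have a CI/CD step sync them to an S3 artifacts location that EMR reads from. A minimal sketch of such a deploy step; the bucket name, repo path, and layout are hypothetical placeholders, not details from the question:

```shell
#!/usr/bin/env sh
# Hypothetical CI/CD deploy step: sync PySpark job scripts from the repo
# to an S3 artifacts bucket that EMR steps reference. Bucket and path
# names below are placeholders, not from the original question.
set -eu

ARTIFACT_BUCKET="s3://my-company-emr-artifacts"   # hypothetical bucket
JOB_DIR="jobs/pyspark"                            # hypothetical repo path

# Upload job files; --delete keeps the S3 prefix in sync with the repo,
# so removed scripts do not linger as stale artifacts.
aws s3 sync "$JOB_DIR" "$ARTIFACT_BUCKET/pyspark/" --delete
```

An EMR step (or an Airflow task submitting one) would then point `spark-submit` at the script under that S3 prefix, so the code that runs is always the code tracked in GitHub.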
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow