Automate pulling JSON files from S3 and pushing them to PySpark for ETL
Log files are dropped into S3 at some interval. I want to automate picking up the new files from S3 and passing them to my PySpark ETL code. Can Spark Streaming watch an S3 location for new files, and if so, how do I do that in Python?
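One possible approach is Spark Structured Streaming's file source, which lists a directory on each trigger and processes only the files it has not seen before, tracking progress in a checkpoint location. The following is a minimal sketch under several assumptions, not a definitive setup: the bucket, prefix, output and checkpoint paths are hypothetical; the schema fields (`timestamp`, `level`, `message`) are invented for illustration and must be replaced with the real log layout; and the cluster is assumed to have the `hadoop-aws` connector and AWS credentials configured so that `s3a://` paths resolve.

```python
def s3_input_path(bucket: str, prefix: str = "") -> str:
    """Build the s3a:// directory URI that the stream will watch."""
    prefix = prefix.strip("/")
    return f"s3a://{bucket}/{prefix}/" if prefix else f"s3a://{bucket}/"


def start_log_stream(bucket: str, prefix: str, output_dir: str, checkpoint_dir: str):
    """Start a streaming query that picks up new JSON files under the prefix."""
    # Imported here so the path helper above is usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (
        StringType, StructField, StructType, TimestampType,
    )

    spark = SparkSession.builder.appName("s3-json-etl").getOrCreate()

    # Streaming file sources need an explicit schema.
    # These fields are hypothetical; use the actual structure of your logs.
    schema = StructType([
        StructField("timestamp", TimestampType(), True),
        StructField("level", StringType(), True),
        StructField("message", StringType(), True),
    ])

    logs = (
        spark.readStream
        .schema(schema)
        .option("maxFilesPerTrigger", 100)  # cap the files read per micro-batch
        .json(s3_input_path(bucket, prefix))
    )

    # Placeholder for the real ETL logic: drop records without a level.
    cleaned = logs.filter(logs["level"].isNotNull())

    # The checkpoint records which files were already processed,
    # so a restarted job does not re-read old files.
    return (
        cleaned.writeStream
        .format("parquet")
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)
        .trigger(processingTime="1 minute")
        .start()
    )
```

Calling `start_log_stream("my-bucket", "logs", "s3a://my-bucket/etl-out/", "s3a://my-bucket/etl-checkpoints/")` (all hypothetical names) returns a streaming query; `query.awaitTermination()` keeps the job running. Because the file source discovers new files by listing the directory, a prefix holding a very large number of objects can make each trigger slow, in which case an event-driven design (for example S3 event notifications feeding a queue) may be a better fit.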
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
