PySpark: renaming and moving files in S3
I need to rename and move the output of my AWS Glue job to another folder in S3. I followed one of the replies from this post.
For the line below, I tried adding a subfolder after folder_name, hoping it would be created.
hadoopFs.rename(SparkContext._jvm.org.apache.hadoop.fs.Path(f"s3://bucket_name/folder_name/{file_name}"), SparkContext._jvm.org.apache.hadoop.fs.Path("s3://bucket_name/folder_name/myFile.parquet"))
It did not create the subfolder; the file is only renamed and moved if the destination folder already exists. I'm trying to create the subfolder with a name in the date format YYYYMMDD.
I have this code, but it's not working:
from datetime import datetime

currentdate = datetime.now().strftime("%Y%m%d")
hadoopFs.rename(SparkContext._jvm.org.apache.hadoop.fs.Path(f"s3://bucket_name/folder_name/{file_name}"), SparkContext._jvm.org.apache.hadoop.fs.Path(f"s3://bucket_name/folder_name/{currentdate}/myFile.parquet"))
Is there any way to achieve this?
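One likely cause: Hadoop's FileSystem.rename() generally does not create missing parent directories, and on S3 it tends to return False rather than raise when the destination's parent prefix doesn't exist. Below is a minimal sketch of one approach: call mkdirs() on the dated prefix first, then check rename()'s boolean return value. It assumes an active SparkSession named spark; bucket_name, folder_name, and file_name are the same placeholders used above.

```python
from datetime import datetime

sc = spark.sparkContext
Path = sc._jvm.org.apache.hadoop.fs.Path

# Get a FileSystem handle for the bucket from the job's Hadoop configuration.
fs = Path("s3://bucket_name").getFileSystem(sc._jsc.hadoopConfiguration())

currentdate = datetime.now().strftime("%Y%m%d")
src_str = f"s3://bucket_name/folder_name/{file_name}"          # file_name: placeholder from the question
dst_str = f"s3://bucket_name/folder_name/{currentdate}/myFile.parquet"

# rename() typically fails silently (returns False) when the destination's
# parent does not exist, so create the dated "subfolder" first.
fs.mkdirs(Path(f"s3://bucket_name/folder_name/{currentdate}"))

if not fs.rename(Path(src_str), Path(dst_str)):
    raise RuntimeError(f"rename failed: {src_str} -> {dst_str}")
```

Note that S3 has no real folders, so mkdirs() only ensures the prefix is visible to the FileSystem API; raising on a False return from rename() makes silent failures easier to spot.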
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0 per Stack Overflow's attribution requirements.