Databricks Python/PySpark code to find the age of a blob in an Azure container
Looking for Databricks Python/PySpark code to copy Azure blobs older than 30 days from one container to another.
Solution 1:[1]
The copy code itself is simple:

```python
dbutils.fs.cp("/mnt/xxx/file_A", "/mnt/yyy/file_A", True)
```

The difficult part is checking the blob modification time. According to the documentation, the modification time is only returned by the `dbutils.fs.ls` command on Databricks Runtime 10.2 or above. You can check the Runtime version with the command below.

```python
spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
```

The returned value will be the Databricks Runtime version followed by the Scala version.
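If you want to gate the copy on that check programmatically, here is a minimal sketch. It assumes the returned string follows the usual `<major>.<minor>.x-scala<version>` format (e.g. `11.3.x-scala2.12`); the parsing itself is not part of the original answer.

```python
# Minimal sketch: parse the runtime version string and confirm it is 10.2 or above.
# Assumes the usual "<major>.<minor>.x-scala<version>" format, e.g. "11.3.x-scala2.12".
runtime = spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
major, minor = (int(part) for part in runtime.split("-")[0].split(".")[:2])
assert (major, minor) >= (10, 2), f"modificationTime needs DBR 10.2+, got {runtime}"
```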
If you get lucky with the version, you can do something like:

```python
import time

# modificationTime is reported in epoch milliseconds, so compare in milliseconds
ts_now_ms = time.time() * 1000
for file in dbutils.fs.ls('/mnt/xxx'):
    if ts_now_ms - file.modificationTime > 30 * 86400 * 1000:
        dbutils.fs.cp(f'/mnt/xxx/{file.name}', f'/mnt/yyy/{file.name}', True)
```
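If you need to run this against several container mounts, one way to package the same logic is shown below. The function name and parameters are illustrative, not from the original answer, and the sketch assumes both mounts already exist in the workspace.

```python
import time

def copy_blobs_older_than(src_mount, dst_mount, days=30):
    """Copy every file under src_mount to dst_mount if it is older than `days` days.

    Illustrative helper built on the same dbutils calls as above; not part of the
    original answer.
    """
    cutoff_ms = (time.time() - days * 86400) * 1000  # modificationTime is epoch milliseconds
    for file in dbutils.fs.ls(src_mount):
        if file.modificationTime < cutoff_ms:
            dbutils.fs.cp(f"{src_mount}/{file.name}", f"{dst_mount}/{file.name}", True)

# Example: copy month-old blobs from the xxx mount to the yyy mount.
copy_blobs_older_than("/mnt/xxx", "/mnt/yyy", days=30)
```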
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | PhuriChal |
