AWS S3 to Databricks mount is not working
I have mounted 'mybucket' using the mount commands, and I am able to list all of its objects with the command below:
%fs
ls /mnt/mybucket/
However, 'mybucket' contains nested folders, and when I try to list them with wildcards using the command below, it does not work:
%fs
ls /mnt/mybucket/*/*/
Any help is much appreciated. Thanks
Solution 1:[1]
The dbutils.fs.ls function and its magic-command variant %fs ls don't support wildcards, so you need to iterate over the files yourself, with something like this:
def list_files(path, max_level=1, cur_level=0):
    d = dbutils.fs.ls(path)
    for i in d:
        # Directories are reported with a trailing "/" and size 0
        if i.name.endswith("/") and i.size == 0 and cur_level < (max_level - 1):
            yield from list_files(i.path, max_level, cur_level + 1)
        else:
            yield i.path

files = list_files("/mnt/mybucket", 1)
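The helper above only runs inside a Databricks notebook, where dbutils is predefined. Its traversal logic can be exercised locally with a stand-in for dbutils.fs.ls; the sample tree and fake_ls function below are hypothetical, purely for illustration:

```python
from collections import namedtuple

# Minimal stand-in for the FileInfo objects returned by dbutils.fs.ls
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])

# Hypothetical sample tree: directories end with "/" and have size 0
TREE = {
    "/mnt/mybucket": [
        FileInfo("/mnt/mybucket/a/", "a/", 0),
        FileInfo("/mnt/mybucket/top.txt", "top.txt", 10),
    ],
    "/mnt/mybucket/a/": [
        FileInfo("/mnt/mybucket/a/b/", "b/", 0),
        FileInfo("/mnt/mybucket/a/x.csv", "x.csv", 20),
    ],
    "/mnt/mybucket/a/b/": [
        FileInfo("/mnt/mybucket/a/b/y.csv", "y.csv", 30),
    ],
}

def fake_ls(path):
    return TREE[path]

# Same logic as the answer's helper, with the listing function injectable
def list_files(path, max_level=1, cur_level=0, ls=fake_ls):
    for i in ls(path):
        if i.name.endswith("/") and i.size == 0 and cur_level < (max_level - 1):
            yield from list_files(i.path, max_level, cur_level + 1, ls)
        else:
            yield i.path

# max_level=1 lists only the top level; max_level=2 descends one folder deep,
# which is roughly what the wildcard pattern /mnt/mybucket/*/* was trying to do
print(list(list_files("/mnt/mybucket", 1)))
# → ['/mnt/mybucket/a/', '/mnt/mybucket/top.txt']
print(list(list_files("/mnt/mybucket", 2)))
# → ['/mnt/mybucket/a/b/', '/mnt/mybucket/a/x.csv', '/mnt/mybucket/top.txt']
```

Note that list_files is a generator, so wrap it in list() if you need all paths at once.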
Solution 2:[2]
If you attempt to create a mount point within an existing mount point, for example:
Mount one storage account to /mnt/storage1
Mount a second storage account to /mnt/storage1/storage2
This will fail because nested mounts are not supported in Databricks. The recommended approach is to create a separate mount point for each storage object.
For example:
Mount one storage account to /mnt/storage1
Mount a second storage account to /mnt/storage2
Solution 3:[3]
Unmount and mount again.
dbutils.fs.unmount("/mnt/mount_name")
dbutils.fs.mount("s3a://%s" % aws_bucket_name, "/mnt/%s" % mount_name)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alex Ott |
| Solution 2 | Karthikeyan Rasipalay Durairaj |
| Solution 3 | Victor Kironde |
