AWS S3 to Databricks mount is not working

I have mounted 'mybucket' using the mount commands, and I am able to list all the objects with the command below:

%fs
ls /mnt/mybucket/

However, 'mybucket' contains folders nested inside folders, and I want to run the command below, but it does not work:

%fs
ls /mnt/mybucket/*/*/

Any help is much appreciated. Thanks



Solution 1:[1]

The dbutils.fs.ls function and its magic variant %fs ls don't support wildcards, so you need to iterate over the files yourself, with something like this:

def list_files(path, max_level=1, cur_level=0):
  """Recursively yield file paths under `path`, descending at most `max_level` levels."""
  for i in dbutils.fs.ls(path):
    # dbutils.fs.ls reports directories with a trailing "/" and a size of 0.
    if i.name.endswith("/") and i.size == 0 and cur_level < (max_level - 1):
      yield from list_files(i.path, max_level, cur_level + 1)
    else:
      yield i.path

files = list_files("/mnt/mybucket", 1)  # a lazy generator over the top level
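Since list_files returns a generator, you can iterate over it directly. Raising max_level descends into the nested folders that the question's /mnt/mybucket/*/*/ pattern was aiming at, for example:

for p in list_files("/mnt/mybucket", max_level=2):
  print(p)  # prints each file path; increase max_level to go deeper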

Solution 2:[2]

If you attempt to create a mount point within an existing mount point, for example:

Mount one storage account to /mnt/storage1

Mount a second storage account to /mnt/storage1/storage2

The second mount will fail because nested mounts are not supported in Databricks; the recommended approach is to create a separate mount entry for each storage object.

For example:

Mount one storage account to /mnt/storage1

Mount a second storage account to /mnt/storage2
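As a minimal sketch, assuming the hypothetical bucket names below and that the cluster already has credentials for both buckets (for example, via an instance profile), the two side-by-side mounts would look like this:

dbutils.fs.mount("s3a://storage1-bucket", "/mnt/storage1")  # hypothetical first bucket
dbutils.fs.mount("s3a://storage2-bucket", "/mnt/storage2")  # hypothetical second bucket

display(dbutils.fs.mounts())  # verify that both mount points now exist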

Solution 3:[3]

Unmount the bucket and mount it again:

dbutils.fs.unmount("/mnt/mount_name")

dbutils.fs.mount("s3a://%s" % aws_bucket_name, "/mnt/%s" % mount_name)
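If the mount point may not exist yet (for example, on a fresh cluster), a slightly safer sketch, assuming aws_bucket_name and mount_name are defined as above, is to unmount conditionally and then sanity-check the listing:

# Unmount only if the mount point is already present, to avoid an error.
if any(m.mountPoint == "/mnt/%s" % mount_name for m in dbutils.fs.mounts()):
  dbutils.fs.unmount("/mnt/%s" % mount_name)

dbutils.fs.mount("s3a://%s" % aws_bucket_name, "/mnt/%s" % mount_name)
dbutils.fs.ls("/mnt/%s" % mount_name)  # should list the bucket's top-level objects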

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Alex Ott
Solution 2: Karthikeyan Rasipalay Durairaj
Solution 3: Victor Kironde