How to get a list of all leaf folders from an ADLS Gen2 path via Scala code?
We have a folder hierarchy with nested year, month, and day folders. How can we get only the leaf-level folders using the dbutils.fs.ls utility?
Example path:
abfss://[email protected]/customer/data/V1/2021/
abfss://[email protected]/customer/data/V1/2022/
abfss://[email protected]/customer/data/V1/2022/03/24/15/a.parquet
abfss://[email protected]/customer/data/V1/2022/03/25/15/b.parquet
.
.
The function should return only the leaf-level folders, i.e.:
abfss://[email protected]/customer/data/V1/2022/03/24/15
abfss://[email protected]/customer/data/V1/2022/03/25/15
EDIT:
I tried the function below. It works, but it fails when a folder is empty, with the error "java.lang.UnsupportedOperationException: empty.reduceLeft". Please help.
def listLeafDirectories(path: String): Array[String] =
  dbutils.fs.ls(path).map(file => {
    // Work around double encoding bug
    val path = file.path.replace("%25", "%").replace("%25", "%")
    if (file.isDir) listLeafDirectories(path)
    else Array[String](path.substring(0, path.lastIndexOf("/") + 1))
  }).reduce(_ ++ _).distinct
Solution 1:[1]
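The function below worked for me. It uses reduceOption instead of reduce, so an empty folder yields an empty array instead of throwing:
// Note: the `recurse` parameter is passed along but never consulted.
def listDirectories(dir: String, recurse: Boolean): Array[String] = {
  dbutils.fs.ls(dir).map(file => {
    // Work around double encoding of "%" in returned paths
    val path = file.path.replace("%25", "%").replace("%25", "%")
    if (file.isDir) listDirectories(path, recurse)
    // For a file, keep only its parent folder (with trailing slash)
    else Array[String](path.substring(0, path.lastIndexOf("/") + 1))
  }).reduceOption(_ ++ _).getOrElse(Array.empty[String]).distinct
}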
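Note that the function above reports every folder that directly contains a file, whether or not it also has subfolders. If "leaf" should mean a folder with files and no subfolders, a variant along the following lines may be closer to what the question asks for. This is only a sketch under that assumption: `Entry`, `LeafFolders`, and the injected `ls` function are hypothetical names introduced here so the logic can run outside a notebook, and the "%25" workaround is left out for brevity.

```scala
object LeafFolders {
  // Hypothetical stand-in for the entries returned by dbutils.fs.ls
  final case class Entry(path: String, isDir: Boolean)

  // `ls` is passed in so the sketch does not depend on dbutils directly
  def listLeafFolders(dir: String, ls: String => Seq[Entry]): Seq[String] = {
    val (subDirs, files) = ls(dir).partition(_.isDir)
    if (subDirs.nonEmpty) subDirs.flatMap(d => listLeafFolders(d.path, ls)) // keep descending
    else if (files.nonEmpty) Seq(dir) // files but no subfolders: a leaf folder
    else Seq.empty                    // empty folder: skipped, nothing to reduce
  }
}
```

In a Databricks notebook, the listing function could presumably be supplied as `p => dbutils.fs.ls(p).map(f => LeafFolders.Entry(f.path, f.isDir))`. Because flatMap on an empty sequence simply returns an empty sequence, empty folders need no special handling.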
Solution 2:[2]
The function should return only last leaf level folder list
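You can use Scala's fold function. Unlike reduce, fold takes an initial value z and returns it when the collection is empty, so an empty folder does not raise "empty.reduceLeft".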
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
Folds the elements of this list using the specified associative binary operator. The default implementation in IterableOnce is equivalent to foldLeft but may be overridden for more efficient traversal orders.
The order in which operations are performed on elements is unspecified and may be nondeterministic.
returns the result of applying the fold operator op between all the elements and z, or z if this list is empty.
For more information, refer to the Scala standard library documentation for fold.
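Applied to the function from the question, the fold suggestion might look like the sketch below. `Entry`, `FoldLeafDirs`, and the injected `ls` parameter are hypothetical names added here so the code can run without dbutils; the empty array passed as the initial value is what fold returns for an empty folder.

```scala
object FoldLeafDirs {
  // Hypothetical stand-in for the entries returned by dbutils.fs.ls
  final case class Entry(path: String, isDir: Boolean)

  // `ls` is passed in so the sketch does not depend on dbutils directly
  def listLeafDirectories(path: String, ls: String => Seq[Entry]): Array[String] =
    ls(path)
      .map { e =>
        if (e.isDir) listLeafDirectories(e.path, ls)
        // For a file, keep only its parent folder (with trailing slash)
        else Array(e.path.substring(0, e.path.lastIndexOf("/") + 1))
      }
      .fold(Array.empty[String])(_ ++ _) // the initial value is returned for an empty listing
      .distinct
}
```

Array concatenation is associative and the empty array is its neutral element, which is the property fold expects of its operator and initial value.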
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | SomeGuy |
Solution 2 | AbhishekKhandave-MT |