Get subfolder CSV files with names starting with a string
I have the following tree structure in my Databricks project:

```
/root
  Year1
  |_ Week 1
     |_ a_1312.csv
     |_ b_1312.csv
     |_ c_1312.csv
  |_ Week 2
  |_ Week 3
  |_ ...
  Year2
  |_ ...
```
I am trying to get all the CSV files whose names start with `a_`. I am trying to use `recursiveFileLookup` with a wildcard, but it is not working:
```python
spark.read \
    .option("recursiveFileLookup", "true") \
    .option("inferSchema", True) \
    .option("header", True) \
    .option("delimiter", ",") \
    .csv(mount_point + "[a_*].csv")
```
I get the following error:

```
Path does not exist: dbfs:/mnt/xxxx/**/[a_*].csv
```
It looks like the wildcard is not working and is being treated as a literal part of the path.
Any idea what I am doing wrong?
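One likely cause: in glob syntax, square brackets denote a character class, so `[a_*]` matches exactly one character that is `a`, `_`, or `*`, not the two-character prefix `a_`. A quick sketch with Python's standard-library `fnmatch`, which follows the same basic glob rules, illustrates the difference:

```python
from fnmatch import fnmatch

# [a_*] is a character class: it matches exactly one character
# that is 'a', '_', or '*', followed by ".csv".
print(fnmatch("a_1312.csv", "[a_*].csv"))  # False: "a_1312" is six characters
print(fnmatch("a.csv", "[a_*].csv"))       # True: the single character 'a'

# a_* is the pattern you actually want: any name beginning with "a_".
print(fnmatch("a_1312.csv", "a_*.csv"))    # True
print(fnmatch("b_1312.csv", "a_*.csv"))    # False
```

Hadoop-style path globbing used by Spark treats `[...]` the same way, which is why the bracketed pattern never matches the files.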
Solution 1:[1]
Your path should be something like this:

```
/root/*/*/a_*.csv
      ^ ^
   Year Week
```
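That pattern can be passed straight to the reader, e.g. `spark.read.option("header", True).csv(mount_point + "/*/*/a_*.csv")` (no `recursiveFileLookup` needed, since the wildcards already cover the two directory levels). As a local sanity check of the glob itself, here is a small sketch using Python's `pathlib` against a throwaway copy of the layout from the question; the folder and file names are illustrative:

```python
import tempfile
from pathlib import Path

# Recreate a miniature version of the folder layout in a temp directory.
root = Path(tempfile.mkdtemp())
for year in ("Year1", "Year2"):
    for week in ("Week 1", "Week 2"):
        d = root / year / week
        d.mkdir(parents=True)
        for name in ("a_1312.csv", "b_1312.csv", "c_1312.csv"):
            (d / name).touch()

# Two wildcard levels (year, week) followed by the a_ prefix,
# mirroring the suggested /root/*/*/a_*.csv pattern.
matches = sorted(p.name for p in root.glob("*/*/a_*.csv"))
print(matches)  # four files, all named a_1312.csv, one per week folder
```

Standard-library globbing and Hadoop path globs agree on `*` and prefix patterns like this, so the check is a reasonable proxy for what Spark will resolve.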
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pltc |
