Get subfolder CSV files with names starting with a string

I have the following tree structure in my databricks project:

/root
     Year1
       |_ Week 1
            |_ a_1312.csv
            |_ b_1312.csv
            |_ c_1312.csv
       |_ Week 2
       |_ Week 3
       |_ ...
     Year2
       |_ ...

I am trying to get all the CSV files whose names start with a_. I am trying to use recursiveFileLookup together with a wildcard, but it is not working.

spark.read \
    .option("recursiveFileLookup", "true") \
    .option("inferSchema", True) \
    .option("header", True) \
    .option("delimiter", ",") \
    .csv(mount_point + "[a_*].csv")

I get the following error:

Path does not exist: dbfs:/mnt/xxxx/**/[a_*].csv

It looks like the wildcard is not being expanded and is instead being treated as a literal part of the path.

Any idea what I am doing wrong?



Solution 1:[1]

Your path should look like this:

/root/*/*/a_*.csv
      ^ ^
   Year Week

Each * matches one directory level (Year, then Week), so the a_*.csv pattern is applied to the files inside every week folder. The square brackets in your original path are the problem: in Hadoop-style path globs, [a_*] is a character class matching a single character, not a prefix wildcard.
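To see which files the glob selects, here is a minimal runnable sketch using Python's stdlib fnmatch against sample paths mirroring the question's layout (the paths are made up for illustration). Note one hedge: Python's fnmatch lets * cross / boundaries, whereas Hadoop's path glob matches * within a single path component, so this is only an approximation of Spark's behavior; for this pattern and layout the results coincide.

```python
from fnmatch import fnmatch

# Hypothetical sample paths mirroring the Year/Week tree from the question.
paths = [
    "/root/Year1/Week 1/a_1312.csv",
    "/root/Year1/Week 1/b_1312.csv",
    "/root/Year2/Week 3/a_0001.csv",
]

# One * per directory level (Year, Week), then files starting with "a_".
pattern = "/root/*/*/a_*.csv"

matched = [p for p in paths if fnmatch(p, pattern)]
print(matched)  # only the a_*.csv files match
```

With Spark itself, the same idea is to pass the glob directly to the reader, e.g. spark.read.option("header", True).csv(mount_point + "/*/*/a_*.csv"), and drop recursiveFileLookup since the glob already walks the two directory levels.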

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pltc