Reading parquet files from different folders inside an Azure storage container in PySpark
I need to read parquet files from multiple directories inside an Azure storage container.
for example,
Container1
    folder1
        parquet1
        parquet2
        ..
    folder2
        parquet5
        parquet6
        ...
I know these folders are virtual in blob storage, but how do I read all these parquet files and load them into one DataFrame?
paths = [f"wasbs://{container}@{storageAccountName}.blob.core.windows.net/pathToFile1",
         f"wasbs://{container}@{storageAccountName}.blob.core.windows.net/pathToFile2"]
df = spark.read.parquet(*paths)
Will the above code work?
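Yes: `spark.read.parquet` accepts multiple path arguments and reads them into a single DataFrame, provided the files share a compatible schema. A minimal sketch of the two common approaches, using hypothetical container/account names (`container1`, `mystorageaccount`) and the folder layout from the question:

```python
# Hypothetical names for illustration; substitute your own.
container = "container1"
account = "mystorageaccount"
base = f"wasbs://{container}@{account}.blob.core.windows.net"

# Option 1: pass each folder explicitly; Spark unions them into one DataFrame.
paths = [f"{base}/folder1", f"{base}/folder2"]
# df = spark.read.parquet(*paths)

# Option 2: a wildcard matches every folder under the container root,
# so you don't have to list folders by hand.
# df = spark.read.parquet(f"{base}/folder*")
```

The Spark reads are commented out here because they need a live `SparkSession` with the Azure storage credentials configured (e.g. the `fs.azure.account.key.<account>.blob.core.windows.net` setting). Note that both options assume the folders contain parquet files with compatible schemas; if schemas drift between folders, consider `spark.read.option("mergeSchema", "true").parquet(*paths)`.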
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow