'pyarrow ds.dataset fails with FileNotFoundError using Azure blob filesystem in azure functions but not locally

I have a couple functions in Azure Functions which download data into my data lake v2 storage. They work by downloading CSVs, converting them to pyarrow and then saving to individual parquet files on Azure's storage which works fine.

I have another function in the same app which is supposed to consolidate and repartition the data. That function results in a FileNotFoundError exception when trying to create a dataset.

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq
import pyarrow.compute as pc
import os
from adlfs import AzureBlobFileSystem
abfs=AzureBlobFileSystem(connection_string=os.environ['Synblob'])

newds=ds.dataset(f"stdatalake/miso/dailyfiles/{filetype}", filesystem=abfs, partitioning="hive")

When running locally, the code runs fine but when running on the cloud I get....

Result: Failure Exception: FileNotFoundError: stdatalake/miso/dailyfiles/daprices/exante/date=20210926 
Stack: File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 402, in _handle__invocation_request call_result = await self._loop.run_in_executor( File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run result = self.fn(*self.args, **self.kwargs) 
File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 611, in _run_sync_func return ExtensionManager.get_sync_invocation_wrapper(context, File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/extension.py",
 line 215, in _raw_invocation_wrapper result = function(**args) File "/home/site/wwwroot/consolidate/__init__.py", 
 line 47, in main newds=ds.dataset(f"stdatalake/miso/dailyfiles/{filetype}", filesystem=abfs, partitioning="hive") 
 File "/home/site/wwwroot/.python_packages/lib/site-packages/pyarrow/dataset.py", line 667, in dataset return _filesystem_dataset(source, **kwargs) File "/home/site/wwwroot/.python_packages/lib/site-packages/pyarrow/dataset.py", 
 line 422, in _filesystem_dataset return factory.finish(schema) File "pyarrow/_dataset.pyx", line 1680, 
 in pyarrow._dataset.DatasetFactory.finish File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status File "pyarrow/_fs.pyx", line 1179, 
 in pyarrow._fs._cb_open_input_file File "/home/site/wwwroot/.python_packages/lib/site-packages/pyarrow/fs.py", line 394, in open_input_file raise FileNotFoundError(path)


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source