Process to interact with blob storage files from Databricks notebooks

Within an Azure Databricks notebook, I am attempting to perform a transformation on some CSVs that are in blob storage, using the following:

    import os
    import glob
    import pandas as pd

    os.chdir(r'wasbs://dalefactorystorage.blob.core.windows.net/dale')
    allFiles = glob.glob("*.csv")  # match your csvs
    for file in allFiles:
        df = pd.read_csv(file)
        df = df.iloc[4:, ]  # read from row 4 onwards.
        df.to_csv(file)
        print(f"{file} has removed rows 0-3")

Unfortunately, I am getting the following error:

    FileNotFoundError: [Errno 2] No such file or directory: 'wasbs://dalefactorystorage.blob.core.windows.net/dale'

Am I missing something? (I am completely new to this.)

Cheers,

Dale



Solution 1:[1]

An alternative approach is to read the file from blob storage into a Spark DataFrame, and then convert it from a Spark DataFrame to a pandas DataFrame:

# set the storage account access key so Spark can read from blob storage
spark.conf.set("fs.azure.account.key.storageaccountname.blob.core.windows.net",
               "storageaccesskey")

dfspark = spark.read.csv("wasbs://containername@storageaccountname.blob.core.windows.net/filename.csv",
                         header="true")

# convert the Spark DataFrame to a pandas DataFrame
df = dfspark.toPandas()
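
If you also want to reproduce the transformation from the question (dropping the first four rows) and write the result back to blob storage, a minimal sketch follows. It assumes the same placeholder container and storage account names used above, and that it runs in a Databricks notebook where `spark` is already available; the output path and variable names are illustrative only. Note that pandas' to_csv cannot write directly to a wasbs:// URL, so the result is converted back to a Spark DataFrame before writing.

# drop rows 0-3, mirroring df.iloc[4:, ] from the question
df = df.iloc[4:, ]

# pandas cannot write to wasbs:// paths, so convert back to a Spark DataFrame
result = spark.createDataFrame(df)

# write the trimmed data back to blob storage as a single CSV file (hypothetical output folder)
(result.coalesce(1)
    .write
    .mode("overwrite")
    .option("header", "true")
    .csv("wasbs://containername@storageaccountname.blob.core.windows.net/output/filename_trimmed"))

If you need to process every CSV in a folder, as the glob in the question does, spark.read.csv also accepts a wildcard path such as "wasbs://containername@storageaccountname.blob.core.windows.net/dale/*.csv".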

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: BrokenBenchmark