Process to interact with blob storage files from Databricks notebooks
Within an Azure Databricks notebook, I am attempting to perform a transformation on some CSVs which are in blob storage, using the following:
import os
import glob
import pandas as pd

os.chdir(r'wasbs://dalefactorystorage.blob.core.windows.net/dale')
allFiles = glob.glob("*.csv")  # match your CSVs

for file in allFiles:
    df = pd.read_csv(file)
    df = df.iloc[4:, ]  # read from row 4 onwards
    df.to_csv(file)
    print(f"{file} has removed rows 0-3")
Unfortunately I am getting the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'wasbs://dalefactorystorage.blob.core.windows.net/dale'
Am I missing something? (I am completely new to this).
Cheers,
Dale
Solution 1:[1]
An alternative approach is to read the file from blob storage into a Spark DataFrame, and then just convert it from a Spark DataFrame to a pandas DataFrame:
# configure access to the storage account with its access key
spark.conf.set("fs.azure.account.key.storageaccountname.blob.core.windows.net",
               "storageaccesskey")

# read the CSV from blob storage into a Spark DataFrame
dfspark = spark.read.csv("wasbs://containername@storageaccountname.blob.core.windows.net/filename.csv",
                         header="true")

# convert from a Spark DataFrame to a pandas DataFrame
df = dfspark.toPandas()
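
If the goal from the original question is to drop the first four rows and write the result back to blob storage, a minimal sketch of that round trip, under the same placeholder names (containername, storageaccountname, storageaccesskey, filename.csv are all assumptions to be replaced with your own values), could look like the following. Writing back via Spark is shown because pandas' to_csv cannot target a wasbs:// URL directly; mounting the container with dbutils.fs.mount would be another option.

# a minimal sketch, assuming placeholder storage account, key, container, and file names
spark.conf.set("fs.azure.account.key.storageaccountname.blob.core.windows.net",
               "storageaccesskey")

path = "wasbs://containername@storageaccountname.blob.core.windows.net/filename.csv"

# read the CSV into a Spark DataFrame, then into pandas
dfspark = spark.read.csv(path, header="true")
df = dfspark.toPandas()

# drop the first four rows, as in the original loop
df = df.iloc[4:, :]

# convert back to a Spark DataFrame and write the result to blob storage
# (pandas' to_csv cannot write directly to a wasbs:// path)
spark.createDataFrame(df).write.mode("overwrite").csv(
    "wasbs://containername@storageaccountname.blob.core.windows.net/filename_trimmed",
    header=True
)

Note that Spark writes the output as a directory of part files rather than a single CSV; if a single file is required, coalescing to one partition before writing is one common workaround.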
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BrokenBenchmark |
