Trouble reading Blob Storage File into Azure ML Notebook

I have an Excel file uploaded to my ML workspace.

I can access the file as an Azure FileDataset object. However, I don't know how to get it into a pandas DataFrame, since the 'FileDataset' object has no attribute 'to_dataframe'.

Azure ML notebooks seem to make a point of avoiding pandas for some reason.

Does anyone know how to get blob files into pandas dataframes from within Azure ML notebooks?

Solution 1:^[1]

To explore and manipulate the dataset in pandas, one approach is to download it from the blob source to a local file first and then load that file into a pandas DataFrame.

Here are the steps to follow for this procedure:

Download the data from Azure Blob Storage with the following Python code sample, which uses the Blob service client. Replace the placeholders in the code with your own values:

from azure.storage.blob import BlobServiceClient
import pandas as pd
import time

# Replace each placeholder with your own value
STORAGEACCOUNTURL = "<storage_account_url>"
STORAGEACCOUNTKEY = "<storage_account_key>"
LOCALFILENAME = "<local_file_name>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"

# Download the blob to a local file and time the transfer
t1 = time.time()
blob_service_client_instance = BlobServiceClient(
    account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(
    CONTAINERNAME, BLOBNAME, snapshot=None)
with open(LOCALFILENAME, "wb") as my_blob:
    blob_data = blob_client_instance.download_blob()
    blob_data.readinto(my_blob)
t2 = time.time()
print(("It takes %s seconds to download " + BLOBNAME) % (t2 - t1))

Read the data into a pandas DataFrame from the downloaded file.

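# LOCALFILENAME is the path of the downloaded file
# For an Excel file, use pd.read_excel(LOCALFILENAME) instead of pd.read_csv
dataframe_blobdata = pd.read_csv(LOCALFILENAME)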
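
Since the question concerns an Excel file, the reader has to match the file type. The following is a minimal sketch of one way to choose it from the file extension. The helper name load_downloaded_file is hypothetical and not part of pandas or the Azure SDK, and reading .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_downloaded_file(local_path):
    """Load a downloaded file into a DataFrame, picking the reader by extension.

    Hypothetical helper for illustration only.
    """
    suffix = Path(local_path).suffix.lower()
    if suffix in {".xlsx", ".xls"}:
        # read_excel needs an engine (for example openpyxl) for .xlsx files
        return pd.read_excel(local_path)
    if suffix == ".csv":
        return pd.read_csv(local_path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

With the download code above, this would be called as load_downloaded_file(LOCALFILENAME).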

For more details you can follow this link
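
If writing a local file is not desirable, the blob's bytes can also be wrapped in an in-memory buffer and handed to pandas directly. This is only a sketch: it assumes the bytes come from calling download_blob().readall() on the blob client created in the download step, and the helper name dataframe_from_bytes and its is_excel flag are hypothetical.

```python
import io

import pandas as pd


def dataframe_from_bytes(data, is_excel=False):
    """Build a DataFrame from raw file bytes without writing to disk.

    Hypothetical helper for illustration only.
    """
    buffer = io.BytesIO(data)
    if is_excel:
        # Requires an Excel engine such as openpyxl for .xlsx content
        return pd.read_excel(buffer)
    return pd.read_csv(buffer)


# Assumed usage with the blob client from the download step:
# raw = blob_client_instance.download_blob().readall()
# df = dataframe_from_bytes(raw, is_excel=True)
```

This keeps the whole file in memory, so it suits files that comfortably fit in the notebook's RAM.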

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	AbhishekKhandave-MT
