Trouble reading Blob Storage File into Azure ML Notebook
I have an Excel file uploaded to my ML workspace.
I can access the file as an Azure FileDataset object. However, I don't know how to get it into a pandas DataFrame, since the 'FileDataset' object has no attribute 'to_dataframe'.
Azure ML notebooks seem to make a point of avoiding pandas for some reason.
Does anyone know how to get blob files into pandas dataframes from within Azure ML notebooks?
Solution 1:[1]
To explore and manipulate a dataset, it must first be downloaded from the blob source to a local file, which can then be loaded into a pandas DataFrame.
Here are the steps to follow for this procedure:
Download the data from Azure Blob Storage with the following Python code sample, which uses the Blob service client. Replace the placeholder values in the code with your own:

```python
import time

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Placeholders: fill these in with your own values
STORAGEACCOUNTURL = "<storage_account_url>"
STORAGEACCOUNTKEY = "<storage_account_key>"
LOCALFILENAME = "<local_file_name>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"

# Download from blob
t1 = time.time()
blob_service_client_instance = BlobServiceClient(
    account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY
)
blob_client_instance = blob_service_client_instance.get_blob_client(
    CONTAINERNAME, BLOBNAME, snapshot=None
)
with open(LOCALFILENAME, "wb") as my_blob:
    blob_data = blob_client_instance.download_blob()
    blob_data.readinto(my_blob)
t2 = time.time()
print(("It takes %s seconds to download " + BLOBNAME) % (t2 - t1))
```

Read the data into a pandas DataFrame from the downloaded file.

```python
# LOCALFILENAME is the file path
# For an Excel file, as in the question, use pd.read_excel(LOCALFILENAME) instead
dataframe_blobdata = pd.read_csv(LOCALFILENAME)
```
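The question concerns an Excel file, while the read step above calls `read_csv`. A small helper that picks the pandas reader from the file extension may make that step less error-prone. This is only a sketch: the names `reader_name_for` and `load_table` and the suffix table are hypothetical, and reading `.xlsx` files assumes an Excel engine such as openpyxl is installed alongside pandas.

```python
from pathlib import Path
from typing import Any

# Hypothetical mapping from file suffix to the pandas reader function name.
_READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
}


def reader_name_for(path: str) -> str:
    """Return the name of the pandas reader that matches the file suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return _READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader registered for {suffix!r} files") from None


def load_table(path: str, **kwargs: Any) -> Any:
    """Load a downloaded file into a DataFrame using the matching reader."""
    import pandas as pd  # imported lazily; only needed when a file is loaded

    return getattr(pd, reader_name_for(path))(path, **kwargs)
```

With the download code above, `load_table(LOCALFILENAME)` would then call `pd.read_excel` for a `.xlsx` file and `pd.read_csv` for a `.csv` file.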
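Since the question starts from a FileDataset, another route is to let the dataset download its own files and then read the local copy. The sketch below assumes the azureml-core (SDK v1) `FileDataset.download(target_path=..., overwrite=...)` method, which returns a list of local file paths. The helper name `file_dataset_to_frame` and its `reader` parameter are hypothetical and introduced only for illustration.

```python
import tempfile
from typing import Any, Callable, Optional


def file_dataset_to_frame(
    dataset: Any, reader: Optional[Callable[[str], Any]] = None
) -> Any:
    """Download the first file of a FileDataset-like object and parse it.

    `dataset` is assumed to expose download(target_path=..., overwrite=...)
    and to return a list of local paths, as azureml-core's FileDataset does.
    """
    if reader is None:
        import pandas as pd  # imported lazily; only needed for the default reader

        reader = pd.read_excel  # assumes an Excel engine such as openpyxl

    # Download into a fresh temporary directory on the compute instance.
    target_dir = tempfile.mkdtemp()
    local_paths = dataset.download(target_path=target_dir, overwrite=True)
    if not local_paths:
        raise FileNotFoundError("the dataset did not contain any files")
    return reader(local_paths[0])
```

In a notebook this might be called as `df = file_dataset_to_frame(Dataset.get_by_name(Workspace.from_config(), "my_excel_dataset"))`, where the dataset name is a placeholder. Pass `reader=pd.read_csv` if the registered file is a CSV.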
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AbhishekKhandave-MT |
