Trouble reading Blob Storage File into Azure ML Notebook

I have an Excel file uploaded to my ML workspace.

I can access the file as an Azure ML FileDataset object. However, I don't know how to get it into a pandas DataFrame, since the 'FileDataset' object has no attribute 'to_dataframe'.

Azure ML notebooks seem to make a point of avoiding pandas for some reason.

Does anyone know how to get blob files into pandas dataframes from within Azure ML notebooks?



Solution 1:[1]

A straightforward way to explore and manipulate the dataset is to download it from the blob source to a local file, which can then be loaded into a pandas DataFrame.

Here are the steps to follow for this procedure:

  1. Download the data from Azure Blob Storage with the following Python sample, which uses BlobServiceClient. Replace the placeholder values in the code with your own:

    import time
    
    import pandas as pd
    from azure.storage.blob import BlobServiceClient
    
    # Replace these placeholders with your own values
    STORAGEACCOUNTURL = "<storage_account_url>"
    STORAGEACCOUNTKEY = "<storage_account_key>"
    LOCALFILENAME = "<local_file_name>"
    CONTAINERNAME = "<container_name>"
    BLOBNAME = "<blob_name>"
    
    # Download the blob to a local file and time the transfer
    t1 = time.time()
    blob_service_client_instance = BlobServiceClient(
        account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY
    )
    blob_client_instance = blob_service_client_instance.get_blob_client(
        CONTAINERNAME, BLOBNAME, snapshot=None
    )
    with open(LOCALFILENAME, "wb") as my_blob:
        blob_data = blob_client_instance.download_blob()
        blob_data.readinto(my_blob)
    t2 = time.time()
    print("It takes %s seconds to download %s" % (t2 - t1, BLOBNAME))
    
  2. Read the data into a pandas DataFrame from the downloaded file.

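    # LOCALFILENAME is the path of the downloaded file.
    # The question concerns an Excel file, so use pd.read_excel; for a CSV file, use pd.read_csv.
    dataframe_blobdata = pd.read_excel(LOCALFILENAME)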
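
Because the right pandas reader depends on the file type, step 2 can be wrapped in a small helper that picks the reader from the file extension. This is only a sketch and not part of the original answer: READERS, reader_for, and load_local_file are hypothetical names, and it assumes pandas is installed along with openpyxl, which pd.read_excel needs for .xlsx files.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the name of a pandas reader.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
}


def reader_for(path):
    """Return the name of the pandas reader that matches the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"No pandas reader configured for {suffix!r}") from None


def load_local_file(path):
    """Load a downloaded file into a DataFrame using the matching reader."""
    import pandas as pd  # imported here so reader_for works without pandas

    return getattr(pd, reader_for(path))(path)
```

With the variables from step 1, load_local_file(LOCALFILENAME) would call pd.read_excel for an Excel blob and pd.read_csv for a CSV blob.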
    

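
Since the question starts from a FileDataset rather than a raw blob, another option may be to let the Azure ML SDK download the files and then read the local copy with pandas. The sketch below assumes the v1 SDK (azureml-core), where FileDataset.download(target_path, overwrite) returns the list of downloaded file paths. The helper names download_file_dataset and excel_files, the default folder, and the dataset name in the usage comment are hypothetical.

```python
from pathlib import Path


def download_file_dataset(dataset, target_dir="./downloaded_data"):
    """Download all files of a FileDataset and return their local paths, sorted.

    `dataset` is assumed to be an azureml.core FileDataset (SDK v1).
    """
    paths = dataset.download(target_path=target_dir, overwrite=True)
    return sorted(Path(p) for p in paths)


def excel_files(paths):
    """Keep only the paths that look like Excel workbooks."""
    return [p for p in paths if p.suffix.lower() in {".xlsx", ".xls"}]


# Usage inside an Azure ML notebook (the dataset name is a placeholder):
#
#   import pandas as pd
#   from azureml.core import Dataset, Workspace
#
#   ws = Workspace.from_config()
#   dataset = Dataset.get_by_name(ws, name="<dataset_name>")
#   local_paths = download_file_dataset(dataset)
#   df = pd.read_excel(excel_files(local_paths)[0])
```

This keeps the download step separate from the pandas step, so the same local file can be re-read without downloading it again.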

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AbhishekKhandave-MT