PyArrow Unable to Recognize Map Data Type

There is some data stored in Parquet file format that I want to read in using Dask. Unfortunately, Dask is not able to interpret the map data type. Is there a way to read in this data without relying on Spark? I am using pyarrow==6.0.1.

Example:

import dask.dataframe as dd
df = dd.read_parquet("s3://data/part=0", engine='pyarrow')
df.compute()

Error:

ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, double>


Solution 1:[1]

Not sure if this will work in your case (having a reproducible snippet would help), but a basic delayed wrapper might help, something like this:

from dask import delayed
import dask.dataframe as dd

@delayed
def custom_load(file_path):
    # xx could be pandas, pyarrow or something else that opens the file without a problem
    df = xx.open_file(file_path)
    ...
    return df

df = dd.from_delayed([custom_load(f) for f in list_files])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: SultanOrazbayev