PyArrow Unable to Recognize Map Data Type
I have some data stored in Parquet format that I want to read with Dask. Unfortunately, Dask is unable to interpret the map data type. Is there a way to read this data without relying on Spark? I am using pyarrow==6.0.1.
Example:
import dask.dataframe as dd
df = dd.read_parquet("s3://data/part=0", engine='pyarrow')
df.compute()
Error:
ArrowNotImplementedError: Not implemented type for Arrow list to pandas: map<string, double>
Solution 1:[1]
Not sure if this will work in your case (a reproducible snippet would help), but a basic delayed wrapper might do the trick, something like this:
from dask import delayed
import dask.dataframe as dd

@delayed
def custom_load(file_path):
    # xx could be pandas, pyarrow, or something else that opens the file without a problem
    df = xx.open_file(file_path)
    ...
    return df

df = dd.from_delayed([custom_load(f) for f in list_files])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SultanOrazbayev |
