Reading a Parquet file with pd.read_parquet raises an error about a schema

I'm working on an app that writes Parquet files. For testing purposes, I'm trying to read a generated file with pd.read_parquet, and I get a really strange error that mentions a schema:

self = <[AttributeError("'ParquetFile' object has no attribute '_schema'") raised in repr()] ParquetFile object at 0x7fae6e06b250>

This happens on the following line:

data = pd.read_parquet(file)

where file is the path to the file from the content root. First, I shouldn't need to provide a schema, since a Parquet file stores its own schema, and I'm not sure what could cause the issue. Maybe the file isn't readable?
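One way to check the "is the file readable" question without any Parquet library is to look at the file's outer framing. Per the Parquet format spec, a file starts and ends with the 4-byte magic PAR1, and the 4 bytes before the trailing magic hold the footer (metadata) length as a little-endian integer. A file that is truncated, or still being written by the app when the test reads it, will usually fail this check, which could plausibly make the reader fail while it is still setting up its schema. This is a sketch under that assumption: it only checks the framing, not the full file, and the function names are made up for illustration.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def diagnose_parquet_bytes(data: bytes) -> str:
    """Check the outer framing of a Parquet file: leading/trailing magic and footer length."""
    # Minimum framing: "PAR1" + 4-byte footer length + "PAR1"
    if len(data) < 12:
        return "too small to be a Parquet file"
    if data[:4] != PARQUET_MAGIC:
        return "missing leading PAR1 magic"
    if data[-4:] != PARQUET_MAGIC:
        return "missing trailing PAR1 magic (truncated or still being written?)"
    # The 4 bytes before the trailing magic are the footer length, little-endian
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len + 12 > len(data):
        return "footer length exceeds file size"
    return "ok"


def diagnose_parquet_file(path: str) -> str:
    """Read the whole file (fine for small test fixtures) and diagnose its framing."""
    if not os.path.exists(path):
        return "file not found"
    with open(path, "rb") as f:
        return diagnose_parquet_bytes(f.read())
```

Calling diagnose_parquet_file(file) right before pd.read_parquet(file) in the test should show whether the file on disk is complete at that moment.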

The generated file looks fine when I open it with the Parquet plugin for PyCharm:

{"Id": 12345, "Limit": 200, "Product": 818}
{"Id": 67890, "Limit": 3000, "Product": 819}

So it shouldn't be an issue with the input data.

NB: I tried the same with fastparquet directly and got the same error (which makes sense, as pd.read_parquet uses fastparquet or pyarrow as its engine).
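Which library pd.read_parquet actually used depends on what is installed: with the default engine="auto", pandas tries pyarrow first and falls back to fastparquet only if pyarrow is unavailable (unless the io.parquet.engine option was changed). A quick stdlib-only sketch to see which engines are importable in the test environment; the helper names are hypothetical:

```python
import importlib.util
from typing import List, Optional

# Order pandas uses for engine="auto" with the default io.parquet.engine setting
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def installed_parquet_engines() -> List[str]:
    """Return the Parquet engines importable here, in pandas' default 'auto' order."""
    return [name for name in PARQUET_ENGINES if importlib.util.find_spec(name) is not None]


def auto_engine() -> Optional[str]:
    """Mirror the default 'auto' choice: pyarrow if present, else fastparquet, else None."""
    engines = installed_parquet_engines()
    return engines[0] if engines else None
```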



Solution 1:[1]

The same thing happened to me when I wrote the file with this compression setting:

df.to_parquet("sample.parquet", compression="uncompressed")

I changed it to "none", and then it started working:

df.to_parquet("sample.parquet", compression="none")

In your case, the environment may not be set up correctly. Try installing another engine, such as fastparquet or pyarrow.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: tblaze