'DataFrame' object has no attribute 'schema'

I'm trying to write the data from an existing gzip-compressed file to HDFS in Parquet format, but I get the error shown below. I would be glad for any help. (I'm also open to other ways of achieving the same thing.)

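import pandas as pd
import pyarrow.parquet as pq

file = "c:/okay.log.gz"
# on_bad_lines="skip" replaces error_bad_lines=False in pandas 1.3+
df = pd.read_csv(file, compression="gzip", low_memory=False, sep="|", on_bad_lines="skip")
pq.write_table(df, "target_path")  # fails: df is a pandas DataFrame, not a pyarrow Table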

AttributeError: 'DataFrame' object has no attribute 'schema'



Solution 1:[1]

I just ran into the same issue. In case you haven't resolved yours, or someone else comes across this with a similar problem: pq.write_table expects a pyarrow Table and reads its schema attribute, which a pandas DataFrame does not have. Create a pyarrow table from the dataframe first.

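import pyarrow as pa
import pyarrow.parquet as pq

df = {some dataframe}               # placeholder: your pandas DataFrame
table = pa.Table.from_pandas(df)    # convert the DataFrame to a pyarrow Table
pq.write_table(table, '{path}')     # placeholder: destination Parquet path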

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dshay