'Error converting pandas dataframe to ORC using pyarrow

I'm trying to save a Pandas DataFrame as .orc file using Pyarrow. The versions of packages are: pandas==1.3.5 and pyarrow==6.0.1. My python3 version is 3.9.12.

Here is the code snippet:

import pandas as pd
import pyarrow as pa
import pyarrow.orc as orc

df = pd.read_orc('sample.orc')
table = pa.Table.from_pandas(df, preserve_index=False)
orc.write_table(table, 'sample_rewritten.orc')

The error I'm getting is: ArrowNotImplementedError: Unknown or unsupported Arrow type: null

How do I save a Pandas DataFrame (csv) as .orc file in python?

The write_table line is failing. This is the entire stack trace:

ArrowNotImplementedError                  Traceback (most recent call last)
Input In [1], in <cell line: 7>()
      5 df = pd.read_orc('hats_v2_sample.orc')
      6 table = pa.Table.from_pandas(df, preserve_index=False)
----> 7 orc.write_table(table, 'sample_rewritten.orc')

File /opt/homebrew/lib/python3.9/site-packages/pyarrow/orc.py:176, in write_table(table, where)
    174     table, where = where, table
    175 writer = ORCWriter(where)
--> 176 writer.write(table)
    177 writer.close()

File /opt/homebrew/lib/python3.9/site-packages/pyarrow/orc.py:146, in ORCWriter.write(self, table)
    136 def write(self, table):
    137     """
    138     Write the table into an ORC file. The schema of the table must
    139     be equal to the schema used when opening the ORC file.
   (...)
    144         The table to be written into the ORC file
    145     """
--> 146     self.writer.write(table)

File /opt/homebrew/lib/python3.9/site-packages/pyarrow/_orc.pyx:159, in pyarrow._orc.ORCWriter.write()

File /opt/homebrew/lib/python3.9/site-packages/pyarrow/error.pxi:120, in pyarrow.lib.check_status()

ArrowNotImplementedError: Unknown or unsupported Arrow type: null


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source