How to write an Arrow table to a Parquet file
I read a Parquet file into an Arrow table, so I can parse the file and access the data. But when I write the table back out to a local Parquet file, the output is much larger than the original. In theory the sizes should be about the same. Comparing the two binary files, I found the new file differs from the original: it is larger because it contains many duplicate copies of the column names. How can I compress it?
void read_write(const std::string& file_read, const std::string& file_write)
{
  // Open the source Parquet file.
  std::shared_ptr<arrow::io::ReadableFile> infile;
  PARQUET_ASSIGN_OR_THROW(
      infile,
      arrow::io::ReadableFile::Open(file_read, arrow::default_memory_pool()));

  // Read the whole file into an Arrow table (check the statuses).
  std::unique_ptr<parquet::arrow::FileReader> reader;
  PARQUET_THROW_NOT_OK(
      parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
  std::shared_ptr<arrow::Table> table;
  PARQUET_THROW_NOT_OK(reader->ReadTable(&table));

  // Write the table back out; the last argument (3) is the chunk size,
  // i.e. the number of rows per row group.
  std::shared_ptr<arrow::io::FileOutputStream> outfile;
  PARQUET_ASSIGN_OR_THROW(
      outfile, arrow::io::FileOutputStream::Open(file_write));
  PARQUET_THROW_NOT_OK(
      parquet::arrow::WriteTable(*table, arrow::default_memory_pool(), outfile, 3));
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
