C or macOS: Parquet to ASCII streamer? [closed]
I am looking for a space-efficient (and ideally fast) way to stream terabytes of Parquet data, record by record, into my C programs on a Mac. I do not want to read full tables into memory: I do not have terabytes of RAM. (Besides, Mac Studios are not only hard to purchase, they also don't have that much RAM.)
My first stop was, of course, brew:
$ brew install parquet-tools
...installs ok...
$ parquet-tools cat *parquet | head
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Mac and os.arch=aarch64
I do know the length of my output records (I peeked with R). They are very simple, short lines (almost like a Linux syslog) and mostly ASCII text (not ints, etc.).
All I want to do in C is
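for (int fi = 0; fi < FI; ++fi) {
    /* parquet_open() and parquet_read() are wished-for functions, not an existing API */
    FILE *f = parquet_open(filename[fi]);
    char buffer[256];
    /* loop on the read result; feof() is only true after a read has already failed */
    while (parquet_read(buffer, sizeof buffer, f) > 0) {
        // do stuff with buffer
        // fields could be \0 or otherwise separated.
    }
    fclose(f);
}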
Could someone please recommend good solutions here, presumably either a fast CLI output streamer that I can pipe into C, or a lightweight native C library that I could use?
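For the pipe route, the C side could look roughly like the sketch below. It assumes the converter writes one record per line and that every record fits in 256 bytes, as in my loop above. The names `stream_records`, `record_fn`, `sum_bytes` and `RECORD_MAX` are made up for illustration; only `fgets` and `strcspn` are standard C.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_MAX 256  /* assumption: every record fits, as in the question */

/* Hypothetical per-record hook: receives one record with its trailing
   newline removed. */
typedef void (*record_fn)(const char *record, size_t len, void *ctx);

/* Stream newline-delimited records from `in`, one at a time, in constant
   memory. Returns the number of chunks passed to `handle`. A line longer
   than RECORD_MAX - 1 bytes is delivered in several chunks. */
size_t stream_records(FILE *in, record_fn handle, void *ctx)
{
    char buffer[RECORD_MAX];
    size_t count = 0;

    /* Test the read itself rather than feof(): feof() only becomes true
       after a read has already failed. */
    while (fgets(buffer, sizeof buffer, in) != NULL) {
        size_t len = strcspn(buffer, "\n");  /* length up to the newline, if any */
        buffer[len] = '\0';                  /* strip the newline */
        handle(buffer, len, ctx);
        ++count;
    }
    return count;
}

/* Example hook: add up the bytes seen, standing in for "do stuff". */
void sum_bytes(const char *record, size_t len, void *ctx)
{
    (void)record;
    *(size_t *)ctx += len;
}
```

With a `main` that calls `stream_records(stdin, sum_bytes, &total)`, this would sit at the end of a pipeline such as `some-parquet-to-text *.parquet | ./consume`, where `some-parquet-to-text` is a placeholder for whatever converter turns out to work, not a real tool. Memory use stays at one buffer regardless of how many terabytes pass through.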
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
