'Size of encoded avro message without encoding it
Is there a way to get the size of the encoded avro message without actually encoding it?
I'm using Avro 1.8.1 for C++.
I'm used to google protocol buffers where you can call ByteSize() on a protobuf to get the encoded size, so it's something similar i'm looking for.
Since the message in essence is a raw struct I get that the size cannot be retrieved from the message itself, but perhaps there is a helper method that i'm not aware of?
Solution 1:[1]
(Edit below shows a hacky way to shrink-to-fit an OutputStream after writing to it with a BinaryEncoder)
It's a shame that avro::encode() doesn't use backup on the OutputStream to free unused memory after encoding. Martin G's answer gives the best solution using only the tools avro provides, but it issues N memory allocations of 1 byte each if your serialized object is N bytes in size.
You could implement a custom avro::OutputStream that simply counts and discards all written bytes. This would get rid of the memory allocations. It's still not a great approach, as the actual encoder will have to "ask" for every single byte:
(Code untested, just for demonstration purposes)
#include <avro/Encoder.hh>
#include <cstdint>
class ByteCountOutputStream : public avro::OutputStream {
public:
size_t byteCount_ = 0;
uint8_t dummyWriteLocation_;
explicit ByteCountOutputStream() {};
bool next(uint8_t **data, size_t *len) final {
byteCount_ += 1;
*data = &dummyWriteLocation_;
*len = 1;
return true;
}
void backup(size_t len) final {
byteCount_ -= len;
}
uint64_t byteCount() const final {
return byteCount_;
}
void flush() final {}
};
this could then be used as:
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
ByteCountOutputStream out();
encoder->init(out);
avro::encode(*encoder, obj);
size_t bufferSize = out.byteCount();
Edit:
My initial question when stumbling upon this was: How can I tell how many bytes of the OutputStream are required (for storing / transmitting)? Or, equivalently, if OutputStream.byteCount() returns the count of bytes allocated by the encoder so far, how can I make the encoder "backup" / release the bytes it didn't use? Well, there is a hacky way:
The Encoder abstract class provides a init method. For the BinaryEncoder, this is currently implemented as:
void BinaryEncoder::init(OutputStream &os) {
out_.reset(os);
}
with out_ being the internal StreamWriter of the Encoder.
Now, the StreamWriter implements reset as:
void reset(OutputStream &os) {
if (out_ != nullptr && end_ != next_) {
out_->backup(end_ - next_);
}
out_ = &os;
next_ = end_;
}
which will return unused memory back to the "old" OutputStream before switching to the new one.
So, you can abuse the encoder's init method like this:
// setup as always
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream();
// actual serialization
encoder->init(*out);
avro::encode(*encoder, obj);
// re-init on the same OutputStream. Happens to shrink the stream to fit
encoder->init(*out);
size_t bufferSize = out->byteCount();
However, this behavior is not documented, so it might break in the future.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
