Getting a LocalDate out of an avro GenericRecord in Java
I have written a class, AvroIterator<T>, that is a small wrapper around some avro calls, allowing the user to specify a filename, schema filename, and a method to get a T from an avro GenericRecord.
This works fine with primitive types. For example, in my tests I have this class:
public class TripleData {
    private final int x;
    private final double y;
    private final String z;

    public TripleData(int x, double y, String z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

static TripleData getTripleData(GenericRecord record) {
    return new TripleData(
        (int) record.get("x"),
        (double) record.get("y"),
        ((CharSequence) record.get("z")).toString()
    );
}
and I get an iterator as follows:
Iterator<TripleData> iterator = new AvroIterator<>(TRIPLE_DATA_FILENAME_1,
        TRIPLE_DATA_SCHEMA_FILENAME,
        AvroIteratorTest::getTripleData);
The problem arises when I want to deserialize a class containing a logical type. Currently I'm trying to handle a date.
private static class DatedData {
    private final LocalDate date;
    private final double x;
    private final int y;

    private DatedData(LocalDate date, double x, int y) {
        this.date = date;
        this.x = x;
        this.y = y;
    }
}
My schema is
{
    "type": "record",
    "name": "DatedData",
    "fields": [
        {"name": "date", "type": "int"},
        {"name": "x", "type": "double"},
        {"name": "y", "type": "int"}
    ]
}
When I try to write an equivalent getDatedData(GenericRecord record) method, I cannot follow the same pattern as for TripleData. If I call (LocalDate)record.get("date"), a ClassCastException gets thrown, since that record.get call returns an Integer.
I don't want to just use a call to LocalDate.ofEpochDay, since then I'm depending on the avro documentation on how to convert from the primitive type to the logical type, and there is no checking that the schema genuinely defines this field to be a date.
It looks like the 'correct' thing to do is to create an instance of avro's TimeConversions.DateConversion class, and call its fromInt method to make my conversion. This has the signature
public LocalDate fromInt(Integer daysFromEpoch, Schema schema, LogicalType type)
I therefore added an inner class Converter to DatedData, which would hold the Schema, and have a conversion method:
public static class Converter {
    private final Schema schema;
    private final LogicalType dateLogicalType;
    private final TimeConversions.DateConversion innerConverter;

    Converter(Schema schema) {
        this.schema = schema;
        this.dateLogicalType = schema.getField("date").schema().getLogicalType();
        innerConverter = new TimeConversions.DateConversion();
    }

    Converter(String schemaFilename) throws IOException {
        this(new Schema.Parser().parse(new File(schemaFilename)));
    }

    public DatedData getFromGenericRecord(GenericRecord record) {
        LocalDate date = innerConverter.fromInt((int) record.get("date"), schema, dateLogicalType);
        return new DatedData(
            date,
            (double) record.get("x"),
            (int) record.get("y")
        );
    }
}
I then create my iterator as follows:
Iterator<DatedData> iterator = new AvroIterator<>(DATED_DATA_FILENAME,
        DATED_DATA_SCHEMA_FILENAME,
        new DatedData.Converter(DATED_DATA_SCHEMA_FILENAME)::getFromGenericRecord);
This works, but it still works if I replace the Schema and LogicalType in the fromInt call with null. It also still works if I modify my schema to make "date" just have type int. In other words, I get no verification of the schema with this method.
What I hoped to gain from this is some sort of failure if I was trying to convert something that was not really serialized as a date, and some protection in case avro ever changes the definition of how a date is serialized. Also, I would like anyone reading the code to be able to say "ok, clearly this uses Avro's method to deserialize a date, I don't have to go and read the avro docs."
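To make the fail-fast behaviour I am after concrete, here is a minimal sketch using only the JDK. The class CheckedDateReader and its constructor parameters are hypothetical names of my own, not part of Avro. I am assuming the logical type name would come from something like schema.getField("date").schema().getLogicalType(), which is null when the field declares no logical type, and that the conversion function would wrap the DateConversion.fromInt call shown above.

```java
import java.time.LocalDate;
import java.util.Objects;
import java.util.function.IntFunction;

// Hypothetical helper: refuses to be constructed unless the field was
// declared with the "date" logical type, then delegates the actual
// int -> LocalDate conversion to a caller-supplied function.
final class CheckedDateReader {
    // Name of the date logical type as given in the Avro specification.
    private static final String EXPECTED_LOGICAL_TYPE = "date";

    private final IntFunction<LocalDate> conversion;

    // declaredLogicalTypeName: the logical type name read from the field
    // schema, or null if the field has no logical type (assumption).
    CheckedDateReader(String declaredLogicalTypeName, IntFunction<LocalDate> conversion) {
        if (!EXPECTED_LOGICAL_TYPE.equals(declaredLogicalTypeName)) {
            throw new IllegalArgumentException(
                "Field is not declared as logical type 'date': " + declaredLogicalTypeName);
        }
        this.conversion = Objects.requireNonNull(conversion, "conversion");
    }

    // rawValue: whatever record.get("date") returned.
    LocalDate read(Object rawValue) {
        if (!(rawValue instanceof Integer)) {
            throw new IllegalArgumentException(
                "Expected an Integer for a date field but got: " + rawValue);
        }
        return conversion.apply((Integer) rawValue);
    }
}
```

With this, new CheckedDateReader(null, LocalDate::ofEpochDay) fails at construction time, which is the behaviour I would like when "date" is a plain int. Passing LocalDate::ofEpochDay here is only a stand-in for Avro's own conversion.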
Is there something better that I can do?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow