Appending to an existing Avro file in Python

I'm exploring the Avro file format and am struggling to append data: each run seems to overwrite the file. I found an existing thread here saying I should not pass in a schema in order to "append" to an existing file without overwriting it. Even my linter gives this clue: "If the schema is not present, presume we're appending." However, if I declare the writer as DataFileWriter(open("users.avro", "wb"), DatumWriter(), None), the code won't run.

Simply put, how do I append values to an existing Avro file without overwriting its existing content?

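import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(open("user.avsc", "rb").read())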
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)

print("start appending")
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 12, "favorite_color": "blue"})
writer.close()
print("write successful!")

# Read data from an avro file
with open("users.avro", "rb") as f:
    # Reuse the handle opened by the with statement instead of opening the file twice
    reader = DataFileReader(f, DatumReader())
    users = [user for user in reader]
    reader.close()

print(f'Schema {schema}')
print(f'Users:\n {users}')


Solution 1:[1]

I'm not sure how to do it with the standard avro library, but if you use fastavro it can be done. See the example below:

from fastavro import parse_schema, writer, reader

schema = {
 "namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

parsed_schema = parse_schema(schema)

records = [
    {"name": "Alyssa", "favorite_number": 256},
    {"name": "Ben", "favorite_number": 12, "favorite_color": "blue"},
]

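# Write initial 2 records ("wb" creates the file or truncates an existing one)
with open("users.avro", "wb") as fp:
    writer(fp, parsed_schema, records)

# Append third record ("a+b" opens for appending without truncating)
with open("users.avro", "a+b") as fp:
    writer(fp, parsed_schema, [{"name": "Chris", "favorite_number": 1}])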

# Read all records
with open("users.avro", "rb") as fp:
    for record in reader(fp):
        print(record)
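
The overwriting in the question most likely comes from the file mode rather than from Avro itself: "wb" truncates the file every time it is opened, while "a+b" keeps what is already there. Here is a minimal sketch using only the standard library to show the difference. The helper names (write_chunk, read_all, demo) and the file name demo.bin are made up for illustration and are not part of avro or fastavro.

```python
import os
import tempfile


def write_chunk(path, mode, payload):
    """Open path in the given binary mode, write payload, then close."""
    with open(path, mode) as fp:
        fp.write(payload)


def read_all(path):
    """Return the full contents of path as bytes."""
    with open(path, "rb") as fp:
        return fp.read()


def demo():
    # Work in a throwaway directory so no real file is touched.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")  # hypothetical file name

        write_chunk(path, "wb", b"first")
        write_chunk(path, "wb", b"second")  # "wb" truncates, so b"first" is lost
        after_wb = read_all(path)

        write_chunk(path, "a+b", b"third")  # "a+b" keeps the existing bytes
        after_append = read_all(path)

    return after_wb, after_append


after_wb, after_append = demo()
print(after_wb)
print(after_append)
```

The first print shows only the second write, and the second print shows the appended bytes after it. The same reasoning would apply to the DataFileWriter call in the question: as long as the file is opened with "wb", its old content is gone before any Avro code runs, whether or not a schema is passed.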

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Scott