'Kafka Avro To BigQuery using Apache Beam in Java
Here is the scenario:
Kafka To BigQuery using Apache Beam. This is an alternative to BigQuerySinkConnector [WePay] using Kafka Connect.
I have been able to read Avro message from Kafka Topic. I am also able to print the contents to console accurately. I am looking for help with writing these KafkaRecords to BigQuery table.
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline pipeline = Pipeline.create(options);
//Customer is an auto generated class from avro schema using eclipse avro maven plugin
// Read from Kafka Topic and get KafkaRecords
@SuppressWarnings("unchecked")
PTransform<PBegin, PCollection<KafkaRecord<String, Customer>>> input = KafkaIO.<String, Customer>read()
.withBootstrapServers("http://server1:9092")
.withTopic("test-avro")
.withConsumerConfigUpdates(ImmutableMap.of("specific.avro.reader", (Object)"true"))
.withConsumerConfigUpdates(ImmutableMap.of("auto.offset.reset", (Object)"earliest"))
.withConsumerConfigUpdates(ImmutableMap.of("schema.registry.url", (Object)"http://server2:8181"))
.withKeyDeserializer(StringDeserializer.class)
.withValueDeserializerAndCoder((Class)KafkaAvroDeserializer.class, AvroCoder.of(Customer.class));
// Print kafka records to console log
pipeline.apply(input)
.apply("ExtractRecord", ParDo.of(new DoFn<KafkaRecord<String,Customer>, KafkaRecord<String,Customer>>() {
@ProcessElement
public void processElement(ProcessContext c) {
KafkaRecord<String, Customer> record = (KafkaRecord<String, Customer>) c.element();
KV<String, Customer> log = record.getKV();
System.out.println("Key Obtained: " + log.getKey());
System.out.println("Value Obtained: " + log.getValue().toString());
c.output(record);
}
}));
// Write each record to BigQuery Table
// Table is already available in BigQuery so create disposition would be CREATE_NEVER
// Records to be appended to table - so write disposition would be WRITE_APPEND
// All fields in the Customer object have corresponding column names and datatypes - so it is one to one mapping
// Connection to BigQuery is through service account JSON file. This file has been set as environment variable in run config of eclipse project
// Set table specification for BigQuery
String bqTable = "my-project:my-dataset:my-table";
The current examples available - shows how to manually set a schema and assign field by field the values. I am looking for an automated way to infer the Customer Avro object and assign it to the columns directly without such manual field by field assignment.
Is this possible?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
