Category "apache-beam"

Apache Beam Python SDK: How to access timestamp of an element?

I'm reading messages via ReadFromPubSub with timestamp_attribute=None, which should set timestamps to the publishing time. This way, I end up with a PCollecti

kerberos error while authenticating on Confluent Kafka

I´ve been trying to understand apache beam, confluent kafka and dataflow integration with python 3.8 and beam sdk 2.7 the desire result is to build a pipe

MapCoder issue after updating Beam to 2.35

After updating Beam from 2.33 to 2.35, started getting this error: def estimate_size(self, unused_value, nested=False): estimate = 4 # 4 bytes for int32 size p

Unable to create a template

I am trying to create a dataflow template using the below mvn command And i have a json config file in the bucket where i need to read different config file for

Why Spark Submit causes NoSuchMethodError when I run a uber jar made though maven shade plugin?

I have a Apache Beam project which works fine if I directly run it. But if i try to create a jar using maven clean:package it creates a uber jar using maven sha

Apache Beam FileIO match - What's better/more efficient way to match files? [closed]

I'm just wondering - does the use of wildcard have an impact on how Beam matches files? For instance, if I want to match a file with Apache Be

Correct way to define an apache beam pipepline

I am new to Beam and struggling to find many good guides and resources to learn best practices. One thing I have noticed is there are two ways pipelines are de