Difference between using a list or a PCollection

I'm building a pipeline in Apache Beam and got curious about this: what's the difference between applying a PTransform to a list and applying it to a PCollection? Is performance affected, or is the only difference that a PCollection is immutable? And is applying transforms to lists a bad way to approach a pipeline in Apache Beam?



Solution 1:[1]

A PCollection is an immutable collection that, unlike a list, can be unbounded. (It can also be bounded, for example when it is read from a file or created from in-memory data.)

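The main difference from a list is this potentially unbounded nature. A list has to fit entirely in the memory of a single process, whereas a PCollection's elements are processed by the runner, which can distribute them across workers. That makes a PCollection especially powerful when you process large volumes of data (such as a large file) or stream from an unbounded source like Pub/Sub.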

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 guillaume blaquiere