How to convert a Beam dataframe to a pandas dataframe?

According to the Beam DataFrame overview, it looks like I can convert a Beam PCollection to a DataFrame using:

from apache_beam.dataframe.convert import to_dataframe 
df = to_dataframe(pcollections)

However, df is still a Beam DataFrame, not a pandas DataFrame. Is it possible to convert it to a pure pandas DataFrame?

(The DataFrame in my problem is small enough to fit into the memory of a single machine. Also, Beam DataFrames are missing some critical features, so I still need the pure pandas functionality.)



Solution 1:[1]

If you're willing to use Interactive Beam in a notebook, you can do this by calling ib.collect(df), where ib is the apache_beam.runners.interactive.interactive_beam module. This will execute the Beam job and retrieve the results as a pandas DataFrame.

I'd also love to know what critical features are missing in Beam DataFrames that you'd like to use, so we can prioritize them. We discuss some of the differences between Beam DataFrame and pandas here, but note that even the operations marked "wont implement" might get implemented at some point in the future; we'll just need to do some more substantial design work for them. Thanks for the question!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1