How to write records from one Parquet file to another Parquet file?

I have a large Parquet file with some data. Say it holds information about animals, with columns like:

 id, name, breed, traits

and I can query it in Spark in the standard way with SQL. For example:

spark.sql("SELECT * FROM animals WHERE id IN (10, 11)").collect()

and I get a result.

What I want to do is copy the matching records into a new Parquet file with the same structure. Is that even possible? I tried to find information on the web but didn't find anything useful, so Stack Overflow is, as always, my last hope :)

Maybe someone has hints, resources, or docs about this kind of operation on Parquet files?



Solution 1:[1]

You can store the result in a DataFrame and then save it as a Parquet file:

df = spark.sql("SELECT * FROM animals WHERE id IN (10, 11)")

df.write.parquet("filename.parquet")

For more on reading and writing Parquet files, see the "Parquet Files" section of the Spark SQL guide.
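
Putting the read, filter, and write steps together, here is a minimal sketch of the whole round trip. It is not part of the original answer: the function name `copy_rows_by_id`, the file paths, and the choice of `mode("overwrite")` are all assumptions made for illustration, and it filters with the DataFrame `isin` method instead of a SQL string (which should select the same rows as the `WHERE id IN (...)` query). It expects an already running SparkSession to be passed in as `spark`.

```python
def copy_rows_by_id(spark, src_path, dst_path, ids):
    """Copy the rows whose `id` is in `ids` from one Parquet dataset to another.

    `spark` is an existing SparkSession; the paths and the `id` column name
    are assumptions taken from the example in the question.
    """
    # Read the source Parquet data; the schema comes from the file itself.
    animals = spark.read.parquet(src_path)

    # Keep only the wanted rows. Same idea as: WHERE id IN (10, 11)
    selected = animals.where(animals["id"].isin(ids))

    # Write the subset as Parquet. The columns and types carry over unchanged.
    # "overwrite" replaces any existing output; without it, Spark raises an
    # error if dst_path already exists.
    selected.write.mode("overwrite").parquet(dst_path)

    return selected
```

With a session available, a call such as `copy_rows_by_id(spark, "animals.parquet", "animals_subset.parquet", [10, 11])` would write the subset, and `spark.read.parquet("animals_subset.parquet").printSchema()` can be used to check that the structure matches. Note that Spark writes the output path as a directory containing one or more part files rather than as a single file.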

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 philantrovert