Aggregate data based on 3 columns and incrementally update a Kudu table using Spark SQL

I receive CSV files every X minutes. I want to aggregate this data by 4 columns using Spark and then store the result in Kudu. After the first load, I want to merge each newly received batch with the data already stored in Kudu. I'm thinking of using MERGE with WHEN MATCHED, but that will try to match every record and either update or insert it. I'm worried about performance because I receive billions of records daily.
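
To make the intended flow concrete, here is a minimal sketch in plain Python of the two steps described above: aggregate each incoming batch by its key columns first, then merge only the resulting distinct keys into the stored totals. It is not Spark or Kudu code. The dictionary `store` stands in for the Kudu table, `aggregate_batch` stands in for a Spark group-by with a sum, and the column names (`region`, `device`, `hour`, `bytes`) are hypothetical. It also assumes the aggregate is additive (a sum), which the question does not state.

```python
from collections import defaultdict

# Hypothetical key columns; the real table may use three or four.
KEY_COLS = ("region", "device", "hour")
VALUE_COL = "bytes"  # hypothetical measure, assumed to be summed


def aggregate_batch(rows, key_cols, value_col):
    """Sum value_col per distinct key (stand-in for a Spark group-by + sum)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        totals[key] += row[value_col]
    return dict(totals)


def merge_into_store(store, batch_totals):
    """Add batch totals to the stored totals, touching only keys in the batch."""
    for key, total in batch_totals.items():
        store[key] = store.get(key, 0) + total
    return store


# Two small batches standing in for CSV files that arrive X minutes apart.
batch_1 = [
    {"region": "eu", "device": "a", "hour": 1, "bytes": 10},
    {"region": "eu", "device": "a", "hour": 1, "bytes": 5},
    {"region": "us", "device": "b", "hour": 1, "bytes": 7},
]
batch_2 = [
    {"region": "eu", "device": "a", "hour": 1, "bytes": 3},
    {"region": "us", "device": "c", "hour": 2, "bytes": 4},
]

store = {}  # stand-in for the Kudu table, keyed by the aggregation columns
for batch in (batch_1, batch_2):
    merge_into_store(store, aggregate_batch(batch, KEY_COLS, VALUE_COL))

print(store)
```

The point of the sketch is that the merge step sees one row per distinct key in the batch rather than one row per raw record, so its cost depends on how many keys a batch touches, not on the raw record count. Whether that is small enough for billions of daily records depends on the cardinality of the key columns, which the question does not give.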

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
