Hive to Elasticsearch bulk insertion and deletion performance optimization

We have been trying to ingest data from Hive into Elasticsearch. Bulk insertions and deletions happen every week, and the data volume is high in both cases: each week, one week's worth of data (about 60 million documents) is inserted, and data older than 12 weeks (also about 60 million documents) is deleted. Currently the insertion process loads about 5 million documents per hour. Can this be optimized further?
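For reference, the weekly purge described above could be expressed as an asynchronous delete-by-query; this is only a sketch, where the index name `weekly_data` and the `timestamp` date field are placeholder names, not the actual schema:

```shell
# Sketch of the weekly purge as a background delete-by-query.
# "weekly_data" and "timestamp" are placeholder names (assumptions).
# wait_for_completion=false returns a task ID immediately, so the
# large delete runs in the background instead of blocking the client.
curl -X POST "localhost:9200/weekly_data/_delete_by_query?wait_for_completion=false" \
  -H 'Content-Type: application/json' \
  -d '{ "query": { "range": { "timestamp": { "lt": "now-12w" } } } }'
```

The `now-12w` date-math expression matches everything older than 12 weeks at the time the request is issued.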

Since the same index is reused every cycle, the number of replicas cannot be set to 0 during insertion. I am currently using the ES-Hadoop connector (elasticsearch-hadoop-8.0.1.jar).
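Because replicas have to stay enabled, the main index-level knob left for the bulk load appears to be the refresh interval: disabling refresh before the job and restoring it afterwards is a commonly suggested bulk-indexing pattern. A sketch of that toggle, again with `weekly_data` as a placeholder index name:

```shell
# Disable refresh while the ES-Hadoop job runs
# ("weekly_data" is a placeholder index name, an assumption).
curl -X PUT "localhost:9200/weekly_data/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "refresh_interval": "-1" } }'

# ... run the weekly ES-Hadoop insertion job here ...

# Restore the default refresh interval after the bulk load finishes,
# so new documents become searchable again.
curl -X PUT "localhost:9200/weekly_data/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "refresh_interval": "1s" } }'
```

On the connector side, the ES-Hadoop batch settings (`es.batch.size.entries` and `es.batch.size.bytes`) control how many documents each bulk request carries and may also be worth tuning.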



Source: Stack Overflow, licensed under CC BY-SA 3.0.
