How to push huge data in less time from a database table to Elasticsearch using Logstash
I'm loading 500,000 records from a Postgres database into Elasticsearch using Logstash, but the process takes 40 minutes to complete. I want to reduce that time. I have changed pipeline.batch.size: 1000 and pipeline.batch.delay: 50 in logstash.yml and increased the heap space from 1 GB to 2 GB in the jvm.options file, but the records are still processed in the same amount of time.
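For reference, the tuning described above lives in two files. A minimal sketch of those settings, using the values stated in the question (pipeline.workers is an extra Logstash option not mentioned above; it defaults to the number of CPU cores):

# logstash.yml -- settings mentioned in the question
pipeline.batch.size: 1000    # events each worker collects before running filters/outputs
pipeline.batch.delay: 50     # ms to wait before dispatching an undersized batch
# pipeline.workers: 8        # optional: parallel filter/output workers, defaults to CPU core count

# jvm.options -- heap increased from 1 GB to 2 GB
-Xms2g
-Xmx2g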
Logstash conf file:
input {
  jdbc {
    jdbc_driver_library => "C:\Users\Downloads\elk stack/postgresql-42.3.1.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
    jdbc_user => "postgres"
    jdbc_password => "postgres123"
    statement => "SELECT * FROM jolap.order_desk_activation"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200/"]
    index => "test-powerbi-transformed"
    document_type => "_doc"
  }
  stdout {}
}
Solution 1:[1]
The problem is not the Logstash pipeline or the batch size. As noted above, you need to get the data volume out of the database in less time. This can be achieved with "parallel hints", which speed up the query by letting it use multiple cores of the database infrastructure (don't forget to consult your DBA before applying this). Once you start getting the records in less time, you can scale Logstash or tweak the pipeline settings. Refer to this link.
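A note on the suggestion above: "parallel hints" are an Oracle-style concept, and stock PostgreSQL does not support optimizer hints natively (the pg_hint_plan extension adds them). In plain Postgres you can instead encourage parallel scans through session settings. A rough sketch for testing on the database side; the numeric values are placeholders to confirm with your DBA, not recommendations:

-- Allow more parallel workers for a single query
SET max_parallel_workers_per_gather = 4;

-- Lower the planner's parallelism costs so it is more willing to choose a parallel plan
-- (defaults are 1000 and 0.1; these lowered values are illustrative)
SET parallel_setup_cost = 100;
SET parallel_tuple_cost = 0.01;

-- Verify the plan actually uses a Parallel Seq Scan before relying on it
EXPLAIN SELECT * FROM jolap.order_desk_activation;

If the extract query runs noticeably faster in isolation, the bottleneck shifts back to Logstash, where the pipeline settings from the question become worth revisiting.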
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | kcsquared |
