Google CloudSearch CSV Connector hits a top limit when indexing
We are using Google's CSV connector to index a CSV file with 600k+ records. In the Test datasource, the number of indexed records tops out at 8k. The same thing happens with the Prod data, but at a different upper bound of 130k. In both cases the connector keeps running, but no additional records are indexed. Is there a datasource limit or some other limiting factor? Below are some of the tuning parameters from our config file:
connector.runOnce=false
traverse.threadPoolSize=1000
traverse.partitionSize=4000
batch.batchSize=20
batch.maxQueueLength=8000
batch.maxActiveBatches=250
batch.maxBatchDelaySeconds=20
batch.readTimeoutSeconds=120
batch.connectTimeoutSeconds=300
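To make these settings easier to reason about, here is a minimal Python sketch that parses the properties above and derives a couple of rough capacity figures. It assumes that `batch.maxActiveBatches` bounds the number of concurrently executing batches and that `batch.maxQueueLength` bounds the number of queued requests; the helper names `parse_properties` and `derived_limits` are hypothetical and not part of the connector. It is a sanity check on the arithmetic, not a diagnosis.

```python
# Settings copied from the config file in the question.
CONFIG = """\
connector.runOnce=false
traverse.threadPoolSize=1000
traverse.partitionSize=4000
batch.batchSize=20
batch.maxQueueLength=8000
batch.maxActiveBatches=250
batch.maxBatchDelaySeconds=20
batch.readTimeoutSeconds=120
batch.connectTimeoutSeconds=300
"""


def parse_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def derived_limits(props):
    """Compute rough capacity figures (assumed interpretation of the keys)."""
    batch_size = int(props["batch.batchSize"])
    max_active = int(props["batch.maxActiveBatches"])
    max_queue = int(props["batch.maxQueueLength"])
    return {
        # Requests that could be in flight at once: 20 * 250 = 5000
        "max_in_flight_requests": batch_size * max_active,
        # Requests that could be waiting in the batch queue
        "max_queued_requests": max_queue,
    }


props = parse_properties(CONFIG)
limits = derived_limits(props)
for name, value in limits.items():
    print(f"{name}: {value}")
```

One thing this makes visible is that `batch.maxQueueLength=8000` matches the 8k plateau seen in the Test datasource. That may be a coincidence, but it could be worth changing that value and checking whether the plateau moves with it.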
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
