The Google Cloud Search SDK includes Google-supplied configuration parameters for all connectors. Tuning these settings can streamline data indexing. This guide lists common indexing issues and the settings to resolve them.
Low indexing throughput for FullTraversalConnector
The following table lists settings to improve throughput for a
FullTraversalConnector:
| Setting | Description | Default | Suggested Change |
|---|---|---|---|
traverse.partitionSize |
The number of ApiOperation() items processed in batches. The SDK waits for a partition to complete before fetching more. |
50 | Increase to 1000 or more if you have sufficient memory. |
batch.batchSize |
The number of requests batched together. | 10 | Try lowering the batch size. |
batch.maxActiveBatches |
Allowable concurrent batches. | 20 | If you lower batchSize, increase this using: (partitionSize / batchSize) + 50. |
traverse.threadPoolSize |
Number of threads for parallel processing. | 50 | Increase this by multiples of 10. |
Consider using setRequestMode() to switch between ASYNCHRONOUS and
SYNCHRONOUS API request modes.
Low indexing throughput for ListTraversalConnector
A ListTraversalConnector uses one traverser by default. To increase
throughput, create multiple traversers for specific item statuses (e.g.,
NEW_ITEM, MODIFIED).
| Setting | Description | Default | Change |
|---|---|---|---|
repository.traversers | Creates individual
traversers with unique names (e.g., t1, t2). | One traverser | Add more traversers. |
traversers.t1.hostload | Number of threads to simultaneously index items. | 5 | Try values of 10 or greater. |
schedule.pollQueueIntervalSecs | Seconds to wait before re-polling an empty queue. | 10 | Try lowering to 1. |
traverser.t1.pollRequest.statuses | Statuses
to index (e.g., NEW_ITEM). | All | Use different traversers for different statuses. |
SDK timeouts or interrupts
If you experience timeouts when uploading large files, increase the timeout
using traverser.timeout=seconds
(default is 60 seconds). You can also increase API request timeouts:
| Parameter | Description | Default |
|---|---|---|
indexingService.connectTimeoutSeconds |
Connect timeout for API requests. | 120s |
indexingService.readTimeoutSeconds |
Read timeout for API requests. | 120s |