Tune connector settings

The Google Cloud Search SDK provides Google-supplied configuration parameters that all connectors use. Tuning these settings can improve indexing performance. This guide lists common indexing issues and the settings that resolve them.

Low indexing throughput for FullTraversalConnector

The following table lists settings to improve throughput for a FullTraversalConnector:

Setting | Description | Default | Suggested change
traverse.partitionSize | The number of ApiOperation() items processed per partition, in batches. The SDK waits for the current partition to finish before fetching more items. | 50 | If you have enough memory, increase to 1000 or more.
batch.batchSize | The number of requests batched together. | 10 | Try lowering the batch size.
batch.maxActiveBatches | The number of batches allowed to run concurrently. | 20 | If you lower batchSize, increase this value to (partitionSize / batchSize) + 50.
traverse.threadPoolSize | The number of threads used for parallel processing. | 50 | Increase in increments of 10.
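As a worked example of the maxActiveBatches formula, the following Java sketch computes the suggested value and collects the four throughput settings into a java.util.Properties object. The class FullTraversalTuning is hypothetical (it is not part of the SDK), and the values chosen (partitionSize 1000, batchSize 5, threadPoolSize 60) are illustrative assumptions, not recommendations; in practice these keys go in the connector's configuration file.

```java
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical helper (not part of the SDK) that applies the FullTraversalConnector guidance. */
final class FullTraversalTuning {

  /** Suggested batch.maxActiveBatches: (partitionSize / batchSize) + 50, using integer division. */
  static int suggestedMaxActiveBatches(int partitionSize, int batchSize) {
    if (partitionSize <= 0 || batchSize <= 0) {
      throw new IllegalArgumentException("partitionSize and batchSize must be positive");
    }
    return (partitionSize / batchSize) + 50;
  }

  /** Collects the four throughput settings; the argument values are the caller's choice. */
  static Properties tunedSettings(int partitionSize, int batchSize, int threadPoolSize) {
    Properties props = new Properties();
    props.setProperty("traverse.partitionSize", Integer.toString(partitionSize));
    props.setProperty("batch.batchSize", Integer.toString(batchSize));
    props.setProperty("batch.maxActiveBatches",
        Integer.toString(suggestedMaxActiveBatches(partitionSize, batchSize)));
    props.setProperty("traverse.threadPoolSize", Integer.toString(threadPoolSize));
    return props;
  }

  public static void main(String[] args) {
    // Illustrative values: partitionSize raised to 1000, batchSize lowered from 10 to 5,
    // threadPoolSize raised by one increment of 10 (50 -> 60).
    Properties props = tunedSettings(1000, 5, 60);
    // Print the settings in sorted key order, one key=value pair per line.
    for (String key : new TreeSet<>(props.stringPropertyNames())) {
      System.out.println(key + "=" + props.getProperty(key));
    }
  }
}
```

Running main prints batch.batchSize=5, batch.maxActiveBatches=250, traverse.partitionSize=1000 and traverse.threadPoolSize=60, one per line; 250 is (1000 / 5) + 50.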

Consider using setRequestMode() to switch between ASYNCHRONOUS and SYNCHRONOUS API request modes.

Low indexing throughput for ListTraversalConnector

A ListTraversalConnector uses one traverser by default. To increase throughput, create multiple traversers for specific item statuses (e.g., NEW_ITEM, MODIFIED).

Setting | Description | Default | Suggested change
repository.traversers | Creates individual traversers, each with a unique name (e.g., t1, t2). | One traverser | Add more traversers.
traverser.t1.hostload | The number of threads traverser t1 uses to index items simultaneously. | 5 | Try 10 or greater.
schedule.pollQueueIntervalSecs | Seconds to wait before polling an empty queue again. | 10 | Try lowering to 1.
traverser.t1.pollRequest.statuses | The item statuses that traverser t1 indexes (e.g., NEW_ITEM). | All statuses | Assign different statuses to different traversers.
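The following Java sketch builds a two-traverser configuration as java.util.Properties: t1 polls NEW_ITEM items, t2 polls MODIFIED items, each uses a hostload of 10, and the empty-queue poll interval is lowered to 1 second. The helper class ListTraversalTuning, the traverser names, and the status split are illustrative assumptions. Writing repository.traversers and pollRequest.statuses as comma-separated lists is also an assumption to check against the SDK's configuration reference.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical helper (not part of the SDK) that emits per-traverser settings. */
final class ListTraversalTuning {

  /**
   * Builds one set of traverser.NAME.* settings per map entry.
   * Keys are traverser names (for example "t1"); values are the item statuses it polls.
   */
  static Properties traverserSettings(
      Map<String, List<String>> statusesByTraverser, int hostload, int pollQueueIntervalSecs) {
    Properties props = new Properties();
    // Assumed format: comma-separated list of unique traverser names.
    props.setProperty("repository.traversers", String.join(",", statusesByTraverser.keySet()));
    for (Map.Entry<String, List<String>> entry : statusesByTraverser.entrySet()) {
      String prefix = "traverser." + entry.getKey() + ".";
      props.setProperty(prefix + "hostload", Integer.toString(hostload));
      props.setProperty(prefix + "pollRequest.statuses", String.join(",", entry.getValue()));
    }
    props.setProperty("schedule.pollQueueIntervalSecs", Integer.toString(pollQueueIntervalSecs));
    return props;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps traversers in insertion order, so the list reads "t1,t2".
    Map<String, List<String>> statuses = new LinkedHashMap<>();
    statuses.put("t1", List.of("NEW_ITEM"));
    statuses.put("t2", List.of("MODIFIED"));
    Properties props = traverserSettings(statuses, 10, 1);
    // Print the settings in sorted key order, one key=value pair per line.
    for (String key : new TreeSet<>(props.stringPropertyNames())) {
      System.out.println(key + "=" + props.getProperty(key));
    }
  }
}
```

Splitting statuses this way lets new items and modified items be polled by separate thread pools instead of competing in a single traverser.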

SDK timeouts or interrupts

If you see timeouts when uploading large files, increase the traverser timeout by setting traverser.timeout=s, where s is the number of seconds (the default is 60). You can also increase the timeouts for API requests with the following settings:

Setting | Description | Default
indexingService.connectTimeoutSeconds | Connect timeout for API requests. | 120 seconds
indexingService.readTimeoutSeconds | Read timeout for API requests. | 120 seconds
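A minimal Java sketch of raising all three timeouts together, expressed as java.util.Properties. The helper class TimeoutTuning is hypothetical, and the values used in main (300 seconds for traverser.timeout, 180 seconds for each API timeout) are arbitrary examples rather than recommendations. The sketch only checks that each value is a positive whole number of seconds before recording it.

```java
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical helper (not part of the SDK) that sets traverser and API timeouts in seconds. */
final class TimeoutTuning {

  static Properties timeoutSettings(
      int traverserTimeoutSecs, int connectTimeoutSecs, int readTimeoutSecs) {
    Properties props = new Properties();
    put(props, "traverser.timeout", traverserTimeoutSecs);                    // default: 60
    put(props, "indexingService.connectTimeoutSeconds", connectTimeoutSecs);  // default: 120
    put(props, "indexingService.readTimeoutSeconds", readTimeoutSecs);        // default: 120
    return props;
  }

  // Rejects zero or negative values, which are not meaningful timeouts.
  private static void put(Properties props, String key, int seconds) {
    if (seconds <= 0) {
      throw new IllegalArgumentException(key + " must be a positive number of seconds");
    }
    props.setProperty(key, Integer.toString(seconds));
  }

  public static void main(String[] args) {
    // Illustrative values for a connector that uploads large files.
    Properties props = timeoutSettings(300, 180, 180);
    // Print the settings in sorted key order, one key=value pair per line.
    for (String key : new TreeSet<>(props.stringPropertyNames())) {
      System.out.println(key + "=" + props.getProperty(key));
    }
  }
}
```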