The Google Cloud Search Indexing Queue

The Connector SDK and Cloud Search REST API allow the creation of Cloud Search Indexing Queues used to perform the following tasks:

  • Maintain the per-document state (status, hash values, and so on) which can be used to keep your index in sync with your repository.

  • Maintain a list of items to be indexed as discovered during the traversal process.

  • Prioritize items in queues based on item status.

  • Maintain additional state information for efficient integration, such as checkpoints, change tokens, and so on.

Status & priority

A document’s priority in a queue is based on its ItemStatus code. The possible ItemStatus codes, in order of priority (handled first to handled last), are:

  • ERROR - Item encountered an asynchronous error during the indexing process and needs to be re-indexed.

  • MODIFIED - Item that was previously indexed and has been modified in the repository since the last indexing.

  • NEW_ITEM - Item that is not indexed.

  • ACCEPTED - Document that was previously indexed and has not changed in the repository since the last indexing.

When two items in a queue have the same status, higher priority is given to the items that have been in the queue for the longest period of time.
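
The ordering rule above can be sketched as a two-part sort key. This is only an illustration of the described behavior, not the service's implementation: the QueueEntry class, its enqueued_at field, and the sample IDs are hypothetical.

```python
from dataclasses import dataclass

# Lower rank is handled first, mirroring the order of the list above.
STATUS_RANK = {"ERROR": 0, "MODIFIED": 1, "NEW_ITEM": 2, "ACCEPTED": 3}


@dataclass
class QueueEntry:
    item_id: str
    status: str
    enqueued_at: float  # hypothetical: time the entry joined the queue


def handling_order(entries):
    """Sort by status rank, then oldest entry first within a status."""
    return sorted(entries, key=lambda e: (STATUS_RANK[e.status], e.enqueued_at))


entries = [
    QueueEntry("doc-a", "ACCEPTED", 100.0),
    QueueEntry("doc-b", "NEW_ITEM", 300.0),
    QueueEntry("doc-c", "ERROR", 400.0),
    QueueEntry("doc-d", "NEW_ITEM", 200.0),
    QueueEntry("doc-e", "MODIFIED", 500.0),
]
order = [e.item_id for e in handling_order(entries)]
print(order)
```

Here doc-d sorts ahead of doc-b because both are NEW_ITEM and doc-d has been in the queue longer.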

Queue operations (Connector SDK)

The Content Connector SDK provides operations for pushing items to, and pulling items from, a queue.

To package and push an item to a queue, use the pushItems builder class.

You do not need to do anything specific to pull items from a queue for processing. Instead, the SDK automatically pulls items from the queue, in priority order, using the Repository class's getDoc method.

Queue operations (REST API)

The REST API provides the following two methods for pushing items to and pulling items from a queue:

  • Items.push, to push items into the queue.

  • Items.poll, to poll items from the queue.

You can also use Items.index to push items to the queue during indexing. Items pushed to the queue during indexing don’t require a type and are automatically assigned a status of ACCEPTED.

Items.push

The Items.push method adds IDs to the queue. This method can be called with a specific type value, which determines the result of the push operation. For a list of type values, refer to the item.type field in the Items.push method.

Pushing a new ID results in adding a new entry with a NEW_ITEM ItemStatus code.

The optional payload is always stored, treated as an opaque value, and returned from Items.poll.

When an item is polled, it is reserved, meaning it cannot be returned by another call to Items.poll. Using Items.push with a type of NOT_MODIFIED, REPOSITORY_ERROR, or REQUEUE unreserves polled entries. For further information about reserved and unreserved entries, refer to the Items.poll section.
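
As a rough sketch, a push request body could be assembled as follows. The helper build_push_body is hypothetical, and the field names (item.name, item.type, item.payload, item.queue) and base64 encoding of the payload are assumptions based on the Items.push reference that should be checked there. A real request may need further fields, and no network call is made here.

```python
import base64

# Type values named in this document; the reference lists the full set.
PUSH_TYPES = {"MODIFIED", "NOT_MODIFIED", "REPOSITORY_ERROR", "REQUEUE"}


def build_push_body(item_name, push_type=None, payload=None, queue=None):
    """Build a JSON-serializable body for a push request (assumed shape)."""
    item = {"name": item_name}
    if push_type is not None:
        if push_type not in PUSH_TYPES:
            raise ValueError(f"unknown push type: {push_type}")
        item["type"] = push_type
    if payload is not None:
        # The payload is opaque to the queue; bytes are sent base64-encoded
        # (assumption about the JSON representation).
        item["payload"] = base64.b64encode(payload).decode("ascii")
    if queue is not None:
        item["queue"] = queue
    return {"item": item}


# Hypothetical item name and payload: unreserve a polled entry that
# turned out to be unchanged in the repository.
body = build_push_body(
    "datasources/SOURCE_ID/items/doc-1",
    push_type="NOT_MODIFIED",
    payload=b"cursor=42",
)
print(body)
```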

Items.push with hashes

The Cloud Search REST API supports specifying metadata and content hash values on Items.index requests. Instead of specifying a type, you can specify the metadata and/or content hash values with a push request. The Cloud Search Indexing Queue compares the provided hash values with the stored values available with the item in the data source. If the values do not match, the entry is marked as MODIFIED. If a corresponding item doesn't exist in the index, the status is NEW_ITEM.
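
The comparison described above can be modeled locally as follows. This is a minimal sketch, not the queue's actual logic: the function names are hypothetical, SHA-256 is an arbitrary choice of hash (the text does not prescribe one), and returning None for matching hashes is an assumption, since the text does not say what happens in that case.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash repository content; SHA-256 is an assumption, any stable hash works."""
    return hashlib.sha256(data).hexdigest()


def status_after_hash_push(stored_hashes, pushed_hashes):
    """Return the status a hash-based push would produce, per the text above.

    stored_hashes is None when no corresponding item exists in the index.
    """
    if stored_hashes is None:
        return "NEW_ITEM"
    for field, value in pushed_hashes.items():
        # A field missing from the stored values also counts as a mismatch here.
        if stored_hashes.get(field) != value:
            return "MODIFIED"
    return None  # assumption: matching hashes leave the status unchanged


stored = {"contentHash": content_hash(b"version 1")}
changed = status_after_hash_push(stored, {"contentHash": content_hash(b"version 2")})
missing = status_after_hash_push(None, {"contentHash": content_hash(b"version 1")})
same = status_after_hash_push(stored, {"contentHash": content_hash(b"version 1")})
print(changed, missing, same)
```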

Items.poll

The Items.poll method retrieves the highest priority entries from the queue. The requested and returned status values indicate the status(es) of the priority queue(s) requested or the status of the returned IDs.

By default, entries from any section of the queue may be returned, based on priority. Each returned entry is reserved, and is not returned by other calls to Items.poll until one of the following cases is met:

  • The reservation times out.
  • The entry is enqueued again by Items.index.
  • Items.push is called with a type value of NOT_MODIFIED, REPOSITORY_ERROR, or REQUEUE.
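
The reservation behavior can be illustrated with a small in-memory model. ToyQueue is hypothetical and covers only the timeout and Items.push cases listed above. The reservation length is an arbitrary parameter, and time is passed in explicitly so the example is deterministic.

```python
UNRESERVING_PUSH_TYPES = {"NOT_MODIFIED", "REPOSITORY_ERROR", "REQUEUE"}


class ToyQueue:
    """In-memory stand-in for the reservation rules; not the real service."""

    def __init__(self, reservation_seconds):
        self.reservation_seconds = reservation_seconds
        self.entries = []          # item IDs, already in priority order
        self.reserved_until = {}   # item ID -> time the reservation expires

    def add(self, item_id):
        self.entries.append(item_id)

    def poll(self, limit, now):
        """Return up to `limit` unreserved IDs and reserve each of them."""
        polled = []
        for item_id in self.entries:
            if len(polled) == limit:
                break
            if self.reserved_until.get(item_id, 0.0) > now:
                continue  # still reserved by an earlier poll
            self.reserved_until[item_id] = now + self.reservation_seconds
            polled.append(item_id)
        return polled

    def push(self, item_id, push_type):
        """Pushing with one of the listed types clears the reservation."""
        if push_type in UNRESERVING_PUSH_TYPES:
            self.reserved_until.pop(item_id, None)


queue = ToyQueue(reservation_seconds=60.0)
queue.add("doc-a")
queue.add("doc-b")

first = queue.poll(limit=10, now=0.0)     # both entries become reserved
second = queue.poll(limit=10, now=1.0)    # nothing is available
queue.push("doc-a", "REQUEUE")            # unreserves doc-a only
third = queue.poll(limit=10, now=2.0)
fourth = queue.poll(limit=10, now=100.0)  # all reservations have timed out
print(first, second, third, fourth)
```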
