The Connector SDK and Cloud Search REST API allow the creation of Cloud Search Indexing Queues used to perform the following tasks:
- Maintain per-document state (status, hash values, and so on), which can be used to keep your index in sync with your repository.
- Maintain a list of items to be indexed as discovered during the traversal process.
- Prioritize items in queues based on item status.
- Maintain additional state information for efficient integration, such as checkpoints, change tokens, and so on.
Status & priority
A document’s priority in a queue is based on its ItemStatus code. The possible ItemStatus codes, in order of priority (handled first to handled last), are:
- ERROR: Item that encountered an asynchronous error during the indexing process and needs to be re-indexed.
- MODIFIED: Item that was previously indexed and has been modified in the repository since the last indexing.
- NEW_ITEM: Item that is not indexed.
- ACCEPTED: Document that was previously indexed and has not changed in the repository since the last indexing.
When two items in a queue have the same status, higher priority is given to the items that have been in the queue for the longest period of time.
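The ordering rule above can be sketched as a two-part sort key: status rank first, then time spent in the queue. This is only an illustrative local model, not how the service is implemented; the QueueEntry class, the enqueued_at counter, and the handling_order function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Lower rank is handled first, mirroring the ItemStatus order listed above.
STATUS_RANK = {"ERROR": 0, "MODIFIED": 1, "NEW_ITEM": 2, "ACCEPTED": 3}


@dataclass
class QueueEntry:
    item_id: str
    status: str
    enqueued_at: int  # hypothetical arrival counter; smaller means older


def handling_order(entries):
    """Sort by status rank, then oldest entry first within the same status."""
    return sorted(entries, key=lambda e: (STATUS_RANK[e.status], e.enqueued_at))


entries = [
    QueueEntry("doc-a", "ACCEPTED", 1),
    QueueEntry("doc-b", "NEW_ITEM", 2),
    QueueEntry("doc-c", "ERROR", 4),
    QueueEntry("doc-d", "ERROR", 3),
    QueueEntry("doc-e", "MODIFIED", 5),
]
ordered_ids = [e.item_id for e in handling_order(entries)]
print(ordered_ids)
```

Both ERROR entries come first, with doc-d ahead of doc-c because it has been in the queue longer.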
Queue operations (Connector SDK)
The Content Connector SDK provides operations for pushing items to, and pulling items from, a queue.
To package and push an item to a queue, use the pushItems builder class.
You do not need to do anything specific to pull items from a queue for processing. Instead, the SDK automatically pulls items from the queue, in priority order, using the Repository class's getDoc method.
Queue operations (REST API)
The REST API provides the following two methods for pushing items to and pulling items from a queue:
- To push an item to a queue, use Items.push.
- To poll items in the queue, use Items.poll.
You can also use Items.index to push items to the queue during indexing. Items pushed to the queue during indexing don’t require a type and are automatically assigned a status of ACCEPTED.
Items.push
The Items.push method adds IDs to the queue. This method can be called with a specific type value, which determines the result of the push operation. For a list of type values, refer to the item.type field in the Items.push method.
Pushing a new ID adds a new entry with a NEW_ITEM ItemStatus code.
The optional payload is always stored, treated as an opaque value, and returned from Items.poll.
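As a rough illustration of pushing an ID with an opaque payload, the sketch below only builds a JSON request body locally and sends nothing. The field names (item, name, type, payload, queue), the base64 encoding of the payload, and the datasources/DATASOURCE_ID/items/doc-1 resource name are assumptions made for the example; check them against the Items.push reference before relying on them. The build_push_body helper is hypothetical.

```python
import base64
import json


def build_push_body(item_name, payload=None, push_type=None, queue=None):
    """Assemble a push request body as a plain dict (assumed field names)."""
    item = {"name": item_name}
    if push_type is not None:
        item["type"] = push_type
    if queue is not None:
        item["queue"] = queue
    if payload is not None:
        # Assumption: a bytes value travels as a base64 string in JSON.
        item["payload"] = base64.b64encode(payload).decode("ascii")
    return {"item": item}


body = build_push_body(
    "datasources/DATASOURCE_ID/items/doc-1",  # placeholder resource name
    payload=b'{"cursor": 17}',                # opaque to the queue
    push_type="REQUEUE",
)
print(json.dumps(body, indent=2))

# The connector decodes whatever comes back from a later poll unchanged.
roundtrip = base64.b64decode(body["item"]["payload"])
```

Because the queue treats the payload as opaque, the connector is free to store any small piece of state in it, such as a cursor or a repository-side revision marker.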
When an item is polled, it is reserved, meaning it cannot be returned by another call to Items.poll.
Using Items.push with type set to NOT_MODIFIED, REPOSITORY_ERROR, or REQUEUE unreserves polled entries. For further information about reserved and unreserved entries, refer to the Items.poll section.
Items.push with hashes
The Cloud Search REST API supports specifying metadata and content hash values on Items.index requests. Instead of specifying type, the metadata and/or content hash values can be specified with a push request. The Cloud Search Indexing Queue compares the provided hash values with the stored values available with the item in the data source. If the values don't match, that entry is marked as MODIFIED. If a corresponding item doesn't exist in the index, then the status is NEW_ITEM.
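The comparison described above can be sketched as a small local function. This is a toy model rather than the service's logic: the status_from_hashes function and the dictionary keys are hypothetical, SHA-256 is just one possible hash choice, and returning ACCEPTED when every hash matches is an assumption inferred from the ACCEPTED description (previously indexed and unchanged), since the text only states the mismatch and missing-item cases.

```python
import hashlib


def sha256_hex(data):
    """Hash raw bytes; any stable hash would serve for this sketch."""
    return hashlib.sha256(data).hexdigest()


def status_from_hashes(stored, pushed):
    """Derive a status from stored hashes (None if the item is unknown)."""
    if stored is None:
        return "NEW_ITEM"
    for key, value in pushed.items():
        if stored.get(key) != value:
            return "MODIFIED"
    return "ACCEPTED"  # assumption: all provided hashes matched


stored = {"contentHash": sha256_hex(b"v1 body"), "metadataHash": sha256_hex(b"title=A")}

unchanged = status_from_hashes(stored, dict(stored))
changed = status_from_hashes(stored, {"contentHash": sha256_hex(b"v2 body")})
unknown = status_from_hashes(None, {"contentHash": sha256_hex(b"v1 body")})
print(unchanged, changed, unknown)
```

Pushing hashes this way lets a connector skip re-reading unchanged documents, because the queue decides whether an entry needs re-indexing.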
Items.poll
The Items.poll method retrieves the highest priority entries from the queue. The requested and returned status values indicate the status(es) of the priority queue(s) requested or the status of the returned IDs.
By default, entries from any section of the queue may be returned, based on priority. Each returned entry is reserved, and is not returned by other calls to Items.poll until one of the following cases is met:
- The reservation times out.
- The entry is enqueued again by Items.index.
- Items.push is called with a type value of NOT_MODIFIED, REPOSITORY_ERROR, or REQUEUE.
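The reservation rules above can be modeled with a small in-memory queue. This is a toy sketch under stated assumptions, not the service's behavior in detail: the ToyQueue class, its method names, the 60-second reservation window, and the explicit now argument (used instead of a real clock to keep the example deterministic) are all hypothetical, and insertion order stands in for priority.

```python
class ToyQueue:
    # Push types that release a reservation, as listed above.
    RELEASING_TYPES = {"NOT_MODIFIED", "REPOSITORY_ERROR", "REQUEUE"}

    def __init__(self, reservation_seconds):
        self.reservation_seconds = reservation_seconds
        self.entries = []         # insertion order stands in for priority
        self.reserved_until = {}  # item_id -> time the reservation expires

    def add(self, item_id):
        if item_id not in self.entries:
            self.entries.append(item_id)

    def poll(self, now, limit=1):
        """Return up to `limit` unreserved entries and reserve them."""
        result = []
        for item_id in self.entries:
            if len(result) == limit:
                break
            expiry = self.reserved_until.get(item_id)
            if expiry is not None and expiry > now:
                continue  # still reserved by an earlier poll
            self.reserved_until[item_id] = now + self.reservation_seconds
            result.append(item_id)
        return result

    def push(self, item_id, push_type):
        if push_type in self.RELEASING_TYPES:
            self.reserved_until.pop(item_id, None)

    def index(self, item_id):
        """Enqueueing again through indexing also clears the reservation."""
        self.add(item_id)
        self.reserved_until.pop(item_id, None)


q = ToyQueue(reservation_seconds=60)
q.add("doc-1")
q.add("doc-2")
first = q.poll(now=0)     # reserves doc-1 until t=60
second = q.poll(now=1)    # doc-1 is reserved, so doc-2 is returned
third = q.poll(now=2)     # everything is reserved
q.push("doc-1", "REQUEUE")
fourth = q.poll(now=3)    # doc-1 was released by the push
fifth = q.poll(now=61)    # doc-2's reservation has timed out
print(first, second, third, fourth, fifth)
```

Passing the time explicitly keeps the timeout case easy to exercise; a real connector would instead rely on the service's own reservation handling.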