Create a content connector

A content connector is a software program that traverses data in an enterprise repository and populates a data source. Google provides the following options for developing content connectors:

The Content Connector SDK. This is a good option for Java programmers. The SDK is a wrapper around the REST API that lets you quickly create connectors. To create a content connector using the SDK, see Create a content connector using the Content Connector SDK.
A low-level REST API or API libraries. Use these options if you don't use Java or if your codebase better accommodates a REST API or a library. To create a content connector using the REST API, see Create a content connector using the REST API.

A typical content connector performs the following tasks:

Reads and processes configuration parameters.
Pulls discrete chunks of indexable data, called "items," from the third-party repository.
Combines ACLs, metadata, and content data into indexable items.
Indexes items to the Cloud Search data source.
(Optional) Listens for change notifications from the repository. The connector converts change notifications into indexing requests to keep the Cloud Search data source in sync. The connector performs this task only if the repository supports change detection.

Create a content connector using the Content Connector SDK

The following sections explain how to create a content connector using the Content Connector SDK.

Set up dependencies

Include these dependencies in your build file.

Maven

<dependency>
  <groupId>com.google.enterprise.cloudsearch</groupId>
  <artifactId>google-cloudsearch-indexing-connector-sdk</artifactId>
  <version>v1-0.0.3</version>
</dependency>

Gradle

compile group: 'com.google.enterprise.cloudsearch', name: 'google-cloudsearch-indexing-connector-sdk', version: 'v1-0.0.3'

Create your connector configuration

Every connector uses a configuration file for parameters like your repository ID. Define parameters as key-value pairs, such as api.sourceId=1234567890abcdef.

The Google Cloud Search SDK includes Google-supplied parameters for all connectors. You must declare the following in your configuration file:

Content connector: Declare api.sourceId and api.serviceAccountPrivateKeyFile. These identify your repository and the private key needed for access.

Identity connector: Declare api.identitySourceId to identify your external identity source. For user syncing, also declare api.customerId (the unique ID for your Google Workspace account).

Declare other Google-supplied parameters only to override their default values. For details on generating IDs and keys, see Google-supplied parameters.

You can also define repository-specific parameters in your configuration file.
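As a minimal sketch of the key-value format described above, the following Java class parses configuration text and reports which required content connector keys are missing. It assumes the file uses the standard Java properties format. The class name ConfigCheck, the key file name PrivateKey.json, and the repository-specific key myrepo.rootPath are hypothetical and are not part of the SDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustration only: ConfigCheck is a hypothetical helper, not an SDK class.
class ConfigCheck {
    // Keys that a content connector must declare, per the text above.
    static final List<String> REQUIRED_KEYS =
        List.of("api.sourceId", "api.serviceAccountPrivateKeyFile");

    /** Parses key-value text written in the java.util.Properties format. */
    public static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    /** Returns the required keys that are absent or blank. */
    public static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String sample = String.join("\n",
            "api.sourceId=1234567890abcdef",
            "api.serviceAccountPrivateKeyFile=PrivateKey.json", // hypothetical file name
            "# Hypothetical repository-specific parameter",
            "myrepo.rootPath=/data");
        Properties props = parse(sample);
        System.out.println("Missing required keys: " + missingKeys(props));
    }
}
```

Checking for missing keys before starting a traversal lets a connector fail early with a clear message instead of failing partway through indexing.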

Pass the configuration file to the connector

To pass the configuration file to the connector, set the config system property with the -D argument when you start the connector. For example:

java -classpath myconnector.jar -Dconfig=MyConfig.properties MyConnector

If you omit this argument, the SDK attempts to use a file named connector-config.properties in the local directory.
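The lookup described above can be illustrated with a short Java sketch that reads the config system property and falls back to the default file name. This only mirrors the documented behavior. It is not the SDK's own code, and the class name ConfigLocator is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: ConfigLocator is a hypothetical helper, not an SDK class.
class ConfigLocator {
    static final String CONFIG_PROPERTY = "config";
    static final String DEFAULT_FILE = "connector-config.properties";

    /** Resolves the configuration file from -Dconfig=..., or the default name. */
    public static Path resolveConfigPath() {
        String name = System.getProperty(CONFIG_PROPERTY, DEFAULT_FILE);
        // Relative names resolve against the current working directory.
        return Paths.get(name).toAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("Using configuration file: " + resolveConfigPath());
    }
}
```

Running this class with -Dconfig=MyConfig.properties prints a path ending in MyConfig.properties. Without the argument, the path ends in connector-config.properties.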

Determine your traversal strategy

The primary function of a content connector is to traverse a repository and index its data. You must implement a strategy based on your repository's size and layout. You can design your own or choose a strategy from the SDK:

Full traversal strategy: Scans the entire repository and indexes every item. Use it for small repositories with mostly static, non-hierarchical data, where you can afford the overhead of a full traversal on every indexing run, or when change detection is difficult.
List traversal strategy: Scans the entire repository to determine the status of each item, then indexes only new or updated items. Use this for incremental updates to a large, non-hierarchical index when change detection isn't supported.
Graph traversal strategy: Scans a parent node to determine the status of its items, then indexes new or updated items in that node. It then recursively processes child nodes. Use this for hierarchical repositories where listing all IDs isn't practical, such as directory structures or websites.
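To make the difference between the three strategies concrete, the following Java sketch selects the items to index under each one, using in-memory maps in place of a real repository and index. It is a simplified illustration under stated assumptions, not the SDK's template implementation: item status is modeled as a content hash per item ID, and the class TraversalSketch and all of its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustration only: a hypothetical in-memory model, not SDK code.
class TraversalSketch {
    /** Full traversal: every item in the repository is selected. */
    public static List<String> fullTraversal(Map<String, String> repoHashes) {
        return new ArrayList<>(repoHashes.keySet());
    }

    /** List traversal: scan all items, select only new or changed ones. */
    public static List<String> listTraversal(
            Map<String, String> repoHashes, Map<String, String> indexedHashes) {
        List<String> toIndex = new ArrayList<>();
        for (Map.Entry<String, String> entry : repoHashes.entrySet()) {
            // A missing or different stored hash means the item is new or updated.
            if (!entry.getValue().equals(indexedHashes.get(entry.getKey()))) {
                toIndex.add(entry.getKey());
            }
        }
        return toIndex;
    }

    /** Graph traversal: start at a root, check each node, then visit its children. */
    public static List<String> graphTraversal(
            String root,
            Map<String, List<String>> children,
            Map<String, String> repoHashes,
            Map<String, String> indexedHashes) {
        List<String> toIndex = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String id = queue.remove();
            if (!seen.add(id)) {
                continue; // already visited; guards against cycles
            }
            String hash = repoHashes.get(id);
            if (hash != null && !hash.equals(indexedHashes.get(id))) {
                toIndex.add(id);
            }
            queue.addAll(children.getOrDefault(id, List.of()));
        }
        return toIndex;
    }

    public static void main(String[] args) {
        // TreeMap keeps iteration order deterministic for the demo.
        Map<String, String> repo = new TreeMap<>(Map.of("a", "1", "b", "2", "c", "3"));
        Map<String, String> indexed = Map.of("a", "1", "b", "old");
        Map<String, List<String>> children = Map.of("a", List.of("b", "c"));
        System.out.println("Full:  " + fullTraversal(repo));
        System.out.println("List:  " + listTraversal(repo, indexed));
        System.out.println("Graph: " + graphTraversal("a", children, repo, indexed));
    }
}
```

The list and graph variants apply the same "new or changed" test. They differ only in how item IDs are discovered: from a complete listing, or by following parent-to-child links from a root.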

The SDK implements these strategies in template connector classes. These templates can speed up your development. To use a template, see the corresponding section:

Create a full traversal connector using a template class
Create a list traversal connector using a template class
Create a graph traversal connector using a template class

Create a full traversal connector using a template class

This section refers to code from the FullTraversalSample.

Implement the connector entry point

The entry point is the main() method. It creates an Application instance and calls start() to run the connector.

Before calling application.start(), use the IndexingApplication.Builder class to instantiate the FullTraversalConnector template. This template accepts a Repository object.
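The wiring described above can be sketched as follows. To keep the example self-contained, it defines small local stand-ins named after the SDK classes. These stand-ins, including the listItemIds() and indexedIds() methods, are assumptions made for illustration only; they do not reproduce the SDK's actual interfaces or the code in FullTraversalSample.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: every nested type is a local stand-in, not the SDK class.
class EntryPointSketch {
    /** Stand-in for a repository; listItemIds() is a hypothetical method. */
    interface Repository {
        List<String> listItemIds();
    }

    /** Stand-in for the template connector, which accepts a Repository. */
    static final class FullTraversalConnector {
        private final Repository repository;

        FullTraversalConnector(Repository repository) {
            this.repository = repository;
        }

        List<String> traverse() {
            return repository.listItemIds();
        }
    }

    /** Stand-in for the application that runs the connector. */
    static final class IndexingApplication {
        private final FullTraversalConnector connector;
        private final List<String> indexed = new ArrayList<>();

        private IndexingApplication(FullTraversalConnector connector) {
            this.connector = connector;
        }

        /** Runs one traversal and records the item IDs it would index. */
        void start() {
            indexed.addAll(connector.traverse());
        }

        List<String> indexedIds() {
            return indexed;
        }

        static final class Builder {
            private final FullTraversalConnector connector;

            // Command-line args are accepted but ignored by this stand-in.
            Builder(FullTraversalConnector connector, String[] args) {
                this.connector = connector;
            }

            IndexingApplication build() {
                return new IndexingApplication(connector);
            }
        }
    }

    public static void main(String[] args) {
        Repository repository = () -> List.of("doc1", "doc2");
        IndexingApplication application =
            new IndexingApplication.Builder(new FullTraversalConnector(repository), args).build();
        application.start();
        System.out.println("Indexed: " + application.indexedIds());
    }
}
```

The point of the pattern is that main() contains only wiring: the repository-specific logic lives in the Repository implementation, and the template connector decides when to call it.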

FullTraversalSample.java
