This guide is intended for Google Cloud Search SharePoint On-Prem connector administrators, that is, anyone who is responsible for downloading, configuring, running, and monitoring the connector.
This guide includes instructions for performing key tasks related to SharePoint On-Prem connector deployment:
- Download the Google Cloud Search SharePoint On-Prem connector software
- Configure the connector for use with an on-premises SharePoint Server data source
- Deploy and run the connector
To understand the concepts in this document, you should be familiar with the fundamentals of G Suite and on-premises SharePoint Server.
By default, Google Cloud Search can discover, index, and serve content from G Suite data such as Google Docs and Gmail. You can extend Google Cloud Search to serve SharePoint On-Prem content to your users by using the SharePoint On-Prem connector.
Configuration properties files
To enable the connector to discover content from the on-premises SharePoint server and upload it to the indexing API, you must provide specific information to the connector by creating a configuration file. During the configuration steps described in Deployment steps, you build the configuration file by adding parameters.
In addition to the SharePoint On-Prem connector parameters described in this document, there are configuration parameters used by all Cloud Search connectors. For detailed information, see Google-supplied connector parameters.
Supported operating systems
- Windows Server 2016
- Red Hat Enterprise Linux 5.0
- SUSE Enterprise Linux 10 (64 bit)
Supported SharePoint versions
- SharePoint Server 2016
- SharePoint Server 2013
Indexing unpublished docs
The Cloud Search SharePoint On-Prem connector always honors the Search Visibility setting on SharePoint (you cannot override this). For draft documents, indexing depends on the permissions that are given to the connector user account. If the connector user has only "Full Read" permissions, the connector will honor all "Draft item visibility" settings on SharePoint.
Supported authentication mechanisms
- HTTP Basic
Known connector limitations
- The number of content databases will affect document change detection latency.
- The number of unique users and groups used in ACLs for each site collection will affect memory consumption.
- The current version of the connector supports identities from only one Active Directory domain.
- The current version of the connector doesn’t support well-known Active Directory and Windows principals, such as Everyone, BUILTIN\Users, and All Authenticated Users.
- The current version of the connector doesn’t generate instant delete notifications.
- The current version of the connector relies on re-indexing of content to identify deletes from the source repository. For previously indexed content, delete detection latency can be more than 4 hours.
Before you deploy the Cloud Search SharePoint On-Prem connector
Before you deploy the Google Cloud Search SharePoint On-Prem connector, ensure that you have the following required components:
- A supported operating system and SharePoint Server
- Java JRE 1.8 installed on a computer that runs the Google Cloud Search SharePoint On-Prem connector
G Suite information required to establish relationships between Google Cloud Search and the data source:
- G Suite private key (which contains the service account ID). For information on obtaining a private key, refer to Configure access to the Google Cloud Search REST API.
- G Suite data source ID. For information on obtaining a data source ID, refer to Add a data source to search.
Typically, the G Suite administrator for the domain can supply these credentials for you.
A user account for the connector, with Full Read permissions to the SharePoint web application in the User Policy.
Ensure that the web application has a root site collection created. This connector doesn’t support indexing content from a web application where no root site collection is present.
If there are any write-locked site collections, run the PrepareWriteLockedSitesForAdaptor.ps1 script on SharePoint using an account that has Admin privileges before installing the connector.
To assist you with configuring this connector, log into the SharePoint server using farm administration privileges and run
The output of this command, including the number of web applications, the number of documents, and the user group membership count, helps you estimate the number of connector instances required, the memory requirements, and the expected document count.
To deploy the Google Cloud Search SharePoint On-Prem connector, follow these steps:
- Install the Google Cloud Search SharePoint On-Prem connector software.
- Specify the SharePoint On-Prem connector configuration.
- Configure access to the Google Cloud Search data source.
- Configure access to SharePoint On-Prem.
- Add SharePoint as a trusted host.
- Configure site collections.
- Enable logging.
- Configure SharePoint Identity Mapping with Google Cloud Search.
- Configure Active Directory Lookup.
- Configure HTML Generation and Structured Data support.
1. Install the Google Cloud Search SharePoint On-Prem connector software.
Clone the connector repository from GitHub.
$ git clone https://github.com/google-cloudsearch/sharepoint-connector.git
$ cd sharepoint-connector
Check out the desired version of the connector:
$ git checkout tags/v1-0.0.3
Build the connector.
$ mvn package
To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.
Copy the connector zip file to your local installation directory:
$ cp target/google-cloudsearch-sharepoint-connector-v1-0.0.3.zip installation-dir
$ cd installation-dir
$ unzip google-cloudsearch-sharepoint-connector-v1-0.0.3.zip
$ cd google-cloudsearch-sharepoint-connector-v1-0.0.3
2. Specify the SharePoint On-Prem connector configuration
For the connector to properly access SharePoint On-Prem and index the relevant content, you must first create its configuration file. You control the SharePoint On-Prem connector’s behavior and attributes by defining parameters in the connector’s configuration file. Configurable parameters control:
- Access to a data source
- Access to the SharePoint On-Prem server
- Indexing of the SharePoint On-Prem server as a site collection
To create a configuration file:
- Open a text editor of your choice and name the configuration file.
- Add key=value pairs to the file contents as described in the following sections.
- Save and name the configuration file. Google recommends that you name the file connector-config.properties so no additional command-line parameters are required to run the connector.
3. Configure access to the Google Cloud Search data source
The first parameters every configuration file must specify are the ones necessary to access the Cloud Search data source, as shown in the following table. Typically, you will need the Data source ID, Identity source ID and the path to the service account’s private key file in order to configure the connector’s access to Cloud Search. The steps required to set up a data source are described in Add a data source to search.
|Data source ID|Required. The Google Cloud Search source ID set up by the G Suite administrator, as described in Manage third-party data sources.|
|Path to the service account private key file|Required. The Google Cloud Search service account key file for Google Cloud Search SharePoint On-Prem connector accessibility.|
|Identity source ID|Required. The Cloud Search identity source ID set up by the G Suite administrator.|
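Before starting the connector, it can help to confirm that the three required entries are present. The following is a minimal sketch, not part of the connector: it assumes the key names api.sourceId, api.serviceAccountPrivateKeyFile, and api.identitySourceId that appear in the example configuration file later in this guide, and the helper names parse_properties and missing_required_keys are hypothetical.

```python
# Key names are taken from the example configuration file in this guide.
REQUIRED_KEYS = (
    "api.sourceId",
    "api.serviceAccountPrivateKeyFile",
    "api.identitySourceId",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def missing_required_keys(text):
    """Return the required keys that are absent or empty, in a fixed order."""
    values = parse_properties(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = """
api.sourceId=08ef8becd116faa4546b8ca2c84b2879
api.serviceAccountPrivateKeyFile=service_account.json
"""
print(missing_required_keys(sample))  # ['api.identitySourceId']
```

The parser handles only plain key=value lines, which is enough for a presence check but does not cover every feature of the properties file format.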
4. Configure access to SharePoint On-Prem
Before the connector can access SharePoint On-Prem and extract data from it for indexing, you must configure access to the SharePoint server. Use the following parameter to add access information to the configuration file.
|Fully-qualified domain name for the SharePoint server|Required. If the domain name is not fully qualified, you must set a DNS override on the connector host.|
The SharePoint user name and password are required when running the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.
|Use Live Authentication to connect to SharePoint||
|Use ADFS Authentication to connect to SharePoint||
5. Add SharePoint as a trusted host
If SharePoint is configured to use HTTPS, get a SharePoint certificate to add it as a trusted host for the connector by performing the following steps:
- Navigate to SharePoint in a browser. A warning page appears with a message such as "This Connection is Untrusted". This message appears because the certificate is self-signed and not signed by a trusted Certificate Authority. Click I Understand the Risks and Add Exception.
- Wait until the View button is clickable, then click it.
- Change to the "Details" tab and click "Export".
- Save the certificate in your connector's directory with the name sharepoint.crt, which is the file name that the keytool command below expects.
- Click Close and Cancel to close the windows.
To allow the connector to trust SharePoint, open a command prompt and enter the following command:
$ keytool -importcert -keystore cacerts.jks -storepass changeit -file sharepoint.crt -alias sharepoint
When prompted "Trust this certificate?", answer yes.
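As an alternative to exporting the certificate through a browser, the certificate can be fetched with a script and then imported with the same keytool command. The following is a rough sketch, not an official procedure: the host name sharepoint.example.com is a placeholder, the helper names are hypothetical, and ssl.get_server_certificate does not validate the certificate it downloads, so verify its fingerprint through a trusted channel before importing it.

```python
import ssl


def fetch_certificate_pem(host, port=443):
    """Fetch the server certificate as PEM text without validating it."""
    return ssl.get_server_certificate((host, port))


def keytool_import_command(cert_file="sharepoint.crt", alias="sharepoint",
                           keystore="cacerts.jks", storepass="changeit"):
    """Build the keytool argument list from this section for subprocess.run."""
    return [
        "keytool", "-importcert",
        "-keystore", keystore,
        "-storepass", storepass,
        "-file", cert_file,
        "-alias", alias,
    ]


# Fetching requires network access to the SharePoint host, for example:
#   pem = fetch_certificate_pem("sharepoint.example.com")
#   with open("sharepoint.crt", "w") as cert_file:
#       cert_file.write(pem)
print(" ".join(keytool_import_command()))
```

Building the command as a list avoids shell quoting problems if the keystore path or password contains spaces.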
6. Configure site collections
Optionally, you can configure the connector to index a SharePoint server as a site collection.
|Whether sharepoint.server is a site collection, instead of a virtual server||
Default is auto-detected. If true then the connector indexes
7. Enable logging
Create a folder named logs in the same directory that contains the connector binary. Create an ASCII or UTF-8 file named logging.properties in the same directory and add the following content:
handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = INFO
# uncomment line below to increase logging level for SharePoint APIs
#com.google.enterprise.cloudsearch.sharepoint.level=FINE
# uncomment line below to increase logging level to enable API trace
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-sharepoint.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
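The FileHandler limit and count values together bound the disk space used by the rotating log files. As a quick worked calculation under the settings above (the limit is an approximate per-file maximum, so treat the result as an estimate):

```python
limit_bytes = 10485760  # java.util.logging.FileHandler.limit (bytes per file)
file_count = 10         # java.util.logging.FileHandler.count (rotated files)

max_total_bytes = limit_bytes * file_count
print(max_total_bytes)                  # 104857600
print(max_total_bytes / (1024 * 1024))  # 100.0
```

With these values, the logs folder needs roughly 100 MiB of free space.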
8. Configure SharePoint Identity Mapping with Google Cloud Search
Google Cloud Search allows its customers to apply ACL trimmings on search results. These ACLs can be defined using Google principals as well as external principals.
A typical SharePoint On-Prem setup involves the following 3 principals.
- Active Directory Users
- Active Directory Groups
- SharePoint Local Groups (with Active Directory users and groups as members)
To apply appropriate security trimmings for SharePoint Content, you also need to sync these external identities with Google.
- Use the Google Cloud Directory Sync tool (with added support for identity mapped groups) to sync Active Directory users and groups.
- Use SharePoint Identity Connector for syncing SharePoint local groups.
To support such a setup, you need to create two identity sources:
- An identity source for syncing Active Directory users and groups.
- An identity source for SharePoint local groups.
|Identity Source ID|Required. Identity source ID for syncing SharePoint local groups. The Google Cloud Search source ID set up by the G Suite administrator, as described in Add a data source to search.|
|Reference Identity Sources|List of reference identity sources for Active Directory principals. Each value should match the Active Directory NetBIOS name for the Active Directory principals being referred to.|
|Reference Identity Source IDs|Required. Identity source ID for syncing Active Directory principals for the domain CONTOSO.|
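The reference identity source entries follow a regular pattern, so they can be generated from a mapping of NetBIOS names to identity source IDs. The following is a minimal sketch that assumes the key pattern shown in the example configuration file later in this guide (api.referenceIdentitySources and api.referenceIdentitySource.NAME.id); the function name is hypothetical.

```python
def reference_identity_source_lines(sources):
    """Render property lines for a {netbios_name: identity_source_id} mapping."""
    names = list(sources)
    lines = ["api.referenceIdentitySources=" + ",".join(names)]
    for name in names:
        lines.append(f"api.referenceIdentitySource.{name}.id={sources[name]}")
    return lines


for line in reference_identity_source_lines(
    {"CONTOSO": "08ef8becd116faa5d3783f8c5a80e5aa"}
):
    print(line)
```

The example configuration file lists the domain in both upper and lower case; to reproduce that, add a second entry with the same ID under the lower-case name.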
9. Configure Active Directory Lookup
While other connectors rely on Google Cloud Directory Sync for syncing Active Directory users and groups, the SharePoint connector also needs to perform lookups against Active Directory to fetch additional information about the principals being synced. Use cases include:
- Mapping SID for a domain group to corresponding sAMAccountName.
- Mapping user sAMAccountName to email address for SharePoint local group memberships.
|Active Directory Host|Required. Active Directory host name or IP address.|
|Active Directory lookup port|Optional. Default is 389. Use 636 for SSL.|
|Active Directory lookup method|Optional. Default is standard. Use “ssl” for a secure connection.|
|Active Directory lookup user|Required. User authorized to perform Active Directory lookups.|
|Active Directory lookup password|Required. Password for the Active Directory lookup user.|
10. Configure Structured Data Support and HTML Content Generation for SharePoint ListItems
To index additional metadata for SharePoint List Items, configure the connector to support HTML content generation and/or structured data.
HTML content generation
Use the parameters in the following table to configure HTML content generation. For information about optional HTML content generation parameters, see HTML Content Generation.
|HTML template title field|SharePoint field to be used as “Title” for generated HTML.|
|HTML content high search quality fields|Fields to include in generated HTML as high quality fields. Matches of search query terms in these fields are ranked higher.|
|HTML content medium search quality fields|Fields to include in generated HTML as medium quality fields.|
|HTML content low search quality fields|Fields to include in generated HTML as low quality fields.|
|HTML content unmapped columns|Default is APPEND. Set to APPEND to include unmapped fields (those that are not part of the high, medium, or low configurations) in generated HTML content. Set to IGNORE to generate HTML using only mapped columns.|
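The effect of the unmapped columns setting can be illustrated with a small model of field selection. This is only an illustration of the APPEND and IGNORE behavior described above, not the connector's implementation; the function name, the sample field names, and the output ordering (high, medium, low, then unmapped) are assumptions.

```python
def fields_for_html(item_fields, high, medium, low, unmapped_mode="APPEND"):
    """Return the field names included in the generated HTML for one item."""
    mapped = [f for group in (high, medium, low) for f in group
              if f in item_fields]
    if unmapped_mode == "APPEND":
        # Unmapped fields are added after the mapped ones.
        unmapped = [f for f in item_fields if f not in mapped]
        return mapped + unmapped
    if unmapped_mode == "IGNORE":
        # Only fields named in the high, medium, or low lists are kept.
        return mapped
    raise ValueError(f"unsupported mode: {unmapped_mode}")


item = ["Title", "Body", "Author", "Modified"]
print(fields_for_html(item, ["Title"], ["Body"], [], "APPEND"))
print(fields_for_html(item, ["Title"], ["Body"], [], "IGNORE"))
```

In this model, IGNORE drops Author and Modified because neither appears in a quality list, while APPEND keeps them.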
Structured data support
The connector populates structured data for SharePoint list items if the schema for the datasource is defined using the following guidelines:
The connector maps SharePoint content type names to the corresponding object definitions by normalizing the SharePoint content type name as per the specifications defined by the Cloud Search API. The Cloud Search API supports only A-Z, a-z, and 0-9 as valid characters for object definitions. The connector normalizes content type names by excluding unsupported characters. For example, the content type "Announcements" maps to the object definition "Announcements", whereas the content type "News Article" maps to "NewsArticle".
The connector also supports configuring a fallback object type via the itemMetadata.objectType configuration. The connector uses the fallback object type if no object definition corresponding to the SharePoint content type is available in the schema.
The connector maps SharePoint property names to property definitions by normalizing display names for SharePoint columns.
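When defining the schema, it helps to predict the object definition name that a content type will map to. The following sketch mirrors the normalization rule described above (keep only A-Z, a-z, and 0-9); it is an approximation for planning purposes, and the function name is hypothetical.

```python
import re


def normalize_content_type(name):
    """Drop every character other than A-Z, a-z, and 0-9."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


print(normalize_content_type("Announcements"))  # Announcements
print(normalize_content_type("News Article"))   # NewsArticle
```

Note that two different content type names can normalize to the same string, so check for collisions when naming object definitions.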
Example: Configuration file
The following example configuration file shows the parameter key=value pairs that define an example connector’s behavior.
api.sourceId=08ef8becd116faa4546b8ca2c84b2879
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9
# Optional
contentTemplate.sharepointItem.title=Title
contentTemplate.sharepointItem.unmappedColumnsMode=APPEND
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true
adLookup.host=dc.contoso.com
adLookup.port=389
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd
api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa
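The user names in the example are written with a doubled backslash. In the Java properties file format, the backslash is an escape character, so a literal backslash is written as two. Assuming the connector loads its configuration with standard Java properties semantics, contoso\\admin is read as the account contoso\admin. The following sketch mimics only that one escape; the function name is hypothetical.

```python
def unescape_backslashes(value):
    """Collapse each escaped backslash pair into a single backslash.

    Only the double-backslash case is handled; the properties format defines
    further escapes (such as tab and Unicode escapes) that are ignored here.
    """
    return value.replace("\\\\", "\\")


raw = r"contoso\\admin"            # as written in the configuration file
print(unescape_backslashes(raw))   # contoso\admin
```

A single backslash in the file would be treated as the start of an escape sequence, which is why domain-qualified user names need the doubled form.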
Run the SharePoint On-Prem identity connector
For users to obtain Cloud Search results for the SharePoint content they have access to, you must first map the principals in both the on-premises Active Directory and the SharePoint site collection to identities in the Google Cloud Identity service. This synchronization is done with the Google Cloud Directory Sync (GCDS) application and the SharePoint On-Prem identity connector. For the steps to use GCDS, refer to About Google Directory Sync. For additional information on enabling identity mapped groups, refer to Sync groups to a Cloud Search identity source.
After GCDS has synced the users and groups in the Active Directory, run the SharePoint On-Prem identity connector, as explained below, to sync the SharePoint site collection groups.
The identity connector uses a configuration file similar to the one used to index content. Following is a sample identity connector configuration file:
api.customerId=C05d3djk8
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true
adLookup.host=dc.contoso.com
adLookup.port=389
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd
api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa
Notice that this file contains the api.customerId property. This property contains your G Suite account ID. For information on customerID, refer to Map user identities in Cloud Search.
The same JAR file used to index content also contains the identity connector. To run it, issue the following command in the directory containing the configuration file. In place of version, use the current version number, which you can find on the GitHub Releases page.
java -Djava.util.logging.config.file=logging.properties -cp "google-cloudsearch-sharepoint-connector-version.jar" com.google.enterprise.cloudsearch.sharepoint.SharePointIdentityConnector
Run the SharePoint On-Prem connector
Run the connector by using cmd.exe on the host machine:
$ java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-0.0.3.jar
To run the connector against HTTPS-secured SharePoint sites, add SharePoint as a trusted host as described above, and then pass the trust store settings to the JVM:
$ java -Djavax.net.ssl.trustStore=cacerts.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=changeit -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-0.0.3.jar
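When the connector is started from a script, the two command variants above can be assembled from one helper. The following is a sketch under the assumption that the flags are exactly those shown in this section; the function name and its parameters are hypothetical.

```python
def connector_command(jar, logging_config="logging.properties",
                      truststore=None, truststore_password=None):
    """Build the java argument list for running the connector."""
    command = ["java"]
    if truststore:
        # Trust store flags are only needed for HTTPS-secured SharePoint sites.
        command += [
            f"-Djavax.net.ssl.trustStore={truststore}",
            "-Djavax.net.ssl.trustStoreType=jks",
        ]
        if truststore_password:
            command.append(
                f"-Djavax.net.ssl.trustStorePassword={truststore_password}")
    command += [f"-Djava.util.logging.config.file={logging_config}",
                "-jar", jar]
    return command


jar = "google-cloudsearch-sharepoint-connector-v1-0.0.3.jar"
print(" ".join(connector_command(jar)))
print(" ".join(connector_command(jar, truststore="cacerts.jks",
                                 truststore_password="changeit")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.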
The information in this section extends beyond basic SharePoint connector configuration.
Multi-tenant SharePoint deployments typically host multiple customer sites under the same web application. Customers get permissions only for their respective site collections. In such a scenario, it is not possible to get Full Read permissions on the SharePoint web application as required by the SharePoint On-Prem connector.
Such multi-tenant configurations are supported via site collection only mode. To support a multi-tenant configuration, enable site collection only mode by setting the sharepoint.siteCollectionOnly configuration option in the connector configuration file. To index a site collection at the root level in site collection only mode, you need to set sharepoint.siteCollectionOnly to true explicitly.
The connector will index a site collection and its child items. For this reason the connector user account on SharePoint needs site collection administrator permissions.
If you have multiple site collections to index in a multi-tenant environment, you need to configure one connector instance for each of the site collections.
To configure the SharePoint On-Prem connector in site collection only mode:
- Set sharepoint.server as the site collection URL.
- If the site collection URL is the root site collection (such as http://sharepoint.example.com), explicitly set sharepoint.siteCollectionOnly to true.
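Because a multi-tenant environment needs one connector instance per site collection, the per-instance configuration files can be derived from a shared base. The following sketch assumes that only sharepoint.server and sharepoint.siteCollectionOnly differ between instances, which may not hold for every deployment; the function name and the base content are hypothetical.

```python
def site_collection_config(base_config, site_collection_url):
    """Return config text for one site collection in site collection only mode."""
    overridden = ("sharepoint.server=", "sharepoint.siteCollectionOnly=")
    # Drop any existing values for the two keys, then append the new ones.
    lines = [line for line in base_config.splitlines()
             if not line.startswith(overridden)]
    lines.append(f"sharepoint.server={site_collection_url}")
    lines.append("sharepoint.siteCollectionOnly=true")
    return "\n".join(lines) + "\n"


base = ("api.sourceId=08ef8becd116faa4546b8ca2c84b2879\n"
        "sharepoint.server=http://old\n")
urls = [
    "http://sharepoint.example.com",
    "http://sp-2016:32967/sites/doc-center-site-collection",
]
configs = {url: site_collection_config(base, url) for url in urls}
for url, text in configs.items():
    print(f"# {url}\n{text}")
```

Setting sharepoint.siteCollectionOnly to true in every generated file also covers the root site collection case, where the explicit setting is required.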
Non-canonical URLs in site collection only mode
The SharePoint On-Prem connector allows non-canonical URLs in site collection only mode. That is, the connector URL as specified by the sharepoint.server configuration option in the connector-config.properties file need not be in exactly the same case as on SharePoint.
Override Content-Type for Microsoft Outlook .msg files
If the connector encounters Outlook .msg files when crawling content, it overrides the Content-Type for the files and indexes them as