Deploy the Microsoft SharePoint On-Prem Connector

This guide is intended for Google Cloud Search SharePoint On-Prem connector administrators, that is, anyone who is responsible for downloading, configuring, running, and monitoring the connector.

This guide includes instructions for performing key tasks related to SharePoint On-Prem connector deployment:

  • Download the Google Cloud Search SharePoint On-Prem connector software
  • Configure the connector for use with a specific SharePoint On-Prem data source
  • Deploy and run the connector

To understand the concepts in this document, you should be familiar with the fundamentals of G Suite and SharePoint On-Prem.

Overview

By default, Google Cloud Search can discover, index, and serve content from G Suite data such as Google Docs and Gmail. You can extend the reach of Google Cloud Search to include serving SharePoint On-Prem content to your users by using the SharePoint On-Prem connector.

Configuration properties files

To enable the connector to discover content in your on-premises SharePoint deployment and upload it to the Indexing API, you must provide specific information to the connector by creating a configuration file. During the configuration steps described in Deployment steps, you build the configuration file by adding parameters.

In addition to the SharePoint On-Prem connector parameters described in this document, there are configuration parameters used by all Cloud Search connectors. For detailed information, see Google-supplied connector parameters.

Supported operating systems

The Cloud Search SharePoint On-Prem connector supports the following operating systems:

  • Windows Server 2016
  • Ubuntu
  • Red Hat Enterprise Linux 5.0
  • SUSE Enterprise Linux 10 (64 bit)

Supported SharePoint versions

The Cloud Search SharePoint On-Prem connector supports SharePoint Server 2016 and SharePoint Server 2013.

Indexing unpublished docs

The Cloud Search SharePoint On-Prem connector always honors the Search Visibility setting on SharePoint (you cannot override this). For draft documents, indexing depends on the permissions that are given to the connector user account. If the connector user has only "Full Read" permissions, the connector will honor all "Draft item visibility" settings on SharePoint.

Supported authentication mechanisms

The Cloud Search SharePoint On-Prem connector supports NTLM, Kerberos, HTTP Basic, and ADFS authentication.

Known connector limitations

Following is a list of known limitations for this connector:

  • The number of content databases will affect document change detection latency.
  • The number of unique users and groups used in ACLs for each site collection will affect memory consumption.
  • The current version of the connector supports identities from only one Active Directory domain.
  • The current version of the connector doesn’t support well-known Active Directory and Windows principals, such as Everyone, BUILTIN\Users, and All Authenticated Users.
  • The current version of the connector doesn’t generate instant delete notifications.
  • The current version of the connector relies on re-indexing of content to identify deletes from the source repository. For previously indexed content, delete detection latency can be more than 4 hours.

Before you deploy the Cloud Search SharePoint On-Prem connector

The Google Cloud Search SharePoint On-Prem connector can be installed on Linux or Windows. Before you deploy the Google Cloud Search SharePoint On-Prem connector, ensure that you have the following required components:

  • Windows Server 2016
  • SharePoint Server 2016 or SharePoint Server 2013
  • Java JRE 1.8 installed on a computer that runs the Google Cloud Search SharePoint On-Prem connector
  • G Suite information required to establish relationships between Google Cloud Search and the data source:

    Typically, the G Suite administrator for the domain can supply these credentials for you.

  • A user account for the connector, with Full Read permissions to the SharePoint web application in the User Policy

  • A root site collection in the web application. The connector doesn’t support indexing content from a web application that has no root site collection.

  • If there are any write-locked site collections, run the PrepareWriteLockedSitesForAdaptor.ps1 script on SharePoint using an account that has Admin privileges before installing the connector.

  • To gather additional information about your SharePoint environment that can help you configure the connector, run diagnose_sp.ps1 on the SharePoint server using an account that has farm administration privileges.

    Running the script before deploying the connector is not mandatory. However, its output, which includes the number of web applications, the authentication mechanism, the number of documents, and the user group membership count, helps you estimate the number of connector instances required, the memory requirements, and the expected document count.

Deployment steps

To deploy the Google Cloud Search SharePoint On-Prem connector, follow these steps:

  1. Install the Google Cloud Search SharePoint On-Prem connector software.
  2. Specify the SharePoint On-Prem connector configuration.
  3. Configure access to the Google Cloud Search data source.
  4. Configure access to SharePoint On-Prem.
  5. Add SharePoint as a trusted host.
  6. Configure site collections.
  7. Enable logging.
  8. Configure SharePoint Identity Mapping with Google Cloud Search.
  9. Configure Active Directory Lookup.
  10. Configure HTML Generation and Structured Data support.

1. Install the Google Cloud Search SharePoint On-Prem connector software.

Google provides the installation software for the connector in the following file:

google-cloudsearch-sharepoint-connector-v1-0.0.2.zip

Download and extract the Microsoft SharePoint On-Prem connector and save it to a local working directory where the connector runs. This directory can also contain all the relevant files required for execution, including the configuration file and the service account key file.

2. Specify the SharePoint On-Prem connector configuration

For the connector to properly access SharePoint On-Prem and index the relevant content, you must first create its configuration file. You control the SharePoint On-Prem connector’s behavior and attributes by defining parameters in the connector’s configuration file. Configurable parameters control:

  • Access to a data source
  • Access to the SharePoint On-Prem server
  • Indexing of the SharePoint On-Prem server as a site collection

To create a configuration file:

  • Open a text editor of your choice and create a new file.
  • Add key=value pairs to the file contents as described in the following sections.
  • Save and name the configuration file. Google recommends that you name the configuration file connector-config.properties so no additional command-line parameters are required to run the connector.
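
As a rough illustration of the key=value format, the following Python sketch parses a small configuration fragment. The function name parse_properties is hypothetical, and the parser is deliberately simplified: it skips blank lines and # comments, but it does not implement the backslash escapes or line continuations that the Java properties format supports.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Simplified on purpose: no backslash escapes and no line continuations.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        props[key.strip()] = value.strip()
    return props


sample = """
# Cloud Search data source
api.sourceId=1234567890abcdef
sharepoint.server=http://yoursharepoint.example.com/
"""

config = parse_properties(sample)
print(config["api.sourceId"])
```

A check like this can catch lines that are missing the = separator before the connector is started.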

3. Configure access to the Google Cloud Search data source

The first parameters every configuration file must specify are the ones necessary to access the Cloud Search data source, as shown in the following table. Typically, you will need the Data source ID, Identity source ID and the path to the service account’s private key file in order to configure the connector’s access to Cloud Search. The steps required to set up a data source are described in Add a data source to search.

Setting Parameter
Data source ID api.sourceId=1234567890abcdef

Required. The Google Cloud Search source ID set up by the G Suite administrator, as described in Manage third-party data sources.

Path to the service account private key file api.serviceAccountPrivateKeyFile=./PrivateKey.json

Required. The Google Cloud Search service account key file that the SharePoint On-Prem connector uses to access Cloud Search.

Identity source ID api.identitySourceId=x0987654321

Required. The Cloud Search identity source ID set up by the G Suite administrator.
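
Because all three parameters are marked as required, it can help to check for them before starting the connector. The following is a minimal sketch, assuming the configuration has already been read into a Python dictionary; the helper name missing_api_keys is hypothetical and not part of the connector.

```python
REQUIRED_API_KEYS = (
    "api.sourceId",
    "api.serviceAccountPrivateKeyFile",
    "api.identitySourceId",
)


def missing_api_keys(config):
    """Return the required Cloud Search keys that are absent or empty."""
    return [key for key in REQUIRED_API_KEYS if not config.get(key, "").strip()]


config = {
    "api.sourceId": "1234567890abcdef",
    "api.serviceAccountPrivateKeyFile": "./PrivateKey.json",
}
print(missing_api_keys(config))
```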

4. Configure access to SharePoint On-Prem

Before the connector can access SharePoint On-Prem and extract data from it for indexing, you must configure access to the SharePoint server. Use the following parameters to add access information to the configuration file.

Setting Parameter
Fully-qualified domain name for the SharePoint server sharepoint.server=http://yoursharepoint.example.com/

Required. If the domain name is not fully-qualified, then you must set DNS override on the connector host.

SharePoint username sharepoint.username=YOURDOMAIN\\ConnectorUser

Required when running the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.

SharePoint password sharepoint.password=user_password

Required when running the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.

Use Live Authentication to connect to SharePoint sharepoint.username=AdaptorUser Live Authentication Id

sharepoint.password=uS3R_passWoRD

sharepoint.formsAuthenticationMode=LIVE

Use ADFS Authentication to connect to SharePoint sharepoint.username=AdaptorUser@yourdomain.com

sharepoint.password=uS3R_passWoRD

sharepoint.sts.endpoint=https://adfs.example.com/adfs/services/trust/2005/usernamemixed

sharepoint.sts.realm=urn:myserver:sharepoint or https://yoursharepoint.example.com/_trust

sharepoint.formsAuthenticationMode=ADFS
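
The doubled backslash in the sharepoint.username example reflects the Java properties convention that a backslash starts an escape sequence, so a literal backslash has to be written twice. The following is a minimal sketch, assuming the connector reads its configuration with the standard Java properties loader; the helper name escape_for_properties is hypothetical.

```python
def escape_for_properties(value):
    """Double each backslash so it is read back as one literal backslash."""
    return value.replace("\\", "\\\\")


# Python string literal for the account name YOURDOMAIN\ConnectorUser
account = "YOURDOMAIN\\ConnectorUser"
print("sharepoint.username=" + escape_for_properties(account))
```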

5. Add SharePoint as a trusted host

If SharePoint is configured to use HTTPS, get a SharePoint certificate to add it as a trusted host for the connector by performing the following steps:

  • Navigate to SharePoint in a browser. A warning page appears with a message such as "This Connection is Untrusted." This message appears because the certificate is self-signed and not signed by a trusted Certificate Authority. Click I Understand the Risks and Add Exception.
  • Wait until the View... button is clickable, then click it.
  • Change to the "Details" tab and click "Export...".
  • Save the certificate in your connector's directory with the name sharepoint.crt.
  • Click Close and Cancel to close the windows.
  • To allow the connector to trust SharePoint, open a command prompt and enter the following command: keytool -importcert -keystore cacerts.jks -storepass changeit -file sharepoint.crt -alias sharepoint

  • When prompted "Trust this certificate?", answer yes.

6. Configure site collections

Optionally, you can configure the connector to index a SharePoint server as a site collection.

Setting Parameter
Whether sharepoint.server is a site collection, instead of a virtual server sharepoint.siteCollectionOnly=true

Default is auto-detected. If true then the connector indexes sharepoint.server as a site collection.

7. Enable logging

Create a folder named logs in the same directory that contains the connector binary. Create an ASCII or UTF-8 file named logging.properties in the same directory and add the following content:

handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = INFO
# uncomment line below to increase logging level for SharePoint APIs
#com.google.enterprise.cloudsearch.sharepoint.level=FINE

# uncomment line below to increase logging level to enable API trace
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-sharepoint.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
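
To estimate how much disk space the rotating log files can occupy, multiply the per-file limit by the file count. The following sketch works through that arithmetic for the values above; the variable names are illustrative only.

```python
limit_bytes = 10485760  # java.util.logging.FileHandler.limit
file_count = 10         # java.util.logging.FileHandler.count

per_file_mib = limit_bytes / (1024 * 1024)
total_mib = per_file_mib * file_count

print(f"{per_file_mib:.0f} MiB per file, about {total_mib:.0f} MiB in total")
```

The limit is approximate, so treat the total as an estimate when sizing the logs folder.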

8. Configure SharePoint Identity Mapping with Google Cloud Search

Google Cloud Search allows its customers to apply ACL trimming to search results. These ACLs can be defined using Google principals as well as external principals.

A typical SharePoint On-Prem setup involves the following three types of principals:

  • Active Directory Users
  • Active Directory Groups
  • SharePoint Local Groups (with Active Directory users and groups as members)

To apply appropriate security trimmings for SharePoint Content, you also need to sync these external identities with Google.

  • Use the Google Cloud Directory Sync tool (with added support for identity mapped groups) to sync Active Directory users and groups.
  • Use SharePoint Identity Connector for syncing SharePoint local groups.

To support such a setup, you need to create two identity sources:

  • An identity source for syncing Active Directory users and groups
  • An identity source for SharePoint local groups

Setting Parameter
Identity Source ID api.identitySourceId=1234567890abcdef

Identity source ID for syncing SharePoint Local Groups. Required. The Google Cloud Search source ID set up by the G Suite administrator, as described in Add a data source to search.

Reference Identity Sources api.referenceIdentitySources=CONTOSO,contoso

List of reference identity sources for Active Directory principals. Each value should match the Active Directory NetBIOS name of the domain whose principals are being referenced.

Reference Identity Source IDs api.referenceIdentitySource.CONTOSO.id=112233abcd

Required. Identity Source ID for syncing Active Directory principals for domain CONTOSO.
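
Each name listed in api.referenceIdentitySources needs a matching api.referenceIdentitySource.NAME.id entry. The following is a minimal sketch of that relationship, assuming the configuration is held in a Python dictionary and that key names are case-sensitive; the helper name missing_reference_ids is hypothetical.

```python
def missing_reference_ids(config):
    """Return the per-domain ID keys implied by the list but not defined."""
    raw = config.get("api.referenceIdentitySources", "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    expected = [f"api.referenceIdentitySource.{name}.id" for name in names]
    return [key for key in expected if not config.get(key)]


config = {
    "api.referenceIdentitySources": "CONTOSO,contoso",
    "api.referenceIdentitySource.CONTOSO.id": "112233abcd",
}
print(missing_reference_ids(config))
```

In this sample the lowercase contoso entry is reported as missing, which is why configurations that list both spellings also define an ID for each.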

9. Configure Active Directory Lookup

While the connector relies on Google Cloud Directory Sync for syncing Active Directory users and groups, the SharePoint connector needs to perform lookups against Active Directory to fetch additional information about the principals being synced. Use cases include:

  • Mapping the SID for a domain group to the corresponding sAMAccountName.
  • Mapping a user sAMAccountName to an email address for SharePoint local group memberships.

Setting Parameter
Active Directory Host adLookup.host=dc.contoso.com

Required. Active directory hostname or IP address.

Active Directory lookup port adLookup.port=389

Optional. Default is 389. Use 636 for SSL.

Active Directory lookup method adLookup.method=standard

Optional. Default is standard. Use “ssl” for secure connection.

Active Directory lookup user adLookup.username=CONTOSO\\user1

Required. User authorized to perform active directory lookups.

Active Directory lookup password adLookup.password=password123

Required. Password for the user specified by adLookup.username.
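
The table mixes required parameters with optional ones that have defaults. The following sketch shows one way to resolve those defaults and list missing required values, assuming the configuration is a Python dictionary; the function ad_lookup_settings is hypothetical and only mirrors the defaults documented above.

```python
def ad_lookup_settings(config):
    """Apply the documented defaults and report missing required keys."""
    settings = {
        "host": config.get("adLookup.host"),
        "port": int(config.get("adLookup.port", "389")),
        "method": config.get("adLookup.method", "standard"),
        "username": config.get("adLookup.username"),
        "password": config.get("adLookup.password"),
    }
    required = ("host", "username", "password")
    missing = [f"adLookup.{name}" for name in required if not settings[name]]
    return settings, missing


settings, missing = ad_lookup_settings({"adLookup.host": "dc.contoso.com"})
print(settings["port"], settings["method"], missing)
```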

10. Configure Structured Data Support and HTML Content Generation for SharePoint ListItems

To index additional metadata for SharePoint List Items, configure the connector to support HTML content generation and/or structured data.

HTML content generation

Use the parameters in the following table to configure HTML content generation. For information about optional HTML content generation parameters, see HTML Content Generation.

Setting Parameter
HTML template title field contentTemplate.sharepointItem.title=Title

SharePoint field to be used as “Title” for generated HTML.

HTML content high search quality fields contentTemplate.sharepointItem.quality.high=highField1,highField2…

Fields to include in generated HTML as high quality fields. Matches of search query terms in these fields are ranked higher.

HTML content medium search quality fields contentTemplate.sharepointItem.quality.medium=mediumField1,mediumField2…

Fields to include in generated HTML as medium quality fields.

HTML content low search quality fields contentTemplate.sharepointItem.quality.low=lowField1,lowField2…

Fields to include in generated HTML as low quality fields.

HTML content unmapped columns contentTemplate.sharepointItem.unmappedColumnsMode=IGNORE

Default is IGNORE. The connector generates HTML using only mapped columns. Set to APPEND to include unmapped fields (those not part of the high, medium, or low configurations) in the generated HTML content.
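
The effect of the unmapped columns setting can be pictured with a small sketch. The function columns_for_html below is hypothetical and is not the connector's implementation; in particular, the ordering of the returned columns is an assumption made for illustration.

```python
def columns_for_html(item_columns, high, medium, low, unmapped_mode="IGNORE"):
    """Return the item's columns that would be rendered, in this sketch's order."""
    configured = high + medium + low
    selected = [name for name in configured if name in item_columns]
    if unmapped_mode.upper() == "APPEND":
        # Add every column that none of the quality lists mention.
        selected += [name for name in item_columns if name not in set(configured)]
    return selected


item = ["Title", "Body", "Expires", "Author"]
print(columns_for_html(item, ["Title"], ["Body"], [], "IGNORE"))
print(columns_for_html(item, ["Title"], ["Body"], [], "APPEND"))
```

With IGNORE, only Title and Body are selected; with APPEND, Expires and Author are added after them.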

Structured data support

The connector populates structured data for SharePoint list items if the schema for the datasource is defined using the following guidelines:

  • The connector maps SharePoint Content Type names to corresponding object definitions by normalizing the SharePoint Content Type name according to the specifications defined by the Cloud Search API. The Cloud Search API supports only A-Z, a-z, and 0-9 as valid characters for object definitions. The connector normalizes Content Type names by excluding unsupported characters.

    For example, Content Type “Announcements” maps to Object Definition “Announcements”, whereas Content Type “News Article” maps to “NewsArticle”.

  • The connector maps SharePoint property names to property definitions by normalizing display names for SharePoint columns.
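
The normalization rule described above, dropping every character outside A-Z, a-z, and 0-9, can be sketched in a few lines. The function name normalize_name is hypothetical; the connector's own implementation may differ in details this guide does not cover.

```python
import re


def normalize_name(name):
    """Remove every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


for content_type in ("Announcements", "News Article"):
    print(content_type, "->", normalize_name(content_type))
```

A sketch like this can help you predict which object and property definition names to declare in the data source schema.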

Example: Configuration file

The following example configuration file shows the parameter key=value pairs that define an example connector’s behavior.

api.sourceId=08ef8becd116faa4546b8ca2c84b2879
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9

# Optional
contentTemplate.sharepointItem.title=Title
contentTemplate.sharepointItem.unmappedColumnsMode=APPEND

sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true

adLookup.host=dc.contoso.com
adLookup.port=389
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd

api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa

Run the SharePoint On-Prem identity connector

For users to be able to obtain results in Cloud Search for the SharePoint content they have access to, you must first map the principals in both the on-prem Active Directory and the SharePoint site collection to identities in the Google Cloud Identity service. This synchronization is done with the Google Cloud Directory Sync (GCDS) application and the SharePoint On-Prem identity connector. For the steps to use GCDS, refer to About Google Directory Sync. For additional information on enabling identity mapped groups, refer to Sync groups to a Cloud Search identity source.

After GCDS has synced the users and groups in the Active Directory, run the SharePoint On-Prem identity connector, as explained below, to sync the SharePoint site collection groups.

The identity connector uses a configuration file similar to the one used to index content. Following is a sample identity connector configuration file:

api.customerId=C05d3djk8
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9

sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true

adLookup.host=dc.contoso.com
adLookup.port=389
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd

api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa

Notice that this file contains the api.customerId property. This property contains your G Suite account ID. For information on obtaining a customer ID, refer to Map user identities in Cloud Search.

The same JAR file used to index content also contains the identity connector. To run it, issue the following command in the directory containing the configuration file:

java -Djava.util.logging.config.file=logging.properties -cp "google-cloudsearch-sharepoint-connector-v<version>-withlib.jar" com.google.enterprise.cloudsearch.sharepoint.SharePointIdentityConnector

Run the SharePoint On-Prem connector

Run the connector by using cmd.exe on the host machine:

java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v<version>-withlib.jar

To run the connector against HTTPS-secured SharePoint sites, first add SharePoint as a trusted host as described above, then pass the trust store settings on the command line:

java -Djavax.net.ssl.trustStore=cacerts.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=changeit -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v<version>-withlib.jar
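
If you start the connector from a script, the two command variants above differ only in the trust store options. The following sketch assembles the argument list for either case; the function connector_command is hypothetical, and the JAR name keeps the version placeholder from the commands above.

```python
JAR = "google-cloudsearch-sharepoint-connector-v<version>-withlib.jar"


def connector_command(jar, truststore=None, truststore_password=None):
    """Build the java argument list, adding trust store options when given."""
    args = ["java"]
    if truststore:
        args += [
            f"-Djavax.net.ssl.trustStore={truststore}",
            "-Djavax.net.ssl.trustStoreType=jks",
            f"-Djavax.net.ssl.trustStorePassword={truststore_password}",
        ]
    args += ["-Djava.util.logging.config.file=logging.properties", "-jar", jar]
    return args


print(" ".join(connector_command(JAR)))
print(" ".join(connector_command(JAR, "cacerts.jks", "changeit")))
```

Building the command as a list rather than one string avoids quoting problems if the list is later passed to a process launcher.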

Advanced Topics

The information in this section extends beyond basic SharePoint connector configuration.

Multi-Tenant configurations

Multi-Tenant SharePoint deployments typically host multiple customer sites under the same web application. Customers get permissions only for their respective site collections. In such a scenario, it is not possible to get Full Read permissions on the SharePoint web application, as required by the SharePoint On-Prem connector.

Such multi-tenant configurations are supported via Site Collection Only mode. To support a multi-tenant configuration, site collection mode must be enabled by using the sharepoint.siteCollectionOnly configuration option in the connector-config.properties file.

To index a root-level site collection in site collection only mode, you must explicitly set sharepoint.siteCollectionOnly to true.

The connector will index a site collection and its child items. For this reason the connector user account on SharePoint needs site collection administrator permissions.

If you have multiple site collections to index in a multi-tenant environment, you need to configure one connector instance for each of the site collections.

To configure the SharePoint On-Prem connector in site collection only mode:

  • Specify sharepoint.server as the site collection URL, such as http://sharepoint.example.com/sites/sitecollection.

  • If the site collection URL is the root site collection (e.g. http://sharepoint.example.com), explicitly set sharepoint.siteCollectionOnly=true.
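
The two cases above can be told apart by looking at the path of the sharepoint.server URL. The following is a minimal sketch, assuming that a URL with an empty path refers to the root site collection; the function name is hypothetical.

```python
from urllib.parse import urlparse


def needs_explicit_site_collection_only(server_url):
    """Return True when the URL has no path, i.e. points at the root site collection."""
    return urlparse(server_url).path.strip("/") == ""


for url in (
    "http://sharepoint.example.com",
    "http://sharepoint.example.com/sites/sitecollection",
):
    print(url, needs_explicit_site_collection_only(url))
```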

Non-canonical URLs in site collection only mode

The SharePoint On-Prem connector allows non-canonical URLs in site collection only mode. That is, the connector URL as specified by the sharepoint.server configuration option in the connector-config.properties file need not be in exactly the same case as on SharePoint.

Override Content-Type for Microsoft Outlook .msg files

If the connector encounters Outlook .msg files when crawling content, it overrides the Content-Type for the files and indexes them as application/vnd.ms-outlook.
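
The override can be thought of as a simple rule applied before indexing. The sketch below is illustrative only: it assumes files are recognized by their .msg extension, which this guide does not confirm, and the function name is hypothetical.

```python
OUTLOOK_MSG_TYPE = "application/vnd.ms-outlook"


def effective_content_type(file_name, reported_type):
    """Return the Outlook type for .msg files, else the reported type."""
    if file_name.lower().endswith(".msg"):
        return OUTLOOK_MSG_TYPE
    return reported_type


print(effective_content_type("Meeting notes.MSG", "application/octet-stream"))
print(effective_content_type("report.docx", "application/msword"))
```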
