Deploy the Microsoft SharePoint On-Prem Connector

You can set up Google Cloud Search to return results from your organization's SharePoint on-premises content in addition to your Google Workspace content. You use the Google Cloud Search SharePoint On-Prem connector and configure it to access a specific SharePoint data source.

Important considerations

Honored SharePoint settings

The Cloud Search SharePoint On-Prem connector always honors the Search Visibility setting on SharePoint; you can't override it. For draft documents, the permissions of the user account that the connector uses to access SharePoint control which draft documents are indexed and returned. If the account has only "Full Read" permissions, the connector honors the "Draft item visibility" settings on SharePoint.

You can also configure the connector to limit results based on user account access. You can use Google principals and external principals to define ACLs. To apply security trimming for SharePoint content, synchronize the following external identities with the Google Directory:

Active Directory Users
Active Directory Groups
SharePoint Local Groups (with Active Directory users and groups as members)

To synchronize AD users and groups, you use Google Cloud Directory Sync, enabling identity mapped groups. To synchronize SharePoint local groups, you use the SharePoint Identity Connector.

The connector also needs to perform lookup with AD to fetch additional information to synchronize the principals. For example, lookup with AD lets the connector do the following:

Map the SID for a domain group to the corresponding sAMAccountName.
Map a user sAMAccountName to the email address for SharePoint local group memberships.

Search optimization

You can improve your users' experience by configuring the connector to return more relevant search results.

To use the API, set values for HTML generation parameters in the SharePoint On-Prem connector configuration file. These parameters let you set which fields have higher or lower impact on matches.

To set up a schema, follow the instructions in Create and register a schema. When you set up a schema:

To map the names of SharePoint content types to corresponding object definitions, the connector normalizes the content type names by excluding unsupported characters. For object definitions, the Cloud Search API supports only A-Z, a-z, and 0-9 as valid characters. For example, the content type "Announcements" maps to the object definition "Announcements". The content type "News Article" maps to "NewsArticle" (no space).
When the connector can't match a content type with an object definition, the connector uses the fallback object type (itemMetadata.objectType). Learn more about metadata configuration parameters.
To map SharePoint property names to property definitions, the connector normalizes the property names by decoding hex-encoded characters and removing "ows_" prefixes, then excluding unsupported characters (all characters except A-Z, a-z, and 0-9).
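
As a rough illustration of the normalization rules above, the following shell sketch approximates both mappings. It is not the connector's implementation: the function names are hypothetical, and it assumes that hex-encoded characters use the `_xHHHH_` form (for example, `_x0020_` for a space).

```shell
# Illustrative approximation only; function names are hypothetical.

# Content type name -> object definition name: keep only A-Z, a-z, and 0-9.
normalize_content_type() {
  printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9'
}

# Property name -> property definition name:
# decode _xHHHH_ sequences (assumed encoding), drop a leading "ows_",
# then keep only A-Z, a-z, and 0-9.
normalize_property() {
  printf '%s\n' "$1" | LC_ALL=C awk '{
    s = $0; out = ""
    while (match(s, /_x[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]_/)) {
      hex = tolower(substr(s, RSTART + 2, 4)); code = 0
      for (i = 1; i <= 4; i++)
        code = code * 16 + index("0123456789abcdef", substr(hex, i, 1)) - 1
      # Code points above ASCII are never valid here, so they are simply dropped.
      out = out substr(s, 1, RSTART - 1) (code < 128 ? sprintf("%c", code) : "")
      s = substr(s, RSTART + RLENGTH)
    }
    s = out s
    sub(/^ows_/, "", s)
    gsub(/[^A-Za-z0-9]/, "", s)
    printf "%s", s
  }'
}

normalize_content_type "News Article"; echo      # NewsArticle
normalize_property "ows_Due_x0020_Date"; echo    # DueDate
```

Running names through a check like this before you register a schema can help you spot content types or properties that would collapse to the same definition name.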

Microsoft Outlook message handling

When the connector encounters Microsoft Outlook .msg files as it indexes content, it overrides the media type for the files and indexes them as application/vnd.ms-outlook.

Multi-tenant configurations

If your SharePoint is a multi-tenant deployment, where multiple customer sites are hosted on the same Web application, you need to configure site collection mode in the configuration file. In multi-tenant deployments, you get permissions only for your site collection and can't get Full Read permissions, as required by the SharePoint On-Prem connector.

To enable site collection mode:

Give the connector user account site collection administrator permissions.
Set sharepoint.server in your connector configuration file to the site collection URL, such as http://sharepoint.example.com/sites/sitecollection. The URL doesn't need to use the exact same case as on SharePoint.
Set sharepoint.siteCollectionOnly in your connector configuration file to true.

If you have multiple site collections to index in a multi-tenant environment, you need to configure one connector instance for each of the site collections.
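
For several site collections, the site collection mode settings can be generated instead of typed by hand. The sketch below is one possible approach; the helper names and output file names are hypothetical, and each generated file still needs the other required parameters (data source ID, credentials, and so on).

```shell
# Hypothetical helper: print the site collection mode settings for one URL.
site_collection_settings() {
  printf 'sharepoint.server=%s\n' "$1"
  printf 'sharepoint.siteCollectionOnly=true\n'
}

# Hypothetical helper: write connector-config-1.properties, -2, ... into a
# directory, one file per site collection URL.
# Usage: write_site_configs <dir> <site-collection-url>...
write_site_configs() {
  dir=$1
  shift
  n=1
  for url in "$@"; do
    site_collection_settings "$url" > "$dir/connector-config-$n.properties"
    n=$((n + 1))
  done
}

# Example (placeholder URLs):
# write_site_configs . \
#   "http://sharepoint.example.com/sites/sitecollection" \
#   "http://sharepoint.example.com/sites/othercollection"
```

Each connector instance would then be started with its own file, assuming the connector accepts the same `-Dconfig` argument that Step 6 shows for the identity connector.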

Known connector limitations

The time it takes the connector to detect changes to items in the databases increases with the number of databases the connector monitors.
Memory consumption increases with the number of unique users and groups that you use in ACLs for each site collection.
You can configure the connector with identities from only one Active Directory Domain.
Some common Active Directory and Windows principals, such as Everyone, BUILTIN\Users, and All Authenticated Users, aren't supported.
Delete notifications are not instantaneous and it can take more than 4 hours for a connector to recognize that a user deleted content from the source repository.

System requirements

Operating system	Windows Server 2016, Ubuntu, Red Hat Enterprise Linux 5.0, SUSE Enterprise Linux 10 (64 bit)
Software	SharePoint server: SharePoint Server 2016, SharePoint Server 2013. Java: JRE 1.8 installed on the computer that will run the Google Cloud Search SharePoint On-Prem connector.
Authentication	NTLM, Kerberos, HTTP Basic, ADFS

Deploy the connector

Prerequisites

Create a Google Workspace private key, which contains your service account ID. To learn how to get a private key, go to Configure access to the Google Cloud Search API.
Your Google Workspace administrator must add a data source to search. Record the data source ID.
If the connector returns results based on ACLs (results aren't public), your Google Workspace administrator must create two identity sources and give you their IDs:
- An identity source for syncing Active Directory users and groups.
- An identity source for SharePoint local groups.
The admin must also get your organization's Google Workspace customer ID and give it to you.

Learn how to get these values in Map user identities in Cloud Search.

Note: You need to set up one identity source per Active Directory domain. If Active Directory is shared by SharePoint instances, you can refer to that identity source for all SharePoint connector instances.
Set up a user account for the connector that has Full Read permissions to SharePoint Web Application in the user policy.
If the SharePoint Web Application doesn't have a root site collection, create one.
If any site collections are write-locked, sign in to the SharePoint server with an account that has Admin privileges and run the PrepareWriteLockedSites.ps1 script.
To get data source metrics to inform your connector configuration, sign in to the SharePoint server with an account that has farm administration privileges and run diagnose_sp.ps1.

The output reports the numbers of web applications, documents, and user group memberships. Use this information to estimate how many connector instances you need, memory requirements, and document count.

Step 1. Install the Google Cloud Search SharePoint On-Prem connector software.

Clone the connector repository from GitHub.

$ git clone https://github.com/google-cloudsearch/sharepoint-connector.git
$ cd sharepoint-connector

Check out the desired version of the connector:
```
$ git checkout tags/latest_version
```
Where: latest_version = a value such as v1-0.0.5
Build the connector.
```
$ mvn package
```
To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.

Copy the connector zip file to your local installation directory:

$ cp target/google-cloudsearch-sharepoint-connector-latest_version.zip installation-dir
$ cd installation-dir
$ unzip google-cloudsearch-sharepoint-connector-latest_version.zip
$ cd google-cloudsearch-sharepoint-connector-latest_version

Step 2. Create the SharePoint On-Prem connector configuration file

In the same directory as the connector installation, create a file. Google recommends that you name the file connector-config.properties so no additional command-line parameters are required to run the connector. If you plan to run many connector instances, add details to the name to distinguish it.

Add parameters as key/value pairs to the file contents, as in the following example:

### SharePoint On-Prem Connector configuration ###

# Required parameters for data source access
api.sourceId=08ef8becd116faa4546b8ca2c84b2879
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9

# Required parameters for SharePoint on-premises access
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true

# Required parameters for AD lookup
adLookup.host=dc.contoso.com
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd
api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa

# Optional parameters for schema mapping
contentTemplate.sharepointItem.title=Title
contentTemplate.sharepointItem.unmappedColumnsMode=APPEND

For detailed descriptions of each parameter, go to the configuration parameters reference.

(Optional) Configure additional connector parameters, as needed. For details, go to Google-supplied connector parameters.
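
Before you launch the connector, it can help to confirm that the configuration file sets the parameters you expect. The following is a minimal sketch of such a check; `missing_params` is a hypothetical helper, not part of the connector, and it only looks for `key=value` lines.

```shell
# Hypothetical pre-flight check: list parameters that a configuration file does not set.
# Usage: missing_params <config-file> <key>...
# Prints each missing key and returns non-zero if any key is missing.
missing_params() {
  file=$1
  shift
  missing=0
  for key in "$@"; do
    if ! awk -F= -v k="$key" '
          { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
          name == k { found = 1 }
          END { exit !found }' "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return $missing
}

# Example:
# missing_params connector-config.properties \
#   api.sourceId api.serviceAccountPrivateKeyFile sharepoint.server
```

The check does not validate values; it only reports keys that never appear on the left side of an `=` sign.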

Step 3. For HTTPS, add SharePoint as a trusted host

If SharePoint is configured to use HTTPS, get a SharePoint certificate to add it as a trusted host for the connector.

On the computer that will run the connector, open a browser and go to SharePoint.
In the warning page that opens, click I Understand the Risks and Add Exception. The page shows a message such as "This Connection is Untrusted" because the certificate is self-signed and not signed by a trusted Certificate Authority.
Once the View button is available, click it.
Go to the Details tab and click Export.
Save the certificate in the connector directory with the name sharepoint.crt.
Click Close then Cancel to close the windows.
Open a command prompt and enter the following command:
```
$ keytool -importcert -keystore cacerts.jks -storepass changeit -file sharepoint.crt -alias sharepoint
```
When prompted "Trust this certificate?", answer yes.
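
To confirm that the import worked, you can list the new entry. The sketch below wraps `keytool -list` in a hypothetical helper and assumes the keystore name, password, and alias from the command above.

```shell
# Hypothetical check: confirm that the "sharepoint" alias exists in the keystore.
# Assumes the store password (changeit) and alias used in the import command.
# Usage: check_sharepoint_cert [keystore-file]
check_sharepoint_cert() {
  keystore=${1:-cacerts.jks}
  if [ ! -f "$keystore" ]; then
    echo "keystore not found: $keystore" >&2
    return 1
  fi
  keytool -list -keystore "$keystore" -storepass changeit -alias sharepoint
}

# Example:
# check_sharepoint_cert cacerts.jks
```

If the alias is missing, `keytool` reports an error and the helper returns a non-zero status.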

Step 4. Set up logging

In the directory that contains the connector binary, create a folder named logs.
In the same directory (not logs), create a Latin1-encoded file named logging.properties.

Add the following text to logging.properties:

handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = INFO
# uncomment line below to increase logging level for SharePoint APIs
#com.google.enterprise.cloudsearch.sharepoint.level=FINE

# uncomment line below to increase logging level to enable API trace
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-sharepoint.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter

Step 5. Configure the SharePoint On-Prem identity connector

This step is required to apply SharePoint On-Prem identity-based ACLs to search results. If you set up the connector with public ACLs, you can skip this step.

In the same directory as the SharePoint On-Prem connector installation, create a file and name it sharepoint-onprem-identity-connector.config.

Add parameters as key/value pairs to the file contents, as in the following example:

### SharePoint On-prem identity connector configuration ###

# Required parameters for data source access
api.customerId=C05d3djk8
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9

# Required parameters for SharePoint access
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true

# Required parameters for AD lookup
adLookup.host=dc.contoso.com
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd
api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa

The values are almost the same as for the SharePoint On-Prem connector, except that instead of api.sourceId, the parameter is api.customerId. The value of api.customerId is the customer ID that you got from your Google Workspace admin.
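
Because the two files overlap so much, the identity connector file can be derived from the connector file. The sketch below is one way to do that under the assumption that the files differ only as in the examples above: `api.sourceId` is replaced by `api.customerId`, and the `contentTemplate.*` lines are left out. The helper name is hypothetical.

```shell
# Hypothetical helper: derive an identity connector configuration from the
# connector configuration and print it to standard output.
# Usage: make_identity_config <connector-config> <customer-id>
make_identity_config() {
  printf 'api.customerId=%s\n' "$2"
  # Keep every line except the data source ID and the schema mapping settings.
  grep -v -e '^api\.sourceId=' -e '^contentTemplate\.' "$1"
}

# Example (customer ID taken from the sample above):
# make_identity_config connector-config.properties C05d3djk8 \
#   > sharepoint-onprem-identity-connector.config
```

Review the generated file before you use it, because any parameter you added only for indexing is copied as well.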

Step 6. Launch the SharePoint On-Prem connector

In the following steps, you map the principals in both the on-premises Active Directory and the SharePoint site collection to identities in the Cloud Identity service. This synchronization is done with Google Cloud Directory Sync (GCDS) and the SharePoint On-Prem identity connector.

After GCDS synchronizes users and groups, run the SharePoint On-Prem identity connector to synchronize the SharePoint site collection groups. Lastly, run the SharePoint On-Prem connector to index and serve results to your Cloud Search users.

If you haven't already, configure and run GCDS. Make sure to enable identity mapped groups.

Run the SharePoint On-Prem identity connector:

$ java -Djava.util.logging.config.file=logging.properties -cp "google-cloudsearch-sharepoint-connector-version.jar" com.google.enterprise.cloudsearch.sharepoint.SharePointIdentityConnector -Dconfig=sharepoint-onprem-identity-connector.config

Run the SharePoint On-Prem connector. Use the command syntax for your SharePoint site security:

HTTP (no trusted host required):

$ java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-version.jar

HTTPS (add SharePoint as the trusted host):

$ java -Djavax.net.ssl.trustStore=cacerts.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=changeit -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-version.jar
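
The choice between the two commands depends only on the scheme of the SharePoint URL, so it can be scripted. The following sketch prints the matching command; `connector_command` is a hypothetical wrapper, and it assumes that cacerts.jks and logging.properties are in the current directory, as in the earlier steps.

```shell
# Hypothetical wrapper: print the launch command that matches the URL scheme.
# Usage: connector_command <jar-file> <sharepoint.server URL>
connector_command() {
  jar=$1
  server=$2
  opts="-Djava.util.logging.config.file=logging.properties"
  case $server in
    https://*)
      # HTTPS: add the trust store that holds the SharePoint certificate.
      opts="-Djavax.net.ssl.trustStore=cacerts.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=changeit $opts"
      ;;
  esac
  printf 'java %s -jar %s\n' "$opts" "$jar"
}

# Example: read the URL from the default configuration file.
# connector_command google-cloudsearch-sharepoint-connector-v1-version.jar \
#   "$(sed -n 's/^sharepoint\.server=//p' connector-config.properties | head -n 1)"
```

The function only prints the command, so you can review it before you run it.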

Configuration parameters reference

Data source access

Setting	Parameter
Data source ID	`api.sourceId=1234567890abcdef` Required. The Google Cloud Search data source ID set up by the Google Workspace administrator.
Path to the service account private key file	`api.serviceAccountPrivateKeyFile=PrivateKey.json` Required. The path to the Google Cloud Search service account key file.

SharePoint on-premises access

Setting	Parameter
SharePoint server URL	`sharepoint.server=http://yoursharepoint.example.com/` Required. The URL of the SharePoint server as a fully-qualified host name, such as http://yoursharepoint.example.com/. If the host name is not fully-qualified, then you must set DNS override on the connector host.
SharePoint username	`sharepoint.username=YOURDOMAIN\\ConnectorUser` Required when you run the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.
SharePoint password	`sharepoint.password=user_password` Required when you run the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.
Use Live Authentication to connect to SharePoint	`sharepoint.username=AdaptorUser Live Authentication Id` `sharepoint.password=uS3R_passWoRD` `sharepoint.formsAuthenticationMode=LIVE`
Use ADFS Authentication to connect to SharePoint	`sharepoint.username=AdaptorUser@yourdomain.com` `sharepoint.password=uS3R_passWoRD` `sharepoint.sts.endpoint=https://adfs.example.com/adfs/services/trust/2005/usernamemixed` `sharepoint.sts.realm=urn:myserver:sharepoint` or `https://yoursharepoint.example.com/_trust` `sharepoint.formsAuthenticationMode=ADFS`

Site collection indexing

Setting	Parameter
Index type	`sharepoint.siteCollectionOnly=boolean` Optional, except for multi-tenant SharePoint deployments (see Multi-tenant configurations). Set to true to have the connector index `sharepoint.server` as a site collection instead of as a virtual server. Default is null (auto-detected).

SharePoint Identity Mapping

Setting	Parameter
Identity Source ID	`api.identitySourceId=1234567890abcdef` Required. Identity source ID for syncing SharePoint Local Groups. The Google Cloud Search source ID set up by the Google Workspace administrator, as described in Add a data source to search.
Reference Identity Sources	`api.referenceIdentitySources=CONTOSO,contoso` A comma-delimited list of reference identity sources for Active Directory principals. Each value matches the Active Directory NetBIOS name of the referenced Active Directory principals.
Reference Identity Source IDs	`api.referenceIdentitySource.DOMAIN.id=identity-source-id` Required. The Identity Source ID for syncing Active Directory principals.

Active Directory Lookup

Setting	Parameter
Active Directory Host	`adLookup.host=host` Required. Active Directory host name, such as dc.contoso.com, or IP address.
Active Directory lookup port	`adLookup.port=port` Optional. Default is 389. Use 636 for SSL.
Active Directory lookup method	`adLookup.method=value` Optional. Default is `standard`. For HTTPS connections, set to `ssl`.
Active Directory lookup user	`adLookup.username=CONTOSO\\user1` Required. User authorized to perform Active Directory lookups. In the configuration file, write the backslash as `\\`, as in the Step 2 example.
Active Directory lookup password	`adLookup.password=password123` Required. Password for the user specified by `adLookup.username`.

HTML content generation

Setting	Parameter
HTML template title field	`contentTemplate.sharePointItem.title=Title` The SharePoint field to use as the HTML template title for generated HTML.
HTML content high search quality fields	`contentTemplate.sharePointItem.quality.high=highField1[,highField2,...]` A comma-separated list of fields to include in the generated HTML as high-quality fields. When the search query terms match these fields, the results are ranked higher.
HTML content medium search quality fields	`contentTemplate.sharePointItem.quality.medium=mediumField1[,mediumField2,...]` A comma-separated list of fields to include in the generated HTML as medium-quality fields.
HTML content low search quality fields	`contentTemplate.sharePointItem.quality.low=lowField1[,lowField2,...]` A comma-separated list of fields to include in the generated HTML as low-quality fields.
HTML content unmapped columns	`contentTemplate.sharepointItem.unmappedColumnsMode=APPEND` How the connector handles unmapped columns. Value is APPEND (default) or IGNORE. APPEND: The connector generates HTML content with all fields, including fields that aren't set with a quality level (high, medium, or low). IGNORE: The connector generates HTML content with only mapped fields.