Deploy the Microsoft SharePoint On-Prem Connector

You can set up Google Cloud Search to return results from your organization's SharePoint on-premises content in addition to your G Suite content. You use the Google Cloud Search SharePoint On-Prem connector and configure it to access a specific SharePoint data source.

Important considerations

Honored SharePoint settings

The Cloud Search SharePoint On-Prem connector always honors the Search Visibility setting on SharePoint, which can't be overridden. For draft documents, the permissions of the user account that the connector uses to access SharePoint control which draft documents are indexed and returned. If the account has only "Full Read" permissions, the connector honors the "Draft item visibility" settings on SharePoint.

You can also configure the connector to limit results based on user account access. You can use Google principals and external principals to define ACLs. To apply security trimming for SharePoint content, synchronize the following external identities with the Google Directory:

  • Active Directory Users
  • Active Directory Groups
  • SharePoint Local Groups (with Active Directory users and groups as members)

To synchronize AD users and groups, you use Google Cloud Directory Sync, enabling identity mapped groups. To synchronize SharePoint local groups, you use the SharePoint Identity Connector.

The connector also needs to perform lookup with AD to fetch additional information to synchronize the principals. For example, lookup with AD lets the connector do the following:

  • Map the SID for a domain group to the corresponding sAMAccountName.
  • Map a user sAMAccountName to the email address for SharePoint local group memberships.

Search optimization

You can improve your users' experience by configuring the connector to return more relevant search results.

To use the API, set values for the HTML content generation parameters in the SharePoint On-Prem connector configuration file. These parameters let you set which fields have higher or lower impact on matches.

To set up a schema, follow the instructions in Create and register a schema. When you set up a schema:

  • To map the names of SharePoint content types to corresponding object definitions, the connector normalizes the content type names by excluding unsupported characters. For object definitions, the Cloud Search API supports only A-Z, a-z, and 0-9 as valid characters. For example, the content type "Announcements" maps to the object definition "Announcements". The content type "News Article" maps to "NewsArticle" (no space).

  • When the connector can't match a content type with an object definition, the connector uses the fallback object type (itemMetadata.objectType). Learn more about metadata configuration parameters.

  • To map SharePoint property names to property definitions, the connector normalizes the property names by decoding hex-encoded characters and removing "ows_" prefixes, then excluding unsupported characters (every character other than A-Z, a-z, and 0-9).
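The two normalization rules above can be sketched in Python as follows. This is an illustration of the described behavior, not the connector's actual code. The function names are hypothetical, and the sketch assumes that hex-encoded characters use the _xHHHH_ form that SharePoint applies to internal field names (for example, _x0020_ for a space).

```python
import re

# Assumption: hex-encoded characters look like _x0020_ (four hex digits).
_HEX_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")
# Everything other than A-Z, a-z, and 0-9 is treated as unsupported.
_UNSUPPORTED = re.compile(r"[^A-Za-z0-9]")


def normalize_content_type(name: str) -> str:
    """Drop every character that is not A-Z, a-z, or 0-9."""
    return _UNSUPPORTED.sub("", name)


def normalize_property_name(name: str) -> str:
    """Decode hex escapes, remove the ows_ prefix, then drop unsupported characters."""
    decoded = _HEX_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), name)
    if decoded.startswith("ows_"):
        decoded = decoded[len("ows_"):]
    return _UNSUPPORTED.sub("", decoded)


print(normalize_content_type("News Article"))         # prints NewsArticle
print(normalize_property_name("ows_Due_x0020_Date"))  # prints DueDate
```

Running candidate schema names through a check like this before you register the schema can help you spot content types or properties that would collapse to the same normalized name.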

Microsoft Outlook message handling

When the connector encounters Microsoft Outlook .msg files as it indexes content, it overrides the media type for the files and indexes them as application/vnd.ms-outlook.

Multi-tenant configurations

If your SharePoint is a multi-tenant deployment, where multiple customer sites are hosted on the same Web application, you need to configure site collection mode in the configuration file. In multi-tenant deployments, you get permissions only for your site collection and can't get Full Read permissions, as required by the SharePoint On-Prem connector.

To enable site collection mode:

  • Give the connector user account site collection administrator permissions.
  • Set sharepoint.server in your connector configuration file to the site collection URL, such as http://sharepoint.example.com/sites/sitecollection. The URL doesn't need to use the exact same case as on SharePoint.
  • Set sharepoint.siteCollectionOnly in your connector configuration file to true.

If you have multiple site collections to index in a multi-tenant environment, you need to configure one connector instance for each of the site collections.
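If you manage several site collections, you might generate the per-instance configuration files from a shared set of properties. The following Python sketch shows one way to do that. The function names and the connector-config-NAME.properties naming scheme are hypothetical, and the sketch assumes that values are already escaped for a .properties file and that each site collection URL ends in a distinct path segment.

```python
from urllib.parse import urlparse


def site_collection_configs(base_properties: dict, site_collection_urls: list) -> dict:
    """Return {file name: properties} with one entry per site collection."""
    configs = {}
    for url in site_collection_urls:
        # Use the last path segment to build a distinguishable file name.
        slug = urlparse(url).path.rstrip("/").split("/")[-1] or "root"
        properties = dict(base_properties)
        properties["sharepoint.server"] = url
        properties["sharepoint.siteCollectionOnly"] = "true"
        configs[f"connector-config-{slug}.properties"] = properties
    return configs


def render_properties(properties: dict) -> str:
    """Render key/value pairs as the lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


configs = site_collection_configs(
    {"api.serviceAccountPrivateKeyFile": "service_account.json"},
    [
        "http://sharepoint.example.com/sites/sitecollection",
        "http://sharepoint.example.com/sites/othercollection",
    ],
)
for file_name, properties in configs.items():
    print(file_name)
    print(render_properties(properties))
```

Each generated file then belongs to its own connector instance.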

Known connector limitations

  • The time it takes the connector to detect changes to items in the databases increases with the number of databases the connector monitors.
  • Memory consumption increases with the number of unique users and groups that you use in ACLs for each site collection.
  • You can configure the connector with identities from only one Active Directory Domain.
  • Some common Active Directory and Windows principals, such as Everyone, BUILTIN\Users, and All Authenticated Users, aren't supported.
  • Delete notifications are not instantaneous. It can take more than 4 hours for a connector to recognize that a user deleted content from the source repository.

System requirements

Operating system
  • Windows Server 2016
  • Ubuntu
  • Red Hat Enterprise Linux 5.0
  • SUSE Enterprise Linux 10 (64 bit)
Software
  • SharePoint server
    • SharePoint Server 2016
    • SharePoint Server 2013
  • Java JRE 1.8 installed on the computer that will run the Google Cloud Search SharePoint On-Prem connector
Authentication
  • NTLM
  • Kerberos
  • HTTP Basic
  • ADFS

Deploy the connector

Prerequisites

  1. Create a G Suite private key, which contains your service account ID. To learn how to get a private key, go to Configure access to the Google Cloud Search REST API.

  2. Your G Suite administrator must add a data source to search. Record the data source ID.

  3. If the connector returns results based on ACLs (results aren't public), your G Suite administrator must create two identity sources and give you their IDs:

    • An identity source for syncing Active Directory users and groups.
    • An identity source for syncing SharePoint local groups.

    The admin must also get your organization's G Suite customer ID and give it to you.

    Learn how to get these values in Map user identities in Cloud Search.

  4. Set up a user account for the connector that has Full Read permissions to SharePoint Web Application in the user policy.

  5. If the SharePoint Web Application doesn't have a root site collection, create one.

  6. If any site collections are write-locked, sign in to the SharePoint server with an account that has Admin privileges and run the PrepareWriteLockedSites.ps1 script.

  7. To get data source metrics to inform your connector configuration, sign in to the SharePoint server with an account that has farm administration privileges and run diagnose_sp.ps1.

    The output reports the numbers of web applications, documents, and user group memberships. Use this information to estimate how many connector instances you need, memory requirements, and document count.

Step 1. Install the Google Cloud Search SharePoint On-Prem connector software

  1. Clone the connector repository from GitHub.

    $ git clone https://github.com/google-cloudsearch/sharepoint-connector.git
    $ cd sharepoint-connector

  2. Check out the desired version of the connector:

    $ git checkout tags/latest_version

    Where: latest_version = a value such as v1-0.0.5

  3. Build the connector.

    $ mvn package

    To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.

  4. Copy the connector zip file to your local installation directory:

    $ cp target/google-cloudsearch-sharepoint-connector-latest_version.zip installation-dir
    $ cd installation-dir
    $ unzip google-cloudsearch-sharepoint-connector-latest_version.zip
    $ cd google-cloudsearch-sharepoint-connector-latest_version

Step 2. Create the SharePoint On-Prem connector configuration file

  1. In the same directory as the connector installation, create a file. Google recommends that you name the file connector-config.properties so no additional command-line parameters are required to run the connector. If you plan to run many connector instances, add details to the name to distinguish it.

  2. Add parameters as key/value pairs to the file contents, as in the following example:

    ### Sharepoint On-Prem Connector configuration ###
    
    # Required parameters for data source access
    api.sourceId=08ef8becd116faa4546b8ca2c84b2879
    api.serviceAccountPrivateKeyFile=service_account.json
    api.identitySourceId=08ef8becd116faa475de26d9b291fed9
    
    # Required parameters for SharePoint on-premises access
    sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
    sharepoint.siteCollectionOnly=true
    sharepoint.username=contoso\\admin
    sharepoint.password=pa$sw0rd
    sharepoint.stripDomainInUserPrincipals=true
    
    # Required parameters for AD lookup
    adLookup.host=dc.contoso.com
    adLookup.username=contoso\\admin
    adLookup.password=pa$sw0rd
    api.referenceIdentitySources=CONTOSO,contoso
    api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
    api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa
    
    # Optional parameters for schema mapping
    contentTemplate.sharepointItem.title=Title
    contentTemplate.sharepointItem.unmappedColumnsMode=APPEND
    

    For detailed descriptions of each parameter, go to the configuration parameters reference.

  3. (Optional) Configure additional connector parameters, as needed. For details, go to Google-supplied connector parameters.
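Before you start the connector, it can help to check that the configuration file from this step contains the parameters that the reference marks as required. The following Python sketch is one possible pre-flight check, not part of the connector. The function names are hypothetical, and the parser handles only simple key=value lines rather than the full Java .properties syntax.

```python
# Keys the parameter reference marks as required for every deployment.
REQUIRED_KEYS = ("api.sourceId", "api.serviceAccountPrivateKeyFile", "sharepoint.server")
# Keys used for identity mapping and AD lookup when results are ACL-based.
ACL_KEYS = ("api.identitySourceId", "adLookup.host", "adLookup.username", "adLookup.password")


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            properties[key.strip()] = value.strip()
    return properties


def missing_keys(properties: dict, uses_acls: bool = True) -> list:
    """Return the expected keys that are absent or empty."""
    expected = REQUIRED_KEYS + (ACL_KEYS if uses_acls else ())
    return [key for key in expected if not properties.get(key)]


sample = """
# Required parameters for data source access
api.sourceId=08ef8becd116faa4546b8ca2c84b2879
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
"""
print(missing_keys(parse_properties(sample), uses_acls=False))
# prints ['api.serviceAccountPrivateKeyFile']
```

Pass uses_acls=False when the connector serves public results and you don't configure identity sources.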

Step 3. For HTTPS, add SharePoint as a trusted host

If SharePoint is configured to use HTTPS, get a SharePoint certificate to add it as a trusted host for the connector.

  1. On the computer that will run the connector, open a browser and go to SharePoint.

  2. In the warning page that opens, click I Understand the Risks and Add Exception. The page shows a message such as "This Connection is Untrusted" because the certificate is self-signed and not signed by a trusted Certificate Authority.

  3. Once the View button is available, click it.

  4. Go to the Details tab and click Export.

  5. Save the certificate in the connector directory with the name sharepoint.crt.

  6. Click Close then Cancel to close the windows.

  7. Open a command prompt and enter the following command:

    $ keytool -importcert -keystore cacerts.jks -storepass changeit -file sharepoint.crt -alias sharepoint

    When prompted "Trust this certificate?", answer yes.

Step 4. Set up logging

  1. In the directory that contains the connector binary, create a folder named logs.

  2. In the same directory (not logs), create a Latin1-encoded file named logging.properties.

  3. Add the following text to logging.properties:

    handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
    # Default log level
    .level = INFO
    # uncomment line below to increase logging level for SharePoint APIs
    #com.google.enterprise.cloudsearch.sharepoint.level=FINE
    
    # uncomment line below to increase logging level to enable API trace
    #com.google.api.client.http.level = FINE
    java.util.logging.ConsoleHandler.level = INFO
    java.util.logging.FileHandler.pattern=logs/connector-sharepoint.%g.log
    java.util.logging.FileHandler.limit=10485760
    java.util.logging.FileHandler.count=10
    java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
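
To plan disk space for the logs folder, note that java.util.logging.FileHandler treats limit as the approximate maximum size of one file in bytes and count as the number of files to rotate through, with %g in the pattern replaced by the generation number. The short Python sketch below works through the arithmetic for the values above. The function names are hypothetical.

```python
def max_log_bytes(limit_bytes: int, file_count: int) -> int:
    """Approximate upper bound on disk space used by the rotating log files."""
    return limit_bytes * file_count


def log_file_names(pattern: str, file_count: int) -> list:
    """Expand %g into the generation numbers 0 .. file_count - 1."""
    return [pattern.replace("%g", str(generation)) for generation in range(file_count)]


total = max_log_bytes(limit_bytes=10485760, file_count=10)
print(f"{total} bytes = {total / 1024 ** 2:.0f} MiB")  # prints 104857600 bytes = 100 MiB
print(log_file_names("logs/connector-sharepoint.%g.log", 10)[0])
# prints logs/connector-sharepoint.0.log
```

With the settings shown, the connector keeps roughly 100 MiB of logs at most. Raise limit or count if you enable FINE logging and need a longer history.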
    

Step 5. Configure the SharePoint On-Prem identity connector

This step is required to apply SharePoint On-Prem identity-based ACLs to search results. If you set up the connector with public ACLs, you can skip this step.

  1. In the same directory as the SharePoint On-Prem connector installation, create a file and name it sharepoint-onprem-identity-connector.config.

  2. Add parameters as key/value pairs to the file contents, as in the following example:

    ### SharePoint On-prem identity connector configuration ###
    
    # Required parameters for data source access
    api.customerId=C05d3djk8
    api.serviceAccountPrivateKeyFile=service_account.json
    api.identitySourceId=08ef8becd116faa475de26d9b291fed9
    
    # Required parameters for SharePoint access
    sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
    sharepoint.siteCollectionOnly=true
    sharepoint.username=contoso\\admin
    sharepoint.password=pa$sw0rd
    sharepoint.stripDomainInUserPrincipals=true
    
    # Required parameters for AD lookup
    adLookup.host=dc.contoso.com
    adLookup.username=contoso\\admin
    adLookup.password=pa$sw0rd
    api.referenceIdentitySources=CONTOSO,contoso
    api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
    api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa
    

    The values are almost the same as for the SharePoint On-Prem connector, except that instead of api.sourceId, the parameter is api.customerId. The value of api.customerId is the customer ID that you got from your G Suite admin.
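
Because the two files overlap so much, you might derive the identity connector properties from the connector properties instead of maintaining both by hand. The following Python sketch illustrates the idea under that assumption. The function name is hypothetical, and dropping the contentTemplate.* keys simply mirrors the example above, which doesn't include them.

```python
def identity_connector_properties(connector_properties: dict, customer_id: str) -> dict:
    """Swap api.sourceId for api.customerId and drop schema mapping keys."""
    shared = {
        key: value
        for key, value in connector_properties.items()
        if key != "api.sourceId" and not key.startswith("contentTemplate.")
    }
    return {"api.customerId": customer_id, **shared}


connector = {
    "api.sourceId": "08ef8becd116faa4546b8ca2c84b2879",
    "api.identitySourceId": "08ef8becd116faa475de26d9b291fed9",
    "sharepoint.siteCollectionOnly": "true",
    "contentTemplate.sharepointItem.title": "Title",
}
for key, value in identity_connector_properties(connector, "C05d3djk8").items():
    print(f"{key}={value}")
```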

Step 6. Launch the SharePoint On-Prem connector

In the following steps, you map the principals in both the on-premises Active Directory and the SharePoint site collection to identities in the Google Cloud Identity service. This synchronization is done with Google Cloud Directory Sync (GCDS) and the SharePoint On-Prem identity connector.

After GCDS synchronizes users and groups, run the SharePoint On-Prem identity connector to synchronize the SharePoint site collection groups. Lastly, run the SharePoint On-Prem connector to index and serve results to your Cloud Search users.

  1. If you haven't already, configure and run GCDS. Make sure to enable identity mapped groups.

  2. Run the SharePoint On-Prem identity connector:

    $ java -Djava.util.logging.config.file=logging.properties -cp "google-cloudsearch-sharepoint-connector-version.jar" com.google.enterprise.cloudsearch.sharepoint.SharePointIdentityConnector -Dconfig=sharepoint-onprem-identity-connector.config

  3. Run the SharePoint On-Prem connector. Use the command syntax for your SharePoint site security:

    • HTTP (no trusted host required):

      $ java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-version.jar

    • HTTPS (add SharePoint as the trusted host):

      $ java -Djavax.net.ssl.trustStore=cacerts.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=changeit -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-version.jar
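
If you start the connector from a script, you can build the HTTP or HTTPS command line from the same inputs. The following Python sketch assembles the argument list shown above. The function name and its parameters are hypothetical, and the defaults (cacerts.jks, changeit) are the example values from Step 3.

```python
def connector_command(
    jar: str,
    https: bool = False,
    trust_store: str = "cacerts.jks",
    trust_store_password: str = "changeit",
) -> list:
    """Build the java argument list for the SharePoint On-Prem connector."""
    command = ["java"]
    if https:
        # Trust store options are needed only when SharePoint uses HTTPS.
        command += [
            f"-Djavax.net.ssl.trustStore={trust_store}",
            "-Djavax.net.ssl.trustStoreType=jks",
            f"-Djavax.net.ssl.trustStorePassword={trust_store_password}",
        ]
    command += ["-Djava.util.logging.config.file=logging.properties", "-jar", jar]
    return command


# The jar name is the placeholder used in this guide, not a real file name.
print(" ".join(connector_command("google-cloudsearch-sharepoint-connector-v1-version.jar", https=True)))
```

To launch the process, you could pass the returned list to subprocess.run from the directory that contains the jar, the configuration file, and logging.properties.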

Configuration parameters reference

Data source access

Setting Parameter
Data source ID api.sourceId=1234567890abcdef

Required. The Google Cloud Search data source ID set up by the G Suite administrator.

Path to the service account private key file api.serviceAccountPrivateKeyFile=PrivateKey.json

Required. The path to the Google Cloud Search service account key file.

SharePoint on-premises access

Setting Parameter
SharePoint server URL sharepoint.server=http://yoursharepoint.example.com/

Required. The URL of the SharePoint server as a fully-qualified host name, such as http://yoursharepoint.example.com/. If the host name is not fully-qualified, then you must set DNS override on the connector host.

SharePoint username sharepoint.username=YOURDOMAIN\\ConnectorUser

Required when you run the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.

SharePoint password sharepoint.password=user_password

Required when you run the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain.

Use Live Authentication to connect to SharePoint sharepoint.username=AdaptorUser Live Authentication Id

sharepoint.password=uS3R_passWoRD

sharepoint.formsAuthenticationMode=LIVE

Use ADFS Authentication to connect to SharePoint sharepoint.username=AdaptorUser@yourdomain.com

sharepoint.password=uS3R_passWoRD

sharepoint.sts.endpoint=https://adfs.example.com/adfs/services/trust/2005/usernamemixed

sharepoint.sts.realm=urn:myserver:sharepoint or https://yoursharepoint.example.com/_trust

sharepoint.formsAuthenticationMode=ADFS

Site collection indexing

Setting Parameter
Index type sharepoint.siteCollectionOnly=boolean

Optional, except for multi-tenant SharePoint deployments (see Multi-tenant configurations). Set to true to have the connector index sharepoint.server as a site collection instead of as a virtual server. Default is null (auto-detected).

SharePoint Identity Mapping

Setting Parameter
Identity Source ID api.identitySourceId=1234567890abcdef

Required. Identity source ID for syncing SharePoint Local Groups. The Google Cloud Search source ID set up by the G Suite administrator, as described in Add a data source to search.

Reference Identity Sources api.referenceIdentitySources=CONTOSO,contoso

A comma-delimited list of reference identity sources for Active Directory principals. Each value matches the Active Directory NetBIOS name of the referenced Active Directory principals.

Reference Identity Source IDs api.referenceIdentitySource.DOMAIN.id=identity-source-id

Required. The Identity Source ID for syncing Active Directory principals.

Active Directory Lookup

Setting Parameter
Active Directory Host adLookup.host=host

Required. Active Directory host name, such as dc.contoso.com, or IP address.

Active Directory lookup port adLookup.port=port

Optional. Default is 389. Use 636 for SSL.

Active Directory lookup method adLookup.method=value

Optional. Default is `standard`. For SSL connections, set to `ssl`.

Active Directory lookup user adLookup.username=CONTOSO\\user1

Required. User authorized to perform Active Directory lookups. In the configuration file, escape the backslash in the domain prefix as shown.

Active Directory lookup password adLookup.password=password123

Required. Password for the user specified by adLookup.username.

HTML content generation

Setting Parameter
HTML template title field contentTemplate.sharePointItem.title=Title

The SharePoint field to use as the HTML template title for generated HTML.

HTML content high search quality fields contentTemplate.sharePointItem.quality.high=highField1[,highField2,...]

A comma-separated list of fields to include in the generated HTML as high-quality fields. When the search query terms match these fields, the results are ranked higher.

HTML content medium search quality fields contentTemplate.sharePointItem.quality.medium=mediumField1[,mediumField2,...]

A comma-separated list of fields to include in the generated HTML as medium-quality fields.

HTML content low search quality fields contentTemplate.sharePointItem.quality.low=lowField1[,lowField2,...]

A comma-separated list of fields to include in the generated HTML as low-quality fields.

HTML content unmapped columns contentTemplate.sharepointItem.unmappedColumnsMode=APPEND

How the connector handles unmapped columns. Value is APPEND (default) or IGNORE.

  • APPEND—The connector generates HTML content with all fields, including fields that aren't set with a quality level (high, medium, or low).
  • IGNORE—The connector generates HTML content with only mapped fields.
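
The difference between the two modes can be sketched in Python as follows. This only illustrates the semantics described above and is not the connector's implementation. The function name is hypothetical, and the ordering of fields in the output is an assumption.

```python
def fields_for_html(item_fields, high=(), medium=(), low=(), unmapped_columns_mode="APPEND"):
    """Return the field names that end up in the generated HTML content."""
    mapped = [*high, *medium, *low]
    # Mapped fields are always included when the item has them.
    selected = [name for name in mapped if name in item_fields]
    if unmapped_columns_mode == "APPEND":
        # APPEND also includes fields that have no quality level.
        selected += [name for name in item_fields if name not in mapped]
    elif unmapped_columns_mode != "IGNORE":
        raise ValueError(f"Unsupported mode: {unmapped_columns_mode}")
    return selected


item = {"Title": "Q3 plan", "Author": "user1", "Notes": "draft"}
print(fields_for_html(item, high=["Title"], low=["Notes"]))
# prints ['Title', 'Notes', 'Author']
print(fields_for_html(item, high=["Title"], low=["Notes"], unmapped_columns_mode="IGNORE"))
# prints ['Title', 'Notes']
```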

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2020-02-13.