Deploy the Microsoft Windows File Systems connector

You can set up Google Cloud Search to return results from your organization's Microsoft Windows shares in addition to your Google Workspace content. You use the Google Cloud Search File Systems connector and configure it to access specified Windows shares. A single connector instance can support multiple Microsoft Windows shares.

Important considerations

Continuous automatic updates

By default, when the connector starts up, it continuously monitors the start paths (the values of fs.src in the connector configuration file). When the file system reports changes to content or access controls, the connector re-crawls the file system, which can be resource intensive. To turn off file system monitoring, set fs.monitorForUpdates to false. This significantly reduces the connector's resource use, but the connector takes longer to reflect changes.

DFS access control

DFS applies access control to its links, and each DFS link usually has its own ACL. One mechanism that DFS uses is Access-based Enumeration (ABE), which can restrict the DFS links returned to a user. Users might see only a subset of the DFS links, or only a single link when ABE is used to isolate hosted home directories. When the connector traverses a DFS system, it respects both the DFS link ACL and the target's share ACL, and the share ACL inherits from the DFS link ACL.
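
The layering can be pictured with a simplified Java model. The model treats each ACL as a plain set of principals that are allowed to read. A user needs a match at every level, from the DFS link through the share to the file. The class and method names are hypothetical. The connector does not evaluate access this way: it sends the ACLs to the Cloud Search indexing API, and that API enforces them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of layered ACLs; not the connector's source. */
final class DfsAclChain {
  private DfsAclChain() {}

  /**
   * Returns true only if the user (or one of the user's groups) appears in every ACL of the
   * chain, ordered from outermost (DFS link) to innermost (file).
   */
  static boolean canRead(Set<String> userPrincipals, List<Set<String>> chain) {
    for (Set<String> acl : chain) {
      if (Collections.disjoint(acl, userPrincipals)) {
        return false; // denied at this level, so inner ACLs don't matter
      }
    }
    return true;
  }

  /** Convenience factory for an ACL given as a list of principal names. */
  static Set<String> acl(String... principals) {
    return new HashSet<>(Arrays.asList(principals));
  }
}
```

In this model, a user who is in the share ACL but not in the DFS link ACL is still denied, because the share ACL inherits from the DFS link ACL.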

Known limitations

  • File System: The File Systems connector doesn't support mapped drives or local drives.
  • Distributed File System: A mapped drive to a UNC DFS path doesn't work correctly. Some ACLs aren't read correctly.
  • Distributed File System: The connector supports DFS namespaces and links. However, within a DFS namespace it supports only DFS links, not regular folders.
  • File links returned in cloudsearch.google.com aren't clickable. The file links returned by the Query API aren't clickable in most browsers, either.

System requirements

Operating system
  • Windows Server 2016
  • Windows Server 2012
  • Windows Server 2008 R2
Software
  • Java JRE 1.8 installed on the computer that will run the Google Cloud Search File Systems connector
File system protocols
  • Server Message Block (SMB) - SMB1
  • Server Message Block (SMB) - SMB2
  • Distributed File System (DFS)

Not supported: local Windows file systems, local Linux file systems, and Sun Network File System (NFS) 2.0 and 3.0.

Deploy the connector

Prerequisites

Before you deploy the Cloud Search File Systems connector, make sure that your environment meets the following prerequisites:

Required Microsoft Windows account permissions

The Microsoft Windows account that the connector is running under must have sufficient permissions to perform the following actions:

  • List the content of folders
  • Read the content of documents
  • Read attributes of files and folders
  • Read permissions (ACLs) for both files and folders
  • Write basic attributes (needed to restore last access timestamps; see fs.preserveLastAccessTime)

Membership in any of the following groups gives a Windows account the permissions that the connector needs:

  • Administrators
  • Power Users
  • Print Operators
  • Server Operators

Step 1. Install the Google Cloud Search File Systems connector

  1. Get the connector repository from GitHub.

    To use git on the Windows server:

    1. Clone the repository:

      > git clone https://github.com/google-cloudsearch/windows-filesystems-connector.git
      > cd windows-filesystems-connector
    2. Check out the desired version of the connector:

      > git checkout tags/v1-0.0.3

    To download from GitHub directly:

    1. Go to https://github.com/google-cloudsearch/windows-filesystems-connector.
    2. Click Clone or download > Download ZIP.
    3. Unzip the package.
    4. Move to the new directory:
      > cd windows-filesystems-connector
  2. Build the connector. If necessary, install Apache Maven.

    > mvn package

    To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.

  3. Copy the connector zip file to your local installation directory:

    > cp target/google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip installation-dir
    > cd installation-dir
    > unzip google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip
    > cd google-cloudsearch-windows-filesystems-connector-v1-0.0.3

Step 2. Create the connector configuration file

  1. In the same directory as the connector installation, create a file and name it connector-config.properties.

  2. Add parameters as key/value pairs, as in the following example:

    ### File system connector configuration ###
    
    # Required parameters for Cloud Search data source and identity source access
    api.serviceAccountPrivateKeyFile=/path/to/file.json
    api.sourceId=0123456789abcde
    api.identitySourceId=a1b1c1234567
    
    # Required parameters for file system access
    fs.src=\\\\host\\share;\\\\dfshost\\dfsnamespace;\\\\dfshost\\dfsnamespace\\link
    
    # Optional parameters for traversal, monitoring, and timestamp preservation
    traverse.abortAfterExceptions=500
    fs.monitorForUpdates=true
    fs.preserveLastAccessTime=IF_ALLOWED
    

    For detailed descriptions of each parameter, go to the configuration parameters reference.

  3. (Optional) Configure other connector parameters, as needed. For details, go to Google-supplied connector parameters.
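
connector-config.properties follows Java properties syntax, where a backslash is an escape character. That is why each backslash in the fs.src example is doubled. The following sketch uses a hypothetical class name. It parses properties text with java.util.Properties to show the values the connector would receive. It also shows that spaces around = are ignored.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper that demonstrates properties-file unescaping. */
final class PropertiesEscapingDemo {
  private PropertiesEscapingDemo() {}

  /** Parses text with the same escape rules Properties.load applies to a .properties file. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text)); // "\\" in the text becomes a single "\"
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory StringReader
    }
    return props;
  }
}
```

For example, the file line fs.src=\\\\host\\share produces the UNC path \\host\share.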

Step 3. Enable logging

  1. Create a folder named logs in the directory that contains the connector binary.
  2. Create an ASCII or UTF-8 file named logging.properties in the directory that contains the connector binary and add the following content:

    handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
    # Default log level
    .level = WARNING
    com.google.enterprise.cloudsearch.level = INFO
    com.google.enterprise.cloudsearch.fs.level = INFO
    
    # Uncomment the following line to enable API request tracing. FINE messages
    # go only to the log file, because ConsoleHandler.level below is INFO.
    #com.google.api.client.http.level = FINE
    java.util.logging.ConsoleHandler.level = INFO
    java.util.logging.FileHandler.pattern=logs/connector-fs.%g.log
    java.util.logging.FileHandler.limit=10485760
    java.util.logging.FileHandler.count=10
    java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
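
Logger names in logging.properties are hierarchical. A logger without its own .level entry inherits the level of its nearest configured ancestor, and falls back to the root level (.level). The following sketch loads only the level lines from the file above into java.util.logging.LogManager and checks which messages would pass. Handlers are left out so that nothing is written to disk. The class name and the logger names under com.google.enterprise.cloudsearch.fs are hypothetical examples.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

/** Hypothetical check of logger-level inheritance; not part of the connector. */
final class LogLevelCheck {
  private LogLevelCheck() {}

  // Only the level settings from logging.properties; handlers are omitted on purpose.
  static final String LEVELS =
      ".level = WARNING\n"
          + "com.google.enterprise.cloudsearch.level = INFO\n"
          + "com.google.enterprise.cloudsearch.fs.level = INFO\n";

  /** Replaces the JVM-wide logging configuration with LEVELS. */
  static void apply() {
    try {
      LogManager.getLogManager()
          .readConfiguration(new ByteArrayInputStream(LEVELS.getBytes(StandardCharsets.ISO_8859_1)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** True if a message at this level passes the logger's effective (possibly inherited) level. */
  static boolean wouldLog(String loggerName, Level level) {
    return Logger.getLogger(loggerName).isLoggable(level);
  }
}
```

With this configuration, connector loggers pass INFO messages, but other libraries, such as the HTTP client, report only WARNING and above.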
    

Step 4. (Optional) Configure media types

By default, the connector tries to detect each file's media type with the JDK's built-in media type detection. On Microsoft Windows, the JDK relies on the Windows registry to determine media types, so a missing registry entry can result in a null media type for some files.

If necessary, you can specify media types for file extensions. These mappings override any existing bindings and prevent null media types.

  1. In the connector directory, create a Latin-1 (ISO-8859-1) encoded file named mime-type.properties.
  2. Enter file extensions and their corresponding media types as in the following examples:

    xlsx=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    one=application/msonenote
    txt=text/plain
    pdf=application/pdf
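
The lookup order described above can be sketched in Java. The sketch checks the overrides from mime-type.properties first, then falls back to the JDK's Files.probeContentType, which may return null. It assumes that extensions are matched case-insensitively; the connector's actual matching rules may differ, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical resolver: overrides first, JDK detection second. Not the connector's source. */
final class MediaTypeResolver {
  private final Properties overrides = new Properties();

  /** Reads lines such as "pdf=application/pdf"; Properties.load(InputStream) decodes Latin-1. */
  MediaTypeResolver(InputStream mimeTypeProperties) {
    try {
      overrides.load(mimeTypeProperties);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the override for the file's extension, else the JDK's guess (possibly null). */
  String resolve(Path file) {
    String name = file.getFileName().toString();
    int dot = name.lastIndexOf('.');
    if (dot >= 0 && dot < name.length() - 1) {
      // Assumption: extensions are compared case-insensitively.
      String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
      String override = overrides.getProperty(ext);
      if (override != null) {
        return override;
      }
    }
    try {
      return Files.probeContentType(file); // may be null, e.g. a missing registry entry on Windows
    } catch (IOException e) {
      return null;
    }
  }
}
```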
    

Step 5. Run the File Systems connector

After you install and configure the File Systems connector, launch it on the host machine with a command like the following:

> java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-windows-filesystems-connector-v1-0.0.3.jar [-Dconfig=my.config]

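JVM system properties such as -Djava.util.logging.config.file must come before -jar; anything after the JAR file name is passed to the connector as an argument. Include -Dconfig only if your configuration file differs from the default, which is a file named connector-config.properties in the same directory as the binary.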

Configuration parameters reference

Data source access

Data source ID: api.sourceId=1234567890abcdef

Required. The Google Cloud Search source ID set up by the Google Workspace administrator.

Path to the service account private key file: api.serviceAccountPrivateKeyFile=./PrivateKey.json

Required. The path to the service account private key file that the File Systems connector uses to access Google Cloud Search.

Identity source ID: api.identitySourceId=x0987654321

Required. The Cloud Search identity source ID that the Google Workspace administrator set up for syncing Active Directory identities with Google Cloud Directory Sync (GCDS).

File system access

Source file systems: fs.src=path1[;path2;...]

Required. One or more UNC source paths, separated by the delimiter set in fs.src.separator. Because the file uses Java properties syntax, each backslash in a path must be doubled, as in the example configuration file. If a path contains characters outside Latin-1, encode them as Java Unicode escapes (\uXXXX).

Path separator character

Path separator character: fs.src.separator=separator-character

The default separator is ";". If your source paths contain semicolons, you can set a different delimiter, such as a comma (","), that does not conflict with characters in your paths and isn't reserved by property file syntax itself.

If the fs.src.separator value is an empty string, then the fs.src value is treated as a single path.
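
The splitting rule can be sketched as follows. The sketch assumes that the properties loader has already unescaped the value. It also assumes that empty entries and surrounding whitespace are ignored; the connector's exact handling of those edge cases isn't documented here. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical fs.src splitter; not the connector's source. */
final class SourcePaths {
  private SourcePaths() {}

  /** Splits an fs.src value using the fs.src.separator value (default ";"). */
  static List<String> split(String src, String separator) {
    if (separator.isEmpty()) {
      return Collections.singletonList(src); // empty separator: the whole value is one path
    }
    List<String> paths = new ArrayList<>();
    // Pattern.quote matches the separator literally, even a regex metacharacter such as "|".
    for (String part : src.split(Pattern.quote(separator))) {
      String trimmed = part.trim(); // assumption: surrounding whitespace is not significant
      if (!trimmed.isEmpty()) {
        paths.add(trimmed);
      }
    }
    return paths;
  }
}
```

Quoting the separator keeps a custom delimiter from being interpreted as a regular expression, so any single character that doesn't appear in your paths behaves predictably.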

Connector behavior

Windows domain: fs.supportedDomain=domain

Required to let users who are set up with GCDS access documents through Cloud Search. Specify a single NetBIOS domain name for the Active Directory domain.

Include accounts in ACLs: fs.supportedAccounts=account-1[,account-2,...]

A comma-delimited list of accounts to include in ACLs regardless of whether they are built-in accounts.

The default value (in properties-file syntax, where \\ stands for one backslash) is BUILTIN\\Administrators,Everyone,BUILTIN\\Users,BUILTIN\\Guest,NT AUTHORITY\\INTERACTIVE,NT AUTHORITY\\Authenticated Users

Exclude built-in accounts from ACLs: fs.builtinGroupPrefix=prefix

Specify the prefix of built-in accounts. An account that starts with this prefix is considered a built-in account and will be excluded from the ACLs.

The default value is BUILTIN\\
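
Together, fs.supportedAccounts and fs.builtinGroupPrefix define a keep-or-drop rule for each account in a file's ACL. The following minimal sketch assumes exact, case-sensitive string matching; the connector may normalize account names differently, and the class name is hypothetical. The values in the code are the runtime values after properties unescaping, so they contain a single backslash, as in BUILTIN\Administrators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical ACL account filter; assumes exact, case-sensitive matching. */
final class AclAccountFilter {
  private final Set<String> supportedAccounts;
  private final String builtinPrefix;

  AclAccountFilter(Set<String> supportedAccounts, String builtinPrefix) {
    this.supportedAccounts = supportedAccounts;
    this.builtinPrefix = builtinPrefix;
  }

  /** The documented defaults, after properties unescaping (one backslash each). */
  static AclAccountFilter withDefaults() {
    return new AclAccountFilter(
        new HashSet<>(Arrays.asList(
            "BUILTIN\\Administrators", "Everyone", "BUILTIN\\Users", "BUILTIN\\Guest",
            "NT AUTHORITY\\INTERACTIVE", "NT AUTHORITY\\Authenticated Users")),
        "BUILTIN\\");
  }

  /** Supported accounts are always kept; other accounts with the built-in prefix are dropped. */
  boolean keep(String account) {
    return supportedAccounts.contains(account) || !account.startsWith(builtinPrefix);
  }

  List<String> filter(List<String> accounts) {
    List<String> kept = new ArrayList<>();
    for (String account : accounts) {
      if (keep(account)) {
        kept.add(account);
      }
    }
    return kept;
  }
}
```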

Allow indexing of hidden files and folders: fs.crawlHiddenFiles=boolean

Set to true to allow the connector to crawl hidden files and folders. On Windows file systems, a file or folder is hidden if its DOS hidden attribute is set. The default value is false.

Allow indexing of crawled folder listings and DFS namespace enumerations: fs.indexFolders=boolean

When set to true (default), the connector creates a CONTAINER_ITEM object for each folder it crawls. When set to false, it creates a VIRTUAL_CONTAINER_ITEM object instead.

Enable file system change monitoring: fs.monitorForUpdates=boolean

When set to true (default), changes to content or access controls trigger the connector to re-crawl. Setting it to false turns off monitoring. This significantly reduces the connector's resource use, but the connector takes longer to reflect changes.

Set the maximum size of the cache of directories: fs.directoryCacheSize=number-of-entries

The maximum size of the directory cache. The connector uses the cache to identify hidden folders to avoid indexing files and folders in hidden folders.

The default is 50,000 entries, which typically consume 10–15 megabytes of RAM.
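
The directory cache lets the connector skip content in hidden folders without rereading every ancestor's attributes. The following hypothetical class is not the connector's code. It caches a hidden flag per directory in a LinkedHashMap that acts as a least-recently-used cache, capped at fs.directoryCacheSize entries. The isHidden function stands in for a real attribute check, such as Files.readAttributes(dir, DosFileAttributes.class).isHidden(). As a rough check of the stated default, 10–15 MB for 50,000 entries is about 200–300 bytes per entry.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical LRU cache of per-directory hidden flags; not the connector's source. */
final class HiddenDirectoryCache {
  private final Predicate<Path> isHidden; // stand-in for reading the DOS hidden attribute
  private final Map<Path, Boolean> cache;

  HiddenDirectoryCache(final int maxEntries, Predicate<Path> isHidden) {
    this.isHidden = isHidden;
    // accessOrder=true plus removeEldestEntry gives least-recently-used eviction.
    this.cache = new LinkedHashMap<Path, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Path, Boolean> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** True if dir, or any ancestor up to and including startPath, is hidden. */
  boolean isInHiddenTree(Path dir, Path startPath) {
    for (Path p = dir; p != null && p.startsWith(startPath); p = p.getParent()) {
      Boolean hidden = cache.get(p);
      if (hidden == null) {
        hidden = isHidden.test(p); // cache miss: check the attribute once, then remember it
        cache.put(p, hidden);
      }
      if (hidden) {
        return true;
      }
    }
    return false;
  }

  int size() {
    return cache.size();
  }
}
```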

Timestamp preservation and crawl control

Preserve last-access timestamp: fs.preserveLastAccessTime=value

When the connector crawls files and folders, it can change their last access timestamps to the time of the crawl. If those timestamps aren't preserved, backup and archive systems that rely on them might not move the appropriate files and folders to secondary storage, because the connector's visit makes them look recently accessed.

By default, the connector attempts to preserve the last access time (fs.preserveLastAccessTime set to ALWAYS). The connector might be unable to restore a file's last access time if the traversal user doesn't have permission to write file attributes. In that case, with the ALWAYS setting, the connector rejects further crawl requests for the file system so that it doesn't alter any more last access timestamps.

Possible values:

  • ALWAYS: The connector attempts to preserve the last access time as it crawls files and folders. The first time the connector can't preserve the last access time, the connector rejects all subsequent crawl requests for the file system to prevent altering the last access timestamps.
  • IF_ALLOWED: The connector attempts to preserve the last access time as it crawls files and folders. It continues to crawl even when some timestamps might not be preserved.
  • NEVER: The connector doesn't attempt to preserve the last access time as it crawls files and folders.
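
The three modes can be sketched with the standard NIO attribute API. The sketch reads a file's last access time before reading the file, then tries to restore it with BasicFileAttributeView.setTimes. The enum mirrors the documented values, but the class and its methods are hypothetical. The sketch assumes that a failed restore surfaces as an IOException, and it reads whole files into memory for simplicity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;

/** Mirrors the documented fs.preserveLastAccessTime values. */
enum PreserveLastAccessTime { ALWAYS, IF_ALLOWED, NEVER }

/** Hypothetical crawler helper; not the connector's source. */
final class AccessTimePreserver {
  private final PreserveLastAccessTime mode;
  private boolean rejecting; // set once, under ALWAYS, after the first failed restore

  AccessTimePreserver(PreserveLastAccessTime mode) {
    this.mode = mode;
  }

  /** Under ALWAYS, crawling stops for good after one timestamp couldn't be restored. */
  boolean crawlAllowed() {
    return !rejecting;
  }

  /** Reads the file's bytes, then (unless mode is NEVER) tries to restore its last access time. */
  byte[] read(Path file) throws IOException {
    if (!crawlAllowed()) {
      throw new IOException("Crawl rejected to avoid altering last access times: " + file);
    }
    BasicFileAttributeView view = Files.getFileAttributeView(file, BasicFileAttributeView.class);
    FileTime lastAccess = view.readAttributes().lastAccessTime();
    byte[] content = Files.readAllBytes(file);
    if (mode != PreserveLastAccessTime.NEVER) {
      try {
        view.setTimes(null, lastAccess, null); // null leaves modified and creation times unchanged
      } catch (IOException e) {
        recordRestoreFailure(); // e.g. no permission to write file attributes
      }
    }
    return content;
  }

  /** Under ALWAYS, rejects all later crawls; IF_ALLOWED keeps crawling. */
  void recordRestoreFailure() {
    if (mode == PreserveLastAccessTime.ALWAYS) {
      rejecting = true;
    }
  }
}
```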

Crawl only files that were accessed after a certain date: fs.lastAccessedDate=YYYY-MM-DD

Crawl content only if the last access time is after the specified date. The default value is disabled.

Specify the date in ISO 8601 format (YYYY-MM-DD). For example, if the value is 2010-01-01, the connector crawls only content that was accessed after the beginning of 2010.

If you specify fs.lastAccessedDate, you can't also set a value for fs.lastAccessedDays.

Crawl only files that were accessed within the past number of days: fs.lastAccessedDays=number-of-days

Crawl content only if the last access time is within the number of days before present. The default value is disabled.

Use this property to expire previously indexed content that has not been accessed in a while. For example, set to 365 to crawl content only if it was accessed in the last year.

If you specify fs.lastAccessedDays, you can't also set a value for fs.lastAccessedDate.

Crawl only files that were modified after a certain date: fs.lastModifiedDate=YYYY-MM-DD

Crawl content only if the last modified time is after the specified date. The default value is disabled.

Specify the date in ISO 8601 format (YYYY-MM-DD). For example, if the value is 2010-01-01, the connector crawls only content that was modified after the beginning of 2010.

If you specify fs.lastModifiedDate, you can't also set a value for fs.lastModifiedDays.

Crawl only files that were modified within the past number of days: fs.lastModifiedDays=number-of-days

Crawl content only if the last modification time is within the number of days before present. The default value is disabled.

Use this property to expire previously indexed content that has not been modified in a while. For example, set to 365 to crawl content only if it was modified in the last year.

If you specify fs.lastModifiedDays, you can't also set a value for fs.lastModifiedDate.
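
Each date-or-days pair defines a cutoff, and a file is crawled only if its timestamp is after that cutoff. The following sketch uses java.time and rests on three assumptions. A date is interpreted at the start of the day in a given time zone. A day is treated as exactly 24 hours. Setting both forms is rejected with an exception; the settings above state only that the two can't be combined. The class and method names are hypothetical.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

/** Hypothetical cutoff calculation for the last-accessed/last-modified filters. */
final class CrawlCutoff {
  private CrawlCutoff() {}

  /**
   * Returns the cutoff for one timestamp (last access or last modified), or null if the filter
   * is disabled. date is a YYYY-MM-DD string and days a day count; pass null for an unset value.
   */
  static Instant cutoff(String date, Integer days, Clock clock, ZoneId zone) {
    if (date != null && days != null) {
      throw new IllegalArgumentException("Set either the date or the days property, not both");
    }
    if (date != null) {
      return LocalDate.parse(date).atStartOfDay(zone).toInstant(); // ISO 8601, e.g. 2010-01-01
    }
    if (days != null) {
      return clock.instant().minus(Duration.ofDays(days)); // assumption: 24-hour days
    }
    return null;
  }

  /** A file qualifies if no filter is set or its timestamp is strictly after the cutoff. */
  static boolean shouldCrawl(Instant fileTime, Instant cutoff) {
    return cutoff == null || fileTime.isAfter(cutoff);
  }
}
```

Passing a Clock rather than calling Instant.now() directly keeps the days-based cutoff testable with a fixed clock.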

Skip file share access control

By default, the connector preserves access control integrity by sending Access Control Lists (ACLs) to the indexing API, including the ACL on the file share. In some configurations, however, the connector might not have sufficient permissions to read the share ACL. In those cases, files stored on that share don't appear in search results.

You can set the connector to ignore the share ACL so that content is always returned in search results. In this case, the indexing API gets a completely permissive share ACL, rather than the actual share ACL.

Skip file share access control: fs.skipShareAccessControl=boolean

Set to false (default) to enforce share ACLs. Set to true to ignore the share ACLs.
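
The effect of fs.skipShareAccessControl can be sketched by treating each ACL as a set of permitted principals. When the setting is true, the real share ACL is replaced with one that admits everyone, so only the file ACL decides access. This is a simplified model, not the connector's code. The ANYONE principal is a placeholder, and an empty set stands in for a share ACL that the connector couldn't read.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical model of fs.skipShareAccessControl; not the connector's source. */
final class ShareAclPolicy {
  private ShareAclPolicy() {}

  /** Placeholder principal meaning "anyone"; hypothetical, for illustration only. */
  static final String ANYONE = "*";

  /** Returns the share ACL to send: the real one, or a fully permissive one when skipping. */
  static Set<String> effectiveShareAcl(Set<String> realShareAcl, boolean skipShareAccessControl) {
    return skipShareAccessControl ? Collections.singleton(ANYONE) : realShareAcl;
  }

  /** A user can read only if both the share ACL and the file ACL admit them. */
  static boolean canRead(Set<String> user, Set<String> shareAcl, Set<String> fileAcl) {
    boolean shareOk = shareAcl.contains(ANYONE) || !Collections.disjoint(shareAcl, user);
    return shareOk && !Collections.disjoint(fileAcl, user);
  }
}
```

With an unreadable (empty) share ACL, nobody passes the share check. Turning the setting on makes the file ACL the only gate, which matches the trade-off described above.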