Deploy the Microsoft Windows File Systems connector

You can set up Google Cloud Search to return results from your organization's Microsoft Windows shares in addition to your Google Workspace content. You use the Google Cloud Search File Systems connector and configure it to access specified Windows shares. A single connector instance can support multiple Microsoft Windows shares.

Important considerations

Continuous automatic updates

By default, the connector continuously monitors the start paths (the fs.src values in the connector configuration file) once it starts up. When the file system reports changes to content or access controls, the connector re-crawls the file system. This re-crawl can be resource intensive. To turn off file system monitoring, set fs.monitorForUpdates to false. Turning off monitoring significantly reduces the connector's resource use but delays when the connector reflects changes.

DFS access control

The DFS system applies access control on its links, and each DFS link usually has its own ACL. One mechanism that DFS uses is Access-based Enumeration (ABE), which can restrict the DFS links returned to a user. Users might get only a subset of the DFS links, or even only one link when ABE isolates hosted home directories. When the connector traverses a DFS system, it respects both the DFS link ACL and the target's share ACL, and the share ACL inherits from the DFS ACL.

Known limitations

File System: The File Systems connector doesn't support mapped drives and local drives.
Distributed File System: A mapped drive to a UNC DFS doesn't work correctly. Some ACLs aren't read correctly.
The File Systems connector supports Distributed File System (DFS) namespaces and links. However, the connector supports DFS links only in a DFS namespace, not the regular folders in the DFS namespace.
File links returned in cloudsearch.google.com aren't clickable. The file links returned by the Query API aren't clickable in most browsers, either.

System requirements

Operating system

Windows Server 2016
Windows Server 2012
Windows Server 2008 R2

Software

Java JRE 1.8 installed on the computer that will run the Google Cloud Search File Systems connector

File system protocols

Server Message Block (SMB) - SMB1
Server Message Block (SMB) - SMB2
Distributed File System (DFS)

Not supported: Local Windows file systems, Sun Network File System (NFS) 2.0, Sun Network File System (NFS) 3.0, or Local Linux file system.

Deploy the connector

Prerequisites

Before you deploy the Cloud Search File Systems connector, ensure that your environment has all the following prerequisite components:

Google Workspace information required to establish relationships between Google Cloud Search and the data source:

Google Workspace private key (which contains the service account ID). For information on obtaining a private key, go to Configure access to the Google Cloud Search REST API.

Google Workspace data source ID. For information on obtaining a data source ID, go to Add a data source to search.

An identity source ID. For information about how to get an identity source ID, go to Create an identity source. If you sync your Google Workspace directory with Active Directory, set up the identity source with GCDS.

The Google Workspace admin for your organization can usually get you these credentials.
Ensure that the Windows account has sufficient permissions, as described in the following section.

Required Microsoft Windows account permissions

The Microsoft Windows account that the connector is running under must have sufficient permissions to perform the following actions:

List the content of folders
Read the content of documents
Read attributes of files and folders
Read permissions (ACLs) for both files and folders
Write basic attributes permissions

Membership in one of the following groups grants a Windows account the sufficient permissions needed by the connector:

Administrators
Power Users
Print Operators
Server Operators

Step 1. Install the Google Cloud Search File Systems connector

Get the connector repository from GitHub and build it.

To use git on the Windows server:
1. Clone the repository:
```
> git clone https://github.com/google-cloudsearch/windows-filesystems-connector.git
> cd windows-filesystems-connector
```
2. Check out the desired version of the connector:
```
> git checkout tags/v1-0.0.3
```
To download from GitHub directly:
1. Go to https://github.com/google-cloudsearch/windows-filesystems-connector.
2. Click Clone or download > Download ZIP.
3. Unzip the package.
4. Move to the new directory:
```
> cd windows-filesystems-connector
```
Build the connector. If necessary, install Apache Maven.
```
> mvn package
```
To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.

Copy the connector zip file to your local installation directory:

> cp target/google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip installation-dir
> cd installation-dir
> unzip google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip
> cd google-cloudsearch-windows-filesystems-connector-v1-0.0.3

Step 2. Create the connector configuration file

In the same directory as the connector installation, create a file and name it connector-config.properties.

Add parameters as key/value pairs to the file contents, as in the following example:

### File system connector configuration ###

# Required parameters for Cloud Search data source and identity source access
api.serviceAccountPrivateKeyFile=/path/to/file.json
api.sourceId=0123456789abcde
api.identitySourceId=a1b1c1234567

# Required parameters for file system access
fs.src=\\\\host\\share;\\\\dfshost\\dfsnamespace;\\\\dfshost\\dfsnamespace\\link

# Optional parameters for file system monitoring
traverse.abortAfterExceptions=500
fs.monitorForUpdates = true
fs.preserveLastAccessTime = IF_ALLOWED

For detailed descriptions of each parameter, go to the configuration parameters reference.

(Optional) Configure other connector parameters, as needed. For details, go to Google-supplied connector parameters.
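Before starting the connector, it can help to confirm that the configuration file contains every required parameter. The following is a minimal Python sketch, not part of the connector, that parses a simplified subset of the properties syntax and reports missing required keys. The helper names `parse_properties` and `missing_required` are hypothetical, and the parser only handles `key=value` lines, `#` or `!` comments, and doubled backslashes.

```python
# Keys marked "Required" in the configuration parameters reference.
REQUIRED_KEYS = (
    "api.serviceAccountPrivateKeyFile",
    "api.sourceId",
    "api.identitySourceId",
    "fs.src",
)


def parse_properties(text):
    """Parse a simplified subset of Java .properties syntax into a dict.

    Assumption: only key=value lines, '#' or '!' comments, and escaped
    backslashes are handled. Line continuations and \\uXXXX escapes are not.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # In a properties file, each literal backslash is written twice.
        props[key.strip()] = value.strip().replace("\\\\", "\\")
    return props


def missing_required(props):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

With this sketch, the `fs.src` value `\\\\host\\share` from the example file is read back as the UNC path `\\host\share`, which shows why backslashes are doubled in the configuration file.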

Step 3. Enable logging

Create a folder named logs in the directory that contains the connector binary.

Create an ASCII or UTF-8 file named logging.properties in the directory that contains the connector binary and add the following content:

handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = WARNING
com.google.enterprise.cloudsearch.level = INFO
com.google.enterprise.cloudsearch.fs.level = INFO

# uncomment line below to increase logging level to enable API trace
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-fs.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
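To estimate how much disk space the logs folder needs, the rotation settings above can be worked through in a few lines. The following Python sketch is only an illustration: `log_rotation_summary` is a hypothetical helper, and it relies on the `java.util.logging.FileHandler` behavior in which `%g` is replaced by a generation number and `limit` is an approximate per-file size in bytes.

```python
def log_rotation_summary(pattern, limit_bytes, count):
    """Return the rotated log file names and the approximate total size.

    pattern:     FileHandler pattern containing the %g generation placeholder
    limit_bytes: approximate maximum size of each log file, in bytes
    count:       number of files the handler cycles through
    """
    names = [pattern.replace("%g", str(generation)) for generation in range(count)]
    return names, limit_bytes * count


names, total_bytes = log_rotation_summary("logs/connector-fs.%g.log", 10485760, 10)
```

With the values from the example file, the handler cycles through `logs/connector-fs.0.log` to `logs/connector-fs.9.log`, for roughly 100 MiB of log data in total.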

Step 4. (Optional) Configure media types

By default, the connector tries to detect the media type for each file with JDK-provided media type detection. On Microsoft Windows, JDK relies on Windows registry to determine media types for files. A missing registry entry can result in a null media type for certain files.

If necessary, you can specify a media type that overwrites any existing bindings or prevents a null media type.

In the connector directory, create a Latin-1-encoded file named mime-type.properties.

Enter file extensions and their corresponding media types as in the following examples:

xlsx=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
one=application/msonenote
txt=text/plain
pdf=application/pdf
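The override behavior described above can be sketched as a lookup that consults the extension mappings first and falls back to platform detection only when no mapping exists. This Python sketch is an illustration, not the connector's implementation: `load_overrides` and `media_type_for` are hypothetical names, Python's `mimetypes` module stands in for the JDK detection, and matching extensions case-insensitively is an assumption.

```python
import mimetypes
import os


def load_overrides(text):
    """Parse extension=media/type lines into a dict keyed by lowercase extension."""
    overrides = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ext, sep, media_type = line.partition("=")
        if sep:
            overrides[ext.strip().lower()] = media_type.strip()
    return overrides


def media_type_for(path, overrides):
    """Return the overriding media type if one is configured, else a detected one.

    The fallback can return None, which mirrors the null media type that a
    missing registry entry can produce.
    """
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in overrides:
        return overrides[ext]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed
```

For example, with the mappings shown above, `media_type_for("notes.one", overrides)` returns `application/msonenote` even if platform detection has no entry for `.one` files.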

Step 5. Run the File Systems connector

After you install and configure the File Systems connector, to launch it on the host machine, run a command like the following example:

> java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-windows-filesystems-connector-v1-0.0.3.jar [-Dconfig=my.config]

Specify the configuration file path if it's different from the default (in the same directory as the binary with the name connector-config.properties).

Configuration parameters reference

Data source access

Setting	Parameter
Data source ID	`api.sourceId=1234567890abcdef` Required. The Google Cloud Search source ID set up by the Google Workspace administrator.
Path to the service account private key file	`api.serviceAccountPrivateKeyFile=./PrivateKey.json` Required. The Google Cloud Search service account key file for Google Cloud Search File Systems connector accessibility.
Identity source ID	`api.identitySourceId=x0987654321` Required. The Cloud Search identity source ID set up by the Google Workspace administrator for syncing active directory identities using GCDS.

File system access

Setting	Parameter
Source file systems	`fs.src=path1[;path2;...]` Required. Specify source file systems as one or more UNC sources that are separated by the delimiter configured by `fs.src.separator` (";" by default). If you use characters that aren't in Latin-1, encode them with Java Unicode escapes.

Path separator character

Setting	Parameter
Path separator character	`fs.src.separator=separator-character` The default separator is ";". If your source paths contain semicolons, you can set a different delimiter, such as a comma (","), that doesn't conflict with characters in your paths and isn't reserved by the property file syntax itself. If the `fs.src.separator` value is an empty string, then the `fs.src` value is treated as a single path.
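The separator rules can be sketched as a small splitting function. This is a hedged Python illustration rather than the connector's code: `split_sources` is a hypothetical name, and trimming whitespace and dropping empty entries are assumptions.

```python
def split_sources(fs_src, separator=";"):
    """Split an fs.src value into individual start paths.

    An empty separator means the whole value is treated as a single path.
    """
    if separator == "":
        return [fs_src]
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [path.strip() for path in fs_src.split(separator) if path.strip()]
```

With the default separator, `\\host\share;\\dfshost\dfsnamespace` yields two start paths. If a share name itself contains a semicolon, switching the separator to a comma keeps that path intact.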

Connector behavior

Setting	Parameter
Windows domain	`fs.supportedDomain=domain` Required to let users who are set up with GCDS access documents through Cloud Search. Specify as a single NetBIOS domain name of the Active Directory.
Include accounts in ACLs	`fs.supportedAccounts=account-1[, account-2,...]` A comma-delimited list of accounts to include in ACLs regardless of whether they are built-in accounts. The default value is `BUILTIN\\Administrators,Everyone,BUILTIN\\Users, BUILTIN\\Guest,NT AUTHORITY\\INTERACTIVE, NT AUTHORITY\\Authenticated Users`
Exclude built-in accounts from ACLs	`fs.builtinGroupPrefix=prefix` Specify the prefix of built-in accounts. An account that starts with this prefix is considered a built-in account and will be excluded from the ACLs. The default value is `BUILTIN\\`
Allow indexing of hidden files and folders	`fs.crawlHiddenFiles=boolean` Set to `true` to allow the connector to crawl hidden files and folders (on Windows file systems, a file or folder is considered hidden if the DOS hidden attribute is set.) The default value is `false`.
Allow indexing of crawled folder listings and DFS Namespace enumerations	`fs.indexFolders=boolean` When set to `true` (default), the connector creates a CONTAINER_ITEM object when it crawls a folder. When set to `false`, the connector creates a VIRTUAL_CONTAINER_ITEM object instead.
Enable file system change monitoring	`fs.monitorForUpdates=boolean` When set to `true` (default), changes to content or access controls trigger the connector to re-crawl. Setting this parameter to `false` turns off monitoring, which significantly reduces the connector's resource use but delays when the connector reflects changes.
Set the maximum size of the cache of directories	`fs.directoryCacheSize=number-of-entries` The maximum size of the directory cache. The connector uses the cache to identify hidden folders to avoid indexing files and folders in hidden folders. The default is 50,000 entries, which typically consume 10–15 megabytes of RAM.
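The interaction between `fs.supportedAccounts` and `fs.builtinGroupPrefix` can be sketched as a single predicate: an account stays in the ACL if it is explicitly listed, and is otherwise dropped when it starts with the built-in prefix. The following Python sketch is an illustration under assumptions: `parse_accounts` and `is_included_in_acl` are hypothetical names, matching is treated as case-sensitive, and account names use a single backslash (the unescaped form of the property values).

```python
def parse_accounts(value):
    """Split a comma-delimited fs.supportedAccounts value into a set of names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def is_included_in_acl(account, supported_accounts, builtin_prefix="BUILTIN\\"):
    """Return True if the account should be kept in the ACL."""
    if account in supported_accounts:
        # Listed accounts are included even if they are built-in accounts.
        return True
    # Any other account with the built-in prefix is excluded.
    return not account.startswith(builtin_prefix)


supported = parse_accounts(
    "BUILTIN\\Administrators,Everyone,BUILTIN\\Users, BUILTIN\\Guest,"
    "NT AUTHORITY\\INTERACTIVE, NT AUTHORITY\\Authenticated Users"
)
```

Under these assumptions, `BUILTIN\Administrators` is kept because it is listed, a built-in group that isn't listed is excluded, and a domain account such as the made-up `EXAMPLE\alice` is kept because it doesn't carry the built-in prefix.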

Timestamp preservation and crawl control

Setting	Parameter
Preserve last-access timestamp	`fs.preserveLastAccessTime=value` When the connector crawls files and folders, the crawl can change their last access timestamp to the time of the crawl. When last access times aren't preserved, backup and archive systems might not move appropriate files and folders to secondary storage because of the connector's visit. By default (`ALWAYS`), the connector attempts to preserve the last access time. The connector might be unable to restore the last access time of a file when the traversal user doesn't have sufficient privileges to write file attributes. Possible values: `ALWAYS`: The connector attempts to preserve the last access time as it crawls files and folders. The first time the connector can't preserve the last access time, the connector rejects all subsequent crawl requests for the file system to prevent altering the last access timestamps. `IF_ALLOWED`: The connector attempts to preserve the last access time as it crawls files and folders. It continues to crawl even when some timestamps might not be preserved. `NEVER`: The connector doesn't attempt to preserve the last access time as it crawls files and folders.
Crawl only files that were accessed after a certain date	`fs.lastAccessedDate=YYYY-MM-DD` Crawl content only if the last access time is after the specified date. The default value is `disabled`. Specify the date in ISO8601 date format: YYYY-MM-DD. For example, if the value is 2010-01-01, the connector only crawls content that was accessed after the beginning of 2010. If you specify `fs.lastAccessedDate`, you can't also set a value for `fs.lastAccessedDays`.
Crawl only files that were accessed within the past number of days	`fs.lastAccessedDays=number-of-days` Crawl content only if the last access time is within the number of days before present. The default value is `disabled`. Use this property to expire previously indexed content that has not been accessed in a while. For example, set to 365 to crawl content only if it was accessed in the last year. If you specify `fs.lastAccessedDays`, you can't also set a value for `fs.lastAccessedDate`.
Crawl only files that were modified after a certain date	`fs.lastModifiedDate=YYYY-MM-DD` Crawl content only if the last modified time is after the specified date. The default value is `disabled`. Specify the date in ISO8601 date format: YYYY-MM-DD. For example, if the value is 2010-01-01, the connector only crawls content that was modified after the beginning of 2010. If you specify `fs.lastModifiedDate`, you can't also set a value for `fs.lastModifiedDays`.
Crawl only files that were modified within the past number of days	`fs.lastModifiedDays=number-of-days` Crawl content only if the last modification time is within the number of days before present. The default value is `disabled`. Use this property to expire previously indexed content that has not been modified in a while. For example, set to 365 to crawl content only if it was modified in the last year. If you specify `fs.lastModifiedDays`, you can't also set a value for `fs.lastModifiedDate`.
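The date-based and days-based filters in the table above are mutually exclusive and both reduce to a single cutoff date. The following Python sketch shows that logic for either the last-accessed or the last-modified pair. It is an illustration only: `cutoff_date` and `should_crawl` are hypothetical names, and treating a timestamp that falls on the cutoff date as eligible is an assumption about the boundary.

```python
from datetime import date, timedelta


def cutoff_date(date_value=None, days_value=None, today=None):
    """Return the cutoff date for a Date/Days pair, or None if both are disabled.

    date_value: ISO 8601 string such as "2010-01-01" (the ...Date parameter)
    days_value: number of days before the present (the ...Days parameter)
    """
    if date_value is not None and days_value is not None:
        raise ValueError("Set either the Date or the Days parameter, not both")
    if date_value is not None:
        return date.fromisoformat(date_value)
    if days_value is not None:
        return (today or date.today()) - timedelta(days=days_value)
    return None


def should_crawl(file_date, cutoff):
    """Return True if the file's timestamp passes the filter."""
    # Assumption: a file dated exactly on the cutoff is still crawled.
    return cutoff is None or file_date >= cutoff
```

For example, `cutoff_date("2010-01-01")` gives 1 January 2010, so a file last accessed in 2009 is skipped, while `cutoff_date(days_value=365)` moves the cutoff forward each day and lets older content expire.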

Skip file share access control

By default, the connector preserves access control integrity when it sends Access Control Lists (ACLs) to the indexing API, including the ACLs on the file share. In some configurations, however, the connector might not have sufficient permissions to read the share ACL. In those instances, files on that file share aren't returned in search results.

You can set the connector to ignore the share ACL so that content is always returned in search results. In this case, the indexing API gets a completely permissive share ACL, rather than the actual share ACL.

Setting	Parameter
Skip file share access control	`fs.skipShareAccessControl=boolean` Set to `false` (default) to enforce share ACLs. Set to `true` to ignore the share ACLs.