Deploy the Microsoft Windows File Systems connector

This guide is intended for Google Cloud Search File Systems connector administrators, that is, anyone who is responsible for downloading, configuring, running, and monitoring the connector. This guide includes instructions for performing key tasks related to Microsoft Windows File Systems connector deployment:

  • Download the Google Cloud Search File Systems connector software
  • Configure the connector for use with a specific file system data source
  • Deploy and run the connector

To understand the concepts in this document, you should be familiar with the fundamentals of G Suite and Microsoft Windows file systems.

Overview of the Google Cloud Search File Systems connector

The Google Cloud Search File Systems connector enables Google Cloud Search to discover content in Microsoft Windows shares and index it into Cloud Search through the Cloud Search Indexing API. Once indexed, content from a Microsoft Windows share is searchable through Cloud Search clients or the Cloud Search Query API.

A single connector instance can support multiple Microsoft Windows shares. DFS namespaces and links are supported by the connector. However, the connector only supports DFS links in a DFS namespace, not the regular folders in the DFS namespace.
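For example, both a DFS namespace and a specific DFS link can be supplied as start paths. A minimal connector-config.properties fragment might look like this (the host and share names are hypothetical; backslashes are doubled per Java properties-file escaping):

```properties
# One ordinary file share, one DFS namespace, and one DFS link as start paths
fs.src=\\\\host\\share;\\\\dfshost\\dfsnamespace;\\\\dfshost\\dfsnamespace\\link
```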

Configuration properties files

To enable the connector to discover content in a file system and upload it to the indexing API, you, as the connector administrator, must create a configuration file that provides settings to the Windows File Systems connector, using the steps described in Deployment steps.

In addition to the File Systems connector parameters described in this document, there are configuration parameters used by all Cloud Search connectors. For detailed information, see Google-supplied connector parameters.

Microsoft Windows account permissions needed by the connector

The Microsoft Windows account that the connector is running under must have sufficient permissions to perform the following actions:

  • List the content of folders
  • Read the content of documents
  • Read attributes of files and folders
  • Read permissions (ACLs) for both files and folders
  • Write basic attributes permissions

During a crawl, the connector reads document content and then attempts to restore each document's last access date to its original value. Restoring the last access date requires that the account the connector runs under have write permission on the document. If the account has read-only permission without write permission, the last access date of documents changes as the connector reads their content during a crawl.

Membership in one of the following groups grants a Windows account the permissions the connector needs:

  • Administrators
  • Power Users
  • Print Operators
  • Server Operators

Continuous automatic updates

By default, the connector starts monitoring start paths (values from fs.src) that are file shares or DFS links on startup. If a start path is a DFS namespace, the connector starts a monitor for each link within the namespace. If the monitors aren't started at startup, they're started when a start path or DFS link that's already in the Cloud Search index is returned to the connector as the result of a poll request. You can turn this feature on or off by setting the connector configuration option fs.monitorForUpdates, as described in connector-config.properties variables.
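For example, to disable change monitoring for a share whose content rarely changes, a connector-config.properties fragment could contain (the UNC path is hypothetical):

```properties
# A rarely changing archive share; skip live change monitoring to save resources
fs.src=\\\\fileserver01\\archive
fs.monitorForUpdates=false
```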

DFS access control

The DFS system employs access control when navigating its links, and usually each DFS link has its own ACL. One mechanism employed here is Access-based Enumeration (ABE). With ABE deployed, users may see only a subset of the DFS links, possibly only one when ABE is used to isolate hosted home directories. When traversing a DFS system, the connector supplies the DFS link ACL, in addition to the target's share ACL, as a named resource when the DFS link is crawled. In this case, the share ACL inherits from the DFS ACL.

Supported operating systems

The Cloud Search File Systems connector must be installed on one of the following supported Windows operating systems:

  • Windows Server 2016
  • Windows Server 2012
  • Windows Server 2008 R2

The Cloud Search File Systems connector does not run on Linux.

Supported file system protocols

The following list shows the file system protocols used to communicate with file shares and indicates whether the connector supports them:

  • Server Message Block (SMB) - SMB1: supported for shares on Windows Server 2016, 2012, and 2008 R2
  • Server Message Block (SMB) - SMB2: supported for shares on Windows Server 2016, 2012, and 2008 R2
  • Distributed File System (DFS): supported for shares on Windows Server 2016, 2012, and 2008 R2
  • Local Windows file system: not supported
  • Sun Network File System (NFS) 2.0: not supported
  • Sun Network File System (NFS) 3.0: not supported
  • Local Linux file system: not supported

Known limitations

  • File System: This release of the File Systems connector does not support mapped drives and local drives.
  • Distributed File System: A mapped drive to a UNC DFS does not work correctly. Some ACLs will not be read correctly.

Prerequisites

Before you deploy the Cloud Search File Systems connector, ensure that your environment has all of the following prerequisite components:

  • Windows Server 2016
  • Java JRE 1.8 installed on the computer that runs the connector.
  • G Suite information required to establish relationships between Google Cloud Search and the data source:

    Typically, the G Suite administrator for the domain can supply these credentials for you.

  • Ensure that the Windows account has sufficient permissions, as described in the following section.

  • When sharing a folder from a Windows platform, permission can be given at the share ACL and the NTFS ACL of the folder. Both ACLs need to give the connector appropriate access. Both ACLs are also read by the connector. The administrator may skip the attempt to read the share ACL by setting the fs.skipShareAccessControl configuration option to true.

Deployment steps

To deploy the Google Cloud Search File Systems connector, follow these steps:

  1. Install the Cloud Search File Systems connector.
  2. Specify the File Systems connector configuration.
  3. Configure access to the Google Cloud Search data source.
  4. Configure access to file systems.
  5. Configure the path separator character.
  6. Configure connector behavioral controls.
  7. Configure last access controls.
  8. Restrict access to crawled documents and folders.
  9. Skip file share access control.
  10. Enable logging.
  11. Configure mime-type.properties.

1. Install the Cloud Search File Systems connector

Google provides the installation software for the connector in the following file:

google-cloudsearch-filesystem-connector-v1-0.0.2.zip

Download and extract the Windows File Systems connector software to a local working directory where the connector will run. This directory can also contain the other files required for execution, such as the configuration file and the service account key file.

2. Specify the File Systems connector configuration

For the connector to properly access a file system and index the relevant content, you must first create its configuration file. You control the File Systems connector's behavior and attributes by defining parameters in the connector's configuration file.

To create a configuration file:

  1. Open a text editor of your choice and name the configuration file. Add key=value pairs to the file contents as described in the following sections.
  2. Save and name the configuration file. Google recommends that you name the configuration file connector-config.properties so that no additional command-line parameters are required to run the connector.
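The result of these steps is a properties file along the lines of the following minimal skeleton (all values here are placeholders; the api.* parameters are explained in the next step):

```properties
# connector-config.properties (placeholder values)
api.sourceId=1234567890abcdef
api.identitySourceId=x0987654321
api.serviceAccountPrivateKeyFile=./PrivateKey.json
fs.src=\\\\host\\share
```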

3. Configure access to the Google Cloud Search data source

The first parameters every configuration file must specify are the ones necessary to access the Cloud Search data source, as shown in the following table. Typically, you will need the Data source ID, service account ID, and the path to the service account's private key file in order to configure the connector's access to Cloud Search. The steps required to set up a data source are described in Add a data source to search.

  • Data source ID: api.sourceId=1234567890abcdef
    Required. The Google Cloud Search source ID set up by the G Suite administrator.

  • Path to the service account private key file: api.serviceAccountPrivateKeyFile=./PrivateKey.json
    Required. The Google Cloud Search service account key file that grants the File Systems connector access to the Indexing API.

  • Identity source ID: api.identitySourceId=x0987654321
    Required. The Cloud Search identity source ID set up by the G Suite administrator for syncing Active Directory identities using GCDS.

4. Configure access to file systems

Before the connector can access a file system and extract data from it for indexing, you must configure access to the source file system. Use the following parameter to add access information to the configuration file.

  • Source file systems: fs.src=filename1,filename2
    Multiple source file systems may be specified in fs.src as a list of UNC sources separated by the delimiter configured in fs.src.separator. Unicode (non-ASCII) characters can be used in fs.src; if they are, the connector configuration file must be saved with UTF-8 encoding.
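For example, two shares on different hosts can be listed with the default ";" separator (the host and share names are hypothetical):

```properties
# Two UNC sources separated by the default ";" delimiter
fs.src=\\\\fileserver01\\public;\\\\fileserver02\\marketing
```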

5. Configure the path separator character

Use the following parameter to add separator information to the configuration file.

  • Path separator character: fs.src.separator=,
    The default separator is ";" (similar to how the PATH or CLASSPATH environment variable is set). However, if your source paths contain semicolons, you can configure a different delimiter that does not conflict with characters in your paths and is not reserved by properties-file syntax.

    If fs.src.separator is set to the empty string, the fs.src value is treated as a single pathname.
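For example, if the share paths themselves contain semicolons, a different delimiter such as "|" avoids the conflict (the paths are hypothetical):

```properties
# Paths contain ";", so use "|" as the delimiter instead
fs.src.separator=|
fs.src=\\\\host\\projects;2019|\\\\host\\projects;2020
```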

6. Configure connector behavioral controls

Use the following parameter to add information about connector behavior to the configuration file.

  • Include accounts in ACLs: fs.supportedAccounts=BUILTIN\\Administrators,\\Everyone,BUILTIN\\Users
    Accounts listed in fs.supportedAccounts are included in ACLs regardless of whether they are built-in accounts.
    The default value is BUILTIN\\Administrators,Everyone,BUILTIN\\Users,BUILTIN\\Guest,NT AUTHORITY\\INTERACTIVE,NT AUTHORITY\\Authenticated Users

  • Exclude built-in accounts from ACLs: fs.builtinGroupPrefix=BUILTIN\\
    Built-in accounts are excluded from the ACLs that are pushed to the indexing API. An account that starts with this prefix is considered a built-in account and is excluded from the ACLs.
    The default value is BUILTIN\\

  • Allow or disallow indexing of hidden files and folders: fs.crawlHiddenFiles=true
    The definition of hidden files and folders is platform dependent. On Windows file systems, a file or folder is considered hidden if its DOS hidden attribute is set. By default, hidden files and the contents of hidden folders are not indexed. Setting fs.crawlHiddenFiles to true allows the connector to crawl hidden files and folders.
    The default value is false.

  • Allow or disallow indexing of crawled folder listings and DFS namespace enumerations: fs.indexFolders=false
    When a folder is crawled, the connector creates a CONTAINER_ITEM object. With fs.indexFolders set to false, the connector creates a VIRTUAL_CONTAINER_ITEM object instead.
    The default value is true.

  • Enable or disable file system change monitoring: fs.monitorForUpdates=false
    When monitoring is disabled, updates to content or access controls are not immediately sent to the indexing API with a request to re-crawl. Turning off monitoring significantly reduces the connector's resource use.
    The default value is true.

  • Set the maximum size of the directory cache: fs.directoryCacheSize=25000
    Sets the maximum size of the cache of directories encountered. The connector currently uses this cache to identify which folders are hidden, so that it can avoid indexing files and folders that have a hidden ancestor. A folder is considered hidden if its DOS hidden attribute is set.
    The default maximum cache size is 50,000 entries, which typically consumes 10-15 megabytes of RAM.
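Taken together, the behavioral controls above might appear in a configuration fragment like this (the values are illustrative, not recommendations):

```properties
# Behavioral controls (illustrative values)
fs.crawlHiddenFiles=false
fs.indexFolders=true
fs.monitorForUpdates=true
fs.directoryCacheSize=50000
```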

7. Configure last access controls

Use the following parameter to add information about last access to crawled files and folders to the configuration file.

  • Preserve last access timestamp: fs.preserveLastAccessTime=NEVER
    This configuration property controls how strictly the connector enforces preservation of the last access timestamp of crawled files and folders. Failure to preserve last access times can fool backup and archive systems into thinking a file or folder was recently accessed by a human, preventing the movement of least recently used items to secondary storage.

    If the connector is unable to restore the last access time for a file, the traversal user likely lacks sufficient privileges to write the file's attributes. As a precaution, the connector rejects crawl requests for the file system to prevent altering the last access timestamps of potentially thousands of files.

    The fs.preserveLastAccessTime property has three possible values:

      • ALWAYS: The connector attempts to preserve the last access time for all files and folders crawled. The first failure to do so forces the connector to reject all subsequent crawl requests for the file system, to prevent altering the last access timestamps of potentially thousands of files.
      • IF_ALLOWED: The connector attempts to preserve the last access time for all files and folders crawled, even though some timestamps might not be preserved.
      • NEVER: The connector makes no attempt to preserve the last access time for crawled files and folders.

    The default value is ALWAYS.

  • Disable crawling of files whose last access time is earlier than a specific date: fs.lastAccessedDate=2010-01-01
    The cut-off date is specified in ISO 8601 date format, YYYY-MM-DD.
    Setting fs.lastAccessedDate to 2010-01-01 crawls only content that has been accessed since the beginning of 2010. Only one of fs.lastAccessedDate or fs.lastAccessedDays may be specified.
    The default value is disabled.

  • Disable crawling of files that have not been accessed within a specified number of days: fs.lastAccessedDays=365
    Unlike the absolute cut-off date used by fs.lastAccessedDate, this property can be used to expire previously indexed content that has not been accessed in a while. The expiration window is specified as a positive number of days. Setting fs.lastAccessedDays to 365 crawls only content that has been accessed in the last year. Only one of fs.lastAccessedDate or fs.lastAccessedDays may be specified.
    The default value is disabled.

8. Restrict access to crawled documents and folders

Use the following parameters to add information about restricting access to crawled files and folders to the configuration file.

  • Disable crawling of files whose last modified time is earlier than a specific date: fs.lastModifiedDate=2010-01-01
    The cut-off date is specified in ISO 8601 date format, YYYY-MM-DD.
    Setting fs.lastModifiedDate to 2010-01-01 crawls only content that has been modified since the beginning of 2010. Only one of fs.lastModifiedDate or fs.lastModifiedDays may be specified.
    The default value is disabled.

  • Disable crawling of files that have not been modified within a specified number of days: fs.lastModifiedDays=365
    Unlike the absolute cut-off date used by fs.lastModifiedDate, this property can be used to expire previously indexed content that has not been modified in a while. The expiration window is specified as a positive number of days. Setting fs.lastModifiedDays to 365 crawls only content that has been modified in the last year. Only one of fs.lastModifiedDate or fs.lastModifiedDays may be specified.
    The default value is disabled.
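Because the date-based and day-based variants are mutually exclusive, a configuration combines at most one option from each pair. For example (illustrative values):

```properties
# Skip content not accessed in the last year...
fs.lastAccessedDays=365
# ...and skip content not modified since the start of 2010
fs.lastModifiedDate=2010-01-01
```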

9. Skip file share access control

The connector attempts to preserve access control integrity when sending Access Control Lists (ACLs) to the indexing API. In general, only users that have access to a file share have access to the files maintained on that share, so the connector includes the share's ACL in those sent to the indexing API. However, in some configurations, the connector may not have sufficient permissions to read the share ACL. In those instances, the broken share ACL will prevent all files maintained on that file share from appearing in search results. If the share ACL cannot be read by the connector, the administrator may skip the attempt to read the share ACL by setting the fs.skipShareAccessControl configuration option to true. This feeds a completely permissive share ACL to the indexing API, rather than the actual share ACL.

Use the following parameter to add information about skipping file share access control to the configuration file.

  • Skip file share access control: fs.skipShareAccessControl=true
    This boolean configuration property enables or disables sending the file share's Access Control List (ACL) to the indexing API.
    The default value is false (share ACLs are sent to the indexing API).

Example: Configuration file

The following example configuration file shows the parameter key=value pairs that define an example connector's behavior.

api.serviceAccountPrivateKeyFile=/path/to/file.json
api.sourceId=0123456789abcde
api.identitySourceId=a1b1c1234567
traverse.abortAfterExceptions=500
fs.src=\\\\host\\share;\\\\dfshost\\dfsnamespace;\\\\dfshost\\dfsnamespace\\link
fs.monitorForUpdates = true
fs.preserveLastAccessTime = IF_ALLOWED

10. Enable logging

Create a folder named logs in the same directory that contains the connector binary. Create an ASCII or UTF-8 file named logging.properties in the same directory and add the following content:

handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = WARNING
com.google.enterprise.cloudsearch.level = INFO
com.google.enterprise.cloudsearch.fs.level = INFO

# uncomment line below to increase logging level to enable API trace
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-fs.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter

11. Configure mime-type.properties

Optionally, you can create an ASCII or UTF-8 file named mime-type.properties in the connector directory. In this file, specify the Multipurpose Internet Mail Extensions (MIME) type for each file type. If you do not specify MIME types, the connector tries to detect the MIME type for each file using the JDK's built-in detection. On Microsoft Windows, the JDK relies on the Windows registry to determine MIME types, and a missing registry entry can result in a null MIME type for certain files.

Standard applications have their standard MIME types. The purpose of mime-type.properties is only to override any bindings you wish to change. The mime-type.properties file should be in the same top-level directory as connector-config.properties and logging.properties. Each line maps a file extension to its MIME type. For example:

xlsx=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
one=application/msonenote

Example: MIME types file

The following example shows the MIME types file.

txt=text/plain
pdf=application/pdf

Run the Cloud Search File Systems connector

After you install the Cloud Search File Systems connector, you can run it on the host machine by using a command like the following example:

java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-filesystem-connector-v1-0.0.2-withlib.jar
