You can set up Google Cloud Search to return results from your organization's Microsoft Windows shares in addition to your Google Workspace content. You use the Cloud Search File Systems connector and configure it to access specified Windows shares. A single connector instance can support multiple Microsoft Windows shares.
Important considerations
Before deploying the File Systems connector, review the following considerations.
Continuous automatic updates
By default, the connector continuously monitors start paths (values from
fs.src in the configuration file) when it starts up. When the file system
reports changes to content or access controls, the connector re-crawls the file
system. This re-crawl can be resource intensive. To turn off monitoring, set
fs.monitorForUpdates to false. This reduces resource use but delays when
the connector reflects changes. Learn more
DFS access control
The DFS system applies access control on its links, and usually each DFS link has its own ACL. DFS uses Access-based Enumeration (ABE) to restrict the links returned to a user. Users might see only a subset of DFS links or only one link when ABE isolates home directories. When traversing a DFS system, the connector respects the DFS link ACL and the target's Share ACL; the Share ACL inherits from the DFS ACL.
Known limitations
This section lists the known limitations of the file system connector.
- File System: The connector doesn't support mapped or local drives.
- Distributed File System: A mapped drive to a UNC DFS doesn't work correctly, and some ACLs might not be read correctly.
- The connector supports DFS namespaces and links but not regular folders in the DFS namespace.
- File links in
cloudsearch.google.comor returned by the Query API aren't clickable in most browsers.
System requirements
Before you deploy the File Systems connector, ensure that the host machine meets the following requirements:
| System requirements | |
|---|---|
| Operating system |
|
| Software |
|
| File system protocols |
Not supported: Local Windows file systems, NFS 2.0, NFS 3.0, or local Linux file systems. |
Deploy the connector
Follow these steps to deploy the File Systems connector.
Prerequisites
Before you deploy the connector, ensure your environment has these components:
Google Workspace information to establish connections:
- Google Workspace private key (containing the service account ID). See Configure access to the Cloud Search API.
- Google Workspace data source ID. See Add a data source to search.
- An identity source ID. See Create an identity source. If you sync with Active Directory, use GCDS.
Your Google Workspace administrator can usually provide these credentials.
Ensure the Windows account has sufficient permissions.
Required Microsoft Windows account permissions
The Windows account running the connector must have permissions to:
- List folder content.
- Read document content.
- Read file and folder attributes.
- Read permissions (ACLs) for files and folders.
- Write basic attributes.
Membership in one of these groups usually grants sufficient permissions: Administrators, Power Users, Print Operators, or Server Operators.
Step 1. Install the connector
Download or clone the connector repository from GitHub, and then build the connector package.
Get the connector repository from GitHub and build it.
To use git on the Windows server:
> git clone https://github.com/google-cloudsearch/windows-filesystems-connector.git > cd windows-filesystems-connector > git checkout tags/v1-0.0.3
To download directly:
- Go to windows-filesystems-connector.
- Click Clone or download > Download zip.
- Unzip the package and move to the directory.
Build the connector using Apache Maven:
To skip tests, use> mvn package
mvn package -DskipTests.Extract the connector zip file to your installation directory:
> cp target/google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip installation-dir > cd installation-dir > unzip google-cloudsearch-windows-filesystems-connector-v1-0.0.3.zip > cd google-cloudsearch-windows-filesystems-connector-v1-0.0.3
Step 2. Create the configuration file
After installing the connector, create a configuration file that contains the settings for the connector.
- In the connector directory, create a file named
connector-config.properties. Add parameters as key-value pairs. For example:
# Required parameters api.serviceAccountPrivateKeyFile=/path/to/file.json api.sourceId=0123456789abcde api.identitySourceId=a1b1c1234567 # File system access fs.src=\\\\host\\share;\\\\dfshost\\dfsnamespace # Optional parameters traverse.abortAfterExceptions=500 fs.monitorForUpdates = true fs.preserveLastAccessTime = IF_ALLOWEDSee the Configuration parameters reference for file-system-specific parameters. For a list of common parameters used by all Cloud Search connectors, see Google-supplied connector parameters.
Step 3. Enable logging
Create a directory for logs and create a logging configuration file.
- Create a folder named
logsin the connector directory. Create a file named
logging.propertieswith this content:handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler # Default log level .level = WARNING com.google.enterprise.cloudsearch.level = INFO com.google.enterprise.cloudsearch.fs.level = INFO # uncomment line below to increase logging level to enable API trace #com.google.api.client.http.level = FINE java.util.logging.ConsoleHandler.level = INFO java.util.logging.FileHandler.pattern=logs/connector-fs.%g.log java.util.logging.FileHandler.limit=10485760 java.util.logging.FileHandler.count=10 java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
Step 4. (Optional) Configure media types
The connector attempts to detect file media types using its default mechanism which, on Windows, relies on registry entries. If a registry entry for a file extension is missing, the connector might fail to detect the media type correctly. If media types aren't detected correctly, or if you want to override the default type for an extension, follow these steps:
- Create a file named
mime-type.propertiesin the connector directory. - Enter extensions and types as
extension=media/type:properties xlsx=application/vnd.openxmlformats-officedocument.spreadsheetml.sheet one=application/msonenote txt=text/plain pdf=application/pdf
Step 5. Run the File Systems connector
Launch the connector from the host machine:
> java -jar google-cloudsearch-windows-filesystems-connector-v1-0.0.3.jar -Djava.util.logging.config.file=logging.properties[ -Dconfig=my.config]By default, the connector looks for a configuration file named
connector-config.properties in the directory where the connector is run.
If your configuration file has a different name or is in another directory,
use the -Dconfig parameter to specify its path.
Configuration parameters reference
The following tables list and describe the parameters used to configure the File Systems connector.
Data source access
| Setting | Parameter |
| Data source ID | api.sourceId=1234567890abcdef
Required. The Cloud Search source ID. |
| Service account | api.serviceAccountPrivateKeyFile=./PrivateKey.json
Required. The path to the service account key file. |
| Identity source ID | api.identitySourceId=x0987654321
Required. The Cloud Search identity source ID set up by the Google Workspace administrator for syncing active directory identities using GCDS. |
File system access
Use these parameters to specify the file system sources to crawl.
| Setting | Parameter |
| Source file systems | fs.src=path1[,path2, ...]
Required. Specify source file systems as one or more UNC sources that are separated by the delimiter configured by |
Path separator character
| Setting | Parameter |
| Path separator character | fs.src.separator=separator-character
The default separator is ";". If your source paths contain semicolons, you can set a different delimiter, such as a comma (","), that does not conflict with characters in your paths and isn't reserved by property file syntax itself. If the |
Connector behavior
Use these parameters to tune how the connector crawls file systems.
| Setting | Parameter |
| Windows domain | fs.supportedDomain=domain
Required to let users who are set up with GCDS access documents through Cloud Search. Specify as a single NetBIOS domain name of the Active Directory. |
| Include accounts in ACLS | fs.supportedAccounts=account-1[, account-2,...]
A comma-delimited list of accounts to include in ACLs regardless of whether they are built-in accounts. The default value is |
| Exclude built-in accounts from ACLs | fs.builtinGroupPrefix=prefix
Specify the prefix of built-in accounts. An account that starts with this prefix is considered a built-in account and will be excluded from the ACLs. The default value is |
| Allow indexing of hidden files and folders | fs.crawlHiddenFiles=boolean
Set to |
| Allow indexing of crawled folder listings and DFS Namespace enumerations | fs.indexFolders=boolean
When set to |
| Enable file system change monitoring | fs.monitorForUpdates=boolean
When set to |
| Set the maximum size of the cache of directories | fs.directoryCacheSize=number-of-entries
The maximum size of the directory cache. The connector uses the cache to identify hidden folders to avoid indexing files and folders in hidden folders. The default is 50,000 entries, which typically consume 10–15 megabytes of RAM. |
Timestamp preservation
Use these parameters to specify how the connector handles timestamp preservation.
| Setting | Parameter |
| Preserve access time | fs.preserveLastAccessTime=value
When crawling files and folders, the connector can change their last access timestamp to the time of the crawl. If last access times aren't preserved, backup and archive systems might not move appropriate files and folders to secondary storage because the connector accessed them. By default, Possible values include:
|
| Crawl only files that were accessed after a certain date | fs.lastAccessedDate=YYYY-MM-DD
Crawl content only if the last access time is after the specified date
(YYYY-MM-DD, ISO8601 format).
The default is |
| Crawl only files that were accessed within the past number of days | fs.lastAccessedDays=number-of-days
Crawl content only if the last access time is within the specified number of
days from the present. The default is |
| Crawl only files that were modified after a certain date | fs.lastModifiedDate=YYYY-MM-DD
Crawl content only if the last modified time is after the specified date
(YYYY-MM-DD, ISO8601 format).
The default is |
| Crawl only files that were modified within the past number of days | fs.lastModifiedDays=number-of-days
Crawl content only if the last modification time is within the specified
number of days from the present. The default is |
Skip file share ACLs
You can set the connector to ignore share ACLs if it lacks permissions to read them. Content is then returned with a permissive share ACL.
| Setting | Parameter |
| Skip share ACLs | fs.skipShareAccessControl=boolean
Set to |