Structure a schema for optimal query interpretation

Cloud Search’s query interpretation feature automatically deduces the operators and filters in a user’s query and converts those elements into a structured, operator-based query. Query interpretation uses operators defined in the schema, together with the indexed documents, to deduce what the user's query means. This feature lets a user search with minimal keywords and still obtain precise results.

The results presented to the user depend on the confidence of the query interpretation. Confidence is based on several factors, including where the query strings appear in indexed documents. A string such as the actor name "Tom Hanks" that consistently appears in a schema field called actors results in higher confidence. The same string appearing within a paragraph, rather than in a schema field, can result in lower confidence. When confidence is strong, only results from query interpretation are displayed to the user. When confidence is weaker, the results from query interpretation are blended with ordinary keyword search results.
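To make the confidence behavior described above more concrete, the following Python sketch models it with toy rules. It is not Cloud Search's algorithm: the scoring function field_confidence, the cutoff CONFIDENCE_THRESHOLD, and the interleaving policy in select_results are all hypothetical, because Cloud Search does not publish how it scores or blends results. The sketch only shows the idea that a string found mostly in one schema field scores higher than one found mostly in body text, and that low scores lead to blended results.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical cutoff; Cloud Search does not publish the value it uses.
CONFIDENCE_THRESHOLD = 0.8


def field_confidence(occurrences: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Toy score for a query string, given one (location, field) pair per
    occurrence. location is "field" for a schema field and "body" for
    paragraph text. Returns the most common field and its share of all
    occurrences (0.0 if the string never appears in a field)."""
    fields = Counter(field for location, field in occurrences if location == "field")
    if not fields:
        return ("", 0.0)
    best, count = fields.most_common(1)[0]
    return (best, count / len(occurrences))


def select_results(confidence: float, interpreted: List[str], keyword: List[str],
                   threshold: float = CONFIDENCE_THRESHOLD) -> List[str]:
    """Return only interpreted results when confidence is strong; otherwise
    interleave interpreted and keyword results, dropping duplicates."""
    if confidence >= threshold:
        return list(interpreted)
    blended: List[str] = []
    seen = set()
    for i in range(max(len(interpreted), len(keyword))):
        for source in (interpreted, keyword):
            if i < len(source) and source[i] not in seen:
                seen.add(source[i])
                blended.append(source[i])
    return blended


# "Tom Hanks" found 9 times in the actors field and once in a summary paragraph.
_, high = field_confidence([("field", "actors")] * 9 + [("body", "summary")])
# The same string found mostly in paragraph text.
_, low = field_confidence([("field", "actors")] * 2 + [("body", "summary")] * 8)
print(select_results(high, ["big", "splash"], ["cast-away"]))        # ['big', 'splash']
print(select_results(low, ["big", "splash"], ["cast-away", "big"]))  # ['big', 'cast-away', 'splash']
```

Round-robin interleaving is only one plausible blending policy; it was chosen here because it keeps the top result from each list near the top.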

Example query interpretation

Suppose you have a data source, such as a database, containing information about movies. Figure 1 shows a sample search query and the resulting interpretation.

Figure 1. Query interpretation

Given this example query, query interpretation does the following:

  • Parses the schema and determines that the top-level objects in the data source are classified as objecttype:movies. Query interpretation now knows that "movies" in the query is an object type.

  • Scans documents in the data source, in conjunction with the schema, to determine where the string "action" occurs. If the string primarily occurs in a specific "genre" data source field, query interpretation is confident that "action" is a value of the property "genre" defined in the schema. If the string primarily occurs within paragraphs of content, query interpretation's confidence decreases.

The resulting query interpretation is:

  actor:"tom hanks" genre:action objecttype:movies
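For client-side tooling such as logging or tests, it can help to split an operator-based query like the one above into operator/value pairs. The following Python sketch uses the standard shlex module, which removes the quotes while keeping "tom hanks" as one value. parse_operator_query is a hypothetical helper, not part of any Cloud Search API, and it assumes every term has the form operator:value.

```python
import shlex
from typing import List, Tuple


def parse_operator_query(query: str) -> List[Tuple[str, str]]:
    """Split an operator-based query into (operator, value) pairs.
    Quoted values keep their internal spaces."""
    pairs = []
    for token in shlex.split(query):
        operator, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"not an operator term: {token!r}")
        pairs.append((operator, value))
    return pairs


print(parse_operator_query('actor:"tom hanks" genre:action objecttype:movies'))
# [('actor', 'tom hanks'), ('genre', 'action'), ('objecttype', 'movies')]
```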

Query interpretation is automatically enabled for all Cloud Search customers with no additional work. However, for optimal query interpretation you should structure your schema per the instructions in this document.

Structure your schema to support query interpretation

The following schema settings help query interpretation understand your data and produce better interpretations.

Enable display name interpretations

Cloud Search’s query interpretation uses the objectDefinitions and propertyDefinitions in a schema to interpret a user’s query and tune the results. To get the most from these schema elements, create intuitive display names: use displayLabel for property names, objectDisplayLabel for object names, and operatorName for operators.

The following schema shows intuitive display names for a movie object:

{
  "objectDefinitions": [
    {
      "name": "movie",
      "options": {
        "displayOptions": {
          "objectDisplayLabel": "Films"
        }
        ...
      },
      "propertyDefinitions": [
        {
          "name": "genre",
          "isReturnable": true,
          "isRepeatable": true,
          "isFacetable": true,
          "textPropertyOptions": {
            "retrievalImportance": { "importance": "HIGHEST" },
            "operatorOptions": {
              "operatorName": "genre"
            }
          },
          "displayOptions": {
            "displayLabel": "Category"
          }
        },
        ...
      ]
    }
  ]
}

In the previous example:

  • The movie object definition has a “Films” objectDisplayLabel.

  • The genre propertyDefinition has a “genre” operatorName and a “Category” displayLabel.
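If you generate schemas from code, the same movie schema can be built as a Python dict and serialized with the standard json module. This is a sketch: the fields elided with "..." in the JSON example are omitted, build_movie_schema and display_names are hypothetical helper names, and submitting the result (for example through the Cloud Search API's schema update method) is not shown. display_names lists the user-facing names that query interpretation can match on.

```python
import json


def build_movie_schema() -> dict:
    """Build the movie schema fragment from the example as a Python dict."""
    genre = {
        "name": "genre",
        "isReturnable": True,
        "isRepeatable": True,
        "isFacetable": True,
        "textPropertyOptions": {
            "retrievalImportance": {"importance": "HIGHEST"},
            "operatorOptions": {"operatorName": "genre"},
        },
        "displayOptions": {"displayLabel": "Category"},
    }
    movie = {
        "name": "movie",
        "options": {"displayOptions": {"objectDisplayLabel": "Films"}},
        "propertyDefinitions": [genre],
    }
    return {"objectDefinitions": [movie]}


def display_names(schema: dict) -> list:
    """Collect object display labels, property display labels, and operator names."""
    names = []
    for obj in schema.get("objectDefinitions", []):
        label = obj.get("options", {}).get("displayOptions", {}).get("objectDisplayLabel")
        if label:
            names.append(label)
        for prop in obj.get("propertyDefinitions", []):
            label = prop.get("displayOptions", {}).get("displayLabel")
            if label:
                names.append(label)
            # Operator options live under textPropertyOptions, integerPropertyOptions, etc.
            for key, value in prop.items():
                if key.endswith("PropertyOptions"):
                    operator = value.get("operatorOptions", {}).get("operatorName")
                    if operator:
                        names.append(operator)
    return names


schema = build_movie_schema()
print(json.dumps(schema, indent=2))
print(display_names(schema))  # ['Films', 'Category', 'genre']
```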

These display names enable Cloud Search to make the following query interpretations:

  • “action movies,” “genre action type movies,” or “movies genre action” are interpreted as genre:action objecttype:movies.
  • “movies with genre action or thriller” is interpreted as objecttype:movies genre:(action OR thriller).
  • “action film” or “action films” are interpreted as genre:action objecttype:movies.
  • “comedy category movies” is interpreted as genre:comedy objecttype:movies.
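The interpretations above rely on display names and operator names acting as synonyms. The following Python sketch is a deliberately simplified word-lookup interpreter that reproduces them; it is not how Cloud Search works internally. The alias tables and VALUE_INDEX (which stands in for the indexed genre values) are hypothetical, and the sketch always emits property terms before objecttype, so its term order can differ from the list above.

```python
from typing import Dict, List

# Words that name the object type: the object name and the objectDisplayLabel "Films".
OBJECT_ALIASES: Dict[str, str] = {
    "movie": "movies", "movies": "movies", "film": "movies", "films": "movies",
}
# Words that name the genre property: its operatorName and its displayLabel.
PROPERTY_WORDS = {"genre", "category"}
# Hypothetical indexed values, each mapped to the property it mostly appears in.
VALUE_INDEX: Dict[str, str] = {"action": "genre", "comedy": "genre", "thriller": "genre"}


def interpret(query: str) -> str:
    """Toy interpreter: rewrite recognized words as operator terms."""
    object_type = None
    values: Dict[str, List[str]] = {}
    for word in query.lower().split():
        if word in OBJECT_ALIASES:
            object_type = OBJECT_ALIASES[word]
        elif word in PROPERTY_WORDS:
            continue  # Names the property; its value is matched separately.
        elif word in VALUE_INDEX:
            values.setdefault(VALUE_INDEX[word], []).append(word)
        # Any other word ("with", "or", "type") is ignored in this sketch.
    terms = []
    for prop, vals in values.items():
        if len(vals) == 1:
            terms.append(f"{prop}:{vals[0]}")
        else:
            terms.append(f"{prop}:({' OR '.join(vals)})")
    if object_type:
        terms.append(f"objecttype:{object_type}")
    return " ".join(terms)


print(interpret("action films"))                          # genre:action objecttype:movies
print(interpret("movies with genre action or thriller"))  # genre:(action OR thriller) objecttype:movies
```

Without the objectDisplayLabel and displayLabel entries in the alias tables, “action films” and “comedy category movies” would lose their objecttype or be read differently, which is the point of defining intuitive display names.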

Enable date, numerical, and sort interpretations

You should define the lessThanOperatorName and greaterThanOperatorName for all date and numerical properties. These options are specified in IntegerOperatorOptions for integer properties and in DateOperatorOptions for date properties. These settings enable automatic date and numerical interpretations. Additionally, to enable sort interpretations, set the isSortable option for date and numerical properties. The following schema shows how to enable these options.

{
  "objectDefinitions": [
    {
      "name": "movie",
      "options": {
        "displayOptions": {
          "objectDisplayLabel": "Films"
        }
      },
      "propertyDefinitions": [
        {
          "name": "runtime",
          "isReturnable": true,
          "isSortable": true,
          "integerPropertyOptions": {
            "orderedRanking": "DESCENDING",
            "minimumValue": {
              "value": 10
            },
            "maximumValue": {
              "value": 500
            },
            "operatorOptions": {
              "operatorName": "runtime",
              "lessThanOperatorName": "runtimelessthan",
              "greaterThanOperatorName": "runtimegreaterthan"
            }
          },
          "displayOptions": {
            "displayLabel": "Length"
          }
        },
        {
          "name": "releasedate",
          "isReturnable": true,
          "isSortable": true,
          "datePropertyOptions": {
            "operatorOptions": {
              "operatorName": "releasedate",
              "lessThanOperatorName": "releasedbefore",
              "greaterThanOperatorName": "releasedafter"
            }
          }
        }
      ]
    }
  ]
}

In the previous example:

  • The numeric property runtime refers to the length of a movie. The runtimelessthan and runtimegreaterthan operator names are set for this property.
  • The date property releasedate refers to when a movie was released in theaters. The releasedbefore and releasedafter operator names are set for this property.
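Both properties in the schema above follow the same pattern: a sortable property whose operatorOptions define a base operator plus less-than and greater-than operator names. The following Python sketch captures that pattern in a hypothetical helper, ordered_property, that returns a property definition as a dict. It covers only the fields shown in the pattern; the orderedRanking and value bounds set on runtime in the schema above are left out.

```python
import json
from typing import Optional


def ordered_property(name: str, kind: str, less_than: str, greater_than: str,
                     display_label: Optional[str] = None) -> dict:
    """Build a sortable integer or date property with comparison operators.

    kind must be "integer" or "date"; it selects integerPropertyOptions or
    datePropertyOptions, matching the schema above.
    """
    if kind not in ("integer", "date"):
        raise ValueError("kind must be 'integer' or 'date'")
    prop = {
        "name": name,
        "isReturnable": True,
        "isSortable": True,  # Enables sort interpretations such as "sort movies by ...".
        f"{kind}PropertyOptions": {
            "operatorOptions": {
                "operatorName": name,
                "lessThanOperatorName": less_than,
                "greaterThanOperatorName": greater_than,
            }
        },
    }
    if display_label:
        prop["displayOptions"] = {"displayLabel": display_label}
    return prop


runtime = ordered_property("runtime", "integer", "runtimelessthan", "runtimegreaterthan", "Length")
releasedate = ordered_property("releasedate", "date", "releasedbefore", "releasedafter")
print(json.dumps([runtime, releasedate], indent=2))
```

Generating these definitions from one helper makes it harder to forget the comparison operator names on a new date or numerical property.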

These settings enable Cloud Search to make the following query interpretations:

  • Assuming the current year is 2019, “movies released this year” is interpreted as objecttype:movies releasedafter:2019-1-1 releasedbefore:2019-12-31.
  • Assuming the current week is the third week in March, “movies released last week” is interpreted as objecttype:movies releasedafter:2019-3-10 releasedbefore:2019-3-16.
  • “movies with runtime less than 90” is interpreted as objecttype:movies runtimelessthan:90.
  • Assuming the current year is 2019, “movies released this year and length more than 120” is interpreted as releasedafter:2019-1-1 releasedbefore:2019-12-31 objecttype:movies runtimegreaterthan:120.
  • “sort movies by release date” is interpreted as objecttype:movies, and the results are sorted by release date in ascending order, which is the default sort order.
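Cloud Search computes relative date ranges itself; the following Python sketch only illustrates the arithmetic behind the “this year” and “last week” examples above. It assumes weeks run Sunday through Saturday (an assumption that reproduces the 2019-3-10 to 2019-3-16 range when the current date is Wednesday, March 20, 2019) and formats dates without zero-padding to match the examples. this_year, last_week, and fmt are hypothetical helper names.

```python
from datetime import date, timedelta


def fmt(d: date) -> str:
    # Unpadded style used in the examples above, e.g. 2019-3-10.
    return f"{d.year}-{d.month}-{d.day}"


def this_year(today: date) -> str:
    start, end = date(today.year, 1, 1), date(today.year, 12, 31)
    return f"releasedafter:{fmt(start)} releasedbefore:{fmt(end)}"


def last_week(today: date) -> str:
    # Assumes Sunday-to-Saturday weeks. weekday() is Monday=0 ... Sunday=6.
    days_since_sunday = (today.weekday() + 1) % 7
    start = today - timedelta(days=days_since_sunday + 7)  # Sunday of the prior week
    end = start + timedelta(days=6)                        # Saturday of the prior week
    return f"releasedafter:{fmt(start)} releasedbefore:{fmt(end)}"


print("objecttype:movies " + this_year(date(2019, 5, 1)))
# objecttype:movies releasedafter:2019-1-1 releasedbefore:2019-12-31
print("objecttype:movies " + last_week(date(2019, 3, 20)))
# objecttype:movies releasedafter:2019-3-10 releasedbefore:2019-3-16
```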

Enable reserved operator interpretation

You can also use the reserved built-in operators type, before, after, and objecttype to enhance query interpretation. When indexing a document, do the following:

  1. Populate the updateTime field in ItemMetadata to enable the before and after operators. These settings enable Cloud Search to make the following query interpretations:

    • “movies from last week” would list all the movies that were updated in the index the prior week.
    • “movies before jan 2019” would list all the movies that were indexed before January 2019.
  2. Populate the mimeType field in ItemMetadata to enable automatic detection of the type operator. For example, the query “action videos” would list all action movie documents with a MIME type of application/mp4, application/mpeg4, application/x-shockwave-flash, video/, or application/vnd.google-apps.video.
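The steps above come down to setting two ItemMetadata fields when you index an item. The following Python sketch builds those fields as a plain dict in the shape the Cloud Search REST API uses for ItemMetadata (mimeType, updateTime as an RFC 3339 UTC timestamp, and objectType naming the schema object); it does not call the API. item_metadata and is_video_mime are hypothetical helpers, and treating "video/" as a prefix that matches any video/* type is an assumption about the MIME list above.

```python
from datetime import datetime, timezone

# MIME types listed above for "videos"; "video/" is handled as a prefix below.
VIDEO_MIME_TYPES = {
    "application/mp4",
    "application/mpeg4",
    "application/x-shockwave-flash",
    "application/vnd.google-apps.video",
}


def is_video_mime(mime_type: str) -> bool:
    """True if the MIME type is one of the listed video types (assumption: any video/*)."""
    return mime_type in VIDEO_MIME_TYPES or mime_type.startswith("video/")


def item_metadata(mime_type: str, updated: datetime) -> dict:
    """Build the ItemMetadata fields used by the type, before, and after operators.
    Pass a timezone-aware datetime; it is converted to UTC."""
    timestamp = updated.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"objectType": "movie", "mimeType": mime_type, "updateTime": timestamp}


meta = item_metadata("video/mp4", datetime(2019, 3, 12, 9, 30, tzinfo=timezone.utc))
print(meta)
# {'objectType': 'movie', 'mimeType': 'video/mp4', 'updateTime': '2019-03-12T09:30:00Z'}
print(is_video_mime(meta["mimeType"]))  # True
```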

Query interpretation limitations

The query interpretation feature has the following limitations.

  • Query interpretation works only when the data source ACLs meet one of these conditions:
    • All documents are domain public (everyone in the domain can access them).
    • All documents are data source public (everyone who has access through the data source ACL can access them).
    • Most documents in the data source have the same ACL (all documents inherit their ACL from the same container item), with no additional readers defined.
  • If multiple schema operators accept the same value, how that value is mapped to an operator in a query depends on the overall confidence returned by the query interpretation system. For example, suppose the schema defines the properties priority and severity, and both operators accept the values 0, 1, 2, or 3. A "0" in a query could then refer to either priority or severity, so the value is ambiguous and the confidence level is lower.
  • By default, Cloud Search’s query interpretation lowercases field values when interpreting a query, except for text operators defined with the exactMatchWithOperator option.
  • The source operator is not supported in queries.
  • Queries that combine operator-based terms and free-text terms are not interpreted. For example, the query "p0 priority cases severity:s0" isn't supported because "p0 priority cases" is a free-text term while "severity:s0" is an operator-based term.
  • The query interpretation strategy always blends the interpreted results with ordinary (non-interpreted, relevance-ranked) results. It does not perform a full-page replacement of results.
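Because mixed queries are not interpreted, a search front end might want to detect them, for example to show a hint to the user. The following Python sketch is a hypothetical client-side check, not a Cloud Search feature: it treats any token shaped like name:value as an operator term and everything else as free text, which is a simplification of real query syntax. shlex keeps quoted values together and raises ValueError on unbalanced quotes.

```python
import re
import shlex

# Simplified pattern for an operator term: name:value (value may have been quoted).
OPERATOR_TERM = re.compile(r"^[A-Za-z_]\w*:.+$")


def is_mixed_query(query: str) -> bool:
    """True if the query mixes operator terms and free-text terms."""
    tokens = shlex.split(query)
    has_operator = any(OPERATOR_TERM.match(t) for t in tokens)
    has_free_text = any(not OPERATOR_TERM.match(t) for t in tokens)
    return has_operator and has_free_text


print(is_mixed_query("p0 priority cases severity:s0"))  # True
print(is_mixed_query("priority:p0 severity:s0"))        # False
print(is_mixed_query("p0 priority cases"))              # False
```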