Google App Engine

Prospective Search Java Overview


Experimental!

Prospective Search is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Prospective Search. We will inform the community when this feature is no longer experimental.
 


  1. Overview
  2. Using prospective search
  3. Topics and result keys
  4. Creating documents
  5. Supported types for property values
  6. Query language overview
  7. Handling matches
  8. Development server

Overview

Prospective search is a querying service that allows your application to match search queries against real-time data streams. For every document presented, prospective search returns the ID of every registered query that matches the document.

Prospective search allows you to register a large set of queries and simultaneously match the queries against a single document. It is particularly useful for applications that process streaming data, for example:

  • Applications that match against all the updates on a social networking service, or against high-frequency comments in a chat room.
  • Applications that process data sources that provide notification, monitoring, or filtering services.

To understand prospective search, it's helpful to compare it to the conventional retrospective search model. In a retrospective search application, such as Google search, the application must build, or have access to, an index of the data to be searched. Needing to pre-index the data makes it difficult and expensive to create real-time applications, because each query must be executed separately against a potentially large index.

In a prospective search application, such as Google Alerts, you register search queries and match them against new documents in real time, as the documents are inserted into your application. This allows you to create applications that efficiently monitor incoming live data. You are not limited to using existing, indexed data.

Applications often use both retrospective and prospective search capabilities to get the best of both worlds. For example, an application can use retrospective search to find matching documents indexed in the past while using prospective search to find matching documents as soon as they arrive.

The life of a typical prospective search application looks something like this:

  1. You decide on the appropriate document schema. Your choice will depend on the type of source data that the application is designed to handle.
  2. The application uses the subscribe() call to register query subscriptions with prospective search using the query language.
  3. The application converts items in the streaming source data into documents, which are instances of Entity. You can use subclasses of Entity, but they are not required and are uncommon for this purpose in Java.
  4. The application uses the match() call to present documents to prospective search for matching against the subscribed queries.
  5. Prospective search delivers the matching subscription IDs, and optionally the matched document, to your application through the task queue. These results are subject to the usual Quotas and Limits.
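The workflow above can be pictured with a small, self-contained Java model. It is not the App Engine API: ToyProspectiveSearch is a hypothetical class, queries are plain java.util.function.Predicate objects instead of query-language strings, and match() returns the matching subscription IDs directly instead of delivering them through the task queue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy, in-memory model of the prospective search workflow (NOT the App Engine API).
// Queries are Java predicates rather than query-language strings, and matches are
// returned directly rather than through the task queue.
class ToyProspectiveSearch {
  // topic -> (subscription ID -> query)
  private final Map<String, Map<String, Predicate<Map<String, Object>>>> subs =
      new LinkedHashMap<>();

  // Step 2: register a query under a subscription ID for a topic.
  void subscribe(String topic, String subId, Predicate<Map<String, Object>> query) {
    subs.computeIfAbsent(topic, t -> new LinkedHashMap<>()).put(subId, query);
  }

  // Step 4: present one document and collect the ID of every matching subscription.
  List<String> match(Map<String, Object> document, String topic) {
    List<String> matched = new ArrayList<>();
    subs.getOrDefault(topic, Map.of()).forEach((subId, query) -> {
      if (query.test(document)) {
        matched.add(subId);
      }
    });
    return matched;
  }
}
```

Registering many queries once and then testing each incoming document against them is the inversion that distinguishes prospective search from running queries against a stored index.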

Here's a summary of the essential function calls:

Function            Description
getSubscription()   Returns information about a single subscription, such as its state, query, expiration time, and subscription ID.
listSubscriptions() Returns the same information (state, query, expiration time, and subscription ID) for a specified number of subscriptions.
listTopics()        Lists all topics that currently exist.
match()             Matches a document against all subscriptions in a topic. Results are delivered through the Task Queue rather than returned directly, so that the application can scale.
subscribe()         Registers a subscription, made up of a subscription ID and a query, for a given topic. Expect a delay of a few seconds between a successful return from subscribe() and the subscription becoming registered and active; a successful call guarantees that the subscription will eventually be registered.
unsubscribe()       Removes a subscription.

Topics and result keys

Prospective search applications may match queries against one or more streams of documents. Developers separate streams of documents by assigning a unique topic to documents they want grouped together and matched against a given set of queries. Generally, developers assign the same topic to documents of the same schema or format, but this convention is not enforced.

Topics are not defined as a separate step; instead, topics are created as a side effect of the subscribe() call. As soon as a new topic is passed to subscribe(), the topic exists. As soon as the last subscription using a given topic is deleted, the topic ceases to exist.
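The topic lifecycle described above can be modeled in a few lines of Java. This is a sketch only: TopicRegistry is a hypothetical class that tracks subscription IDs per topic in memory, not the App Engine implementation, and its listTopics() merely mirrors the behavior of the real call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (not the App Engine implementation): a topic exists exactly while
// at least one subscription uses it.
class TopicRegistry {
  private final Map<String, Set<String>> subsByTopic = new HashMap<>();

  void subscribe(String topic, String subId) {
    // The first subscription on a new topic implicitly creates the topic.
    subsByTopic.computeIfAbsent(topic, t -> new HashSet<>()).add(subId);
  }

  void unsubscribe(String topic, String subId) {
    Set<String> ids = subsByTopic.get(topic);
    if (ids == null) {
      return;
    }
    ids.remove(subId);
    // Removing the last subscription makes the topic disappear.
    if (ids.isEmpty()) {
      subsByTopic.remove(topic);
    }
  }

  // Analogous to listTopics(): only topics with live subscriptions, sorted by name.
  Set<String> listTopics() {
    return new TreeSet<>(subsByTopic.keySet());
  }
}
```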

Documents are assigned to a particular topic when calling match().

Use listTopics() to list all topics that currently exist.

You may also specify a resultKey argument in the match() call that is returned with the matching results. A resultKey can be useful if you know, for example, that returned documents are too large for the task queue. In this case, you can choose to store the documents in a database and use the identifying resultKey to retrieve them later.
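The resultKey pattern can be sketched as follows, assuming an in-memory map stands in for the datastore. ResultKeyStore, storeForMatching(), and onMatch() are hypothetical names used only for illustration; in a real application the key would be passed as the resultKey argument of match() and read back from the match callback request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Sketch of the resultKey pattern. The HashMap is a stand-in for the datastore;
// all names here are hypothetical, not part of the App Engine SDK.
class ResultKeyStore {
  private final Map<String, String> pendingDocuments = new HashMap<>();

  // Store the (possibly large) document and return the key to send with match().
  String storeForMatching(String largeDocument) {
    String resultKey = UUID.randomUUID().toString();
    pendingDocuments.put(resultKey, largeDocument);
    return resultKey;
  }

  // When a match result arrives carrying the key, fetch the full document.
  Optional<String> onMatch(String resultKey) {
    return Optional.ofNullable(pendingDocuments.get(resultKey));
  }
}
```

Only the short key travels with the match results, so the size of the task queue payload no longer depends on the size of the document.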

Creating documents

Instances of com.google.appengine.api.datastore.Entity, or of its subclasses, represent the data to be matched against subscribed queries:

import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Text;
import java.util.Arrays;
import java.util.List;

Entity comment = new Entity("comment", "example");

comment.setProperty("author", "Rose Jones");

// Use Text to store long strings in the datastore.
String bodyStr = "A rose by any other name would smell as sweet.";
comment.setProperty("body", new Text(bodyStr));
comment.setProperty("length", bodyStr.length()); // autoboxed to Integer

// A list property: query conditions on "labels" are checked against each value.
List<String> labels = Arrays.asList("poetry", "trite");
comment.setProperty("labels", labels);

Unlike the Python API, the Java API does not infer topic names from document types: you must pass the topic name explicitly in the match() call.

Supported types for property values

Prospective Search accepts Entity property values of the following types:

  • java.lang.Boolean
  • java.lang.Double
  • java.lang.Integer
  • java.lang.String
  • com.google.appengine.api.datastore.Text

Prospective search also supports properties whose values are java.util.List collections of the above types. A condition on a list property is checked against every value in the list and matches if any value satisfies it.
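The list-property rule can be illustrated with a short Java model. ListPropertyMatch is a hypothetical helper, not the service's implementation; it simply applies a condition to each element of a list value and succeeds as soon as one element satisfies it.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative model (not the service's implementation) of a condition on a list
// property: it is tested against every element, and one matching element suffices.
class ListPropertyMatch {
  static boolean matches(Object propertyValue, Predicate<Object> condition) {
    if (propertyValue instanceof List<?>) {
      for (Object element : (List<?>) propertyValue) {
        if (condition.test(element)) {
          return true; // one matching element is enough
        }
      }
      return false;
    }
    // Single-valued property: test it directly.
    return condition.test(propertyValue);
  }
}
```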

Query language overview

Prospective search uses a simple query language for querying the contents of a document's fields. The language supports numeric and text expressions and uses a field:value syntax. The field is the name of a property defined on the Entity or derived document class, and the value is the string or numeric value to query for in that field.

Prospective search supports all space-delimited languages, as well as some languages that are not segmented by spaces (specifically Chinese, Japanese, Korean, and Thai). For those languages, prospective search segments the text automatically.

Simple queries

The simplest type of query consists only of a string or text type value. The value can be a word or phrase to be matched against any supported string or text fields in the document. Queries are not case sensitive.

For example, to find all documents with the word "rose" (regardless of case) in any string or text field in the document, use a query like the following:

rose

This simple query matches against any supported string or text field in the document. If your documents are comment entities as defined in the Creating Documents section, the query matches if the word "rose" appears in the author or body field. If the schema defines additional string or text fields, such as a subject or email, the query also matches the contents of those fields.
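A simplified Java model of a bare-word query such as rose is shown below. It assumes documents are plain maps whose text fields are java.lang.String values (the Text wrapper is left out) and that words are split on non-alphanumeric characters; SimpleQuery is a hypothetical helper, and the service's real tokenizer may differ.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

// Simplified model of a bare-word query: the word is compared, ignoring case,
// with the words of every String field in the document. Splitting on
// non-alphanumeric characters is an assumption made for this sketch.
class SimpleQuery {
  static String[] words(String text) {
    return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
  }

  static boolean matchesAnyField(Map<String, Object> document, String word) {
    String needle = word.toLowerCase(Locale.ROOT);
    for (Object value : document.values()) {
      // Non-string fields (such as the numeric length) are skipped.
      if (value instanceof String
          && Arrays.asList(words((String) value)).contains(needle)) {
        return true;
      }
    }
    return false;
  }
}
```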

To match a phrase, surround the query in quotes as follows:

"any other name"

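Phrase matching can be sketched as looking for the query's words as a contiguous, in-order run within the field's words. PhraseQuery is a hypothetical helper, and splitting on non-alphanumeric characters with case folding is an assumption for illustration, not a description of the service's tokenizer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Sketch of phrase matching for a quoted query such as "any other name": the
// query words must appear consecutively and in order, ignoring case.
class PhraseQuery {
  static List<String> words(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
        .filter(w -> !w.isEmpty())
        .collect(Collectors.toList());
  }

  static boolean containsPhrase(String fieldText, String phrase) {
    // indexOfSubList finds a contiguous, in-order run of the phrase's words.
    return Collections.indexOfSubList(words(fieldText), words(phrase)) >= 0;
  }
}
```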
Queries on fields

To create more complex queries that reference specific fields, use both the field and value in your query. Use a colon to separate the two as follows:

field:value

This syntax allows you to reference any supported field defined in a schema by name. For example, to search for "Rose" only within the author field of a comment document as defined in Creating Documents, use the following query:

author:rose

To search the body field for the phrase "any other name", use the following query:

body:"any other name"

To match against multiple fields at the same time, list a series of field:value pairs together with a space between them as follows:

author:"Rose Jones" body:rose
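The field:value form and its implicit AND can be illustrated with a minimal Java evaluator. FieldQuery is hypothetical and deliberately tiny: it understands only space-separated field:value and field:"quoted phrase" terms, and it uses a case-insensitive substring test, a simplification of the service's word and phrase matching (so, for instance, rose would also match inside "prose").

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch (not the real query parser): space-separated field:value and
// field:"quoted phrase" terms, all of which must match (implicit AND).
class FieldQuery {
  private static final Pattern TERM = Pattern.compile("(\\w+):(\"[^\"]*\"|\\S+)");

  static boolean matches(Map<String, Object> document, String query) {
    Matcher m = TERM.matcher(query);
    boolean sawTerm = false;
    while (m.find()) {
      sawTerm = true;
      String field = m.group(1);
      String value = m.group(2).replace("\"", "").toLowerCase(Locale.ROOT);
      Object fieldValue = document.get(field);
      // Implicit AND: a single non-matching term fails the whole query.
      if (!(fieldValue instanceof String)
          || !((String) fieldValue).toLowerCase(Locale.ROOT).contains(value)) {
        return false;
      }
    }
    return sawTerm;
  }
}
```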

Search operators

The query language supports a number of Boolean operators as well as parentheses for grouping parts of the query together. The supported Boolean operators are AND, OR, and NOT. Always use uppercase for Boolean operators. Lowercase words are treated as part of the field or value portions of the query.

By default, when you create queries that match multiple fields at the same time, each value is combined with a Boolean AND. For the query as a whole to match, all the specified values must match.

You can also explicitly specify this by using the AND Boolean operator. The following two queries are equivalent:

author:rose body:"any other name"
author:rose AND body:"any other name"

Use the OR operator if you only need one of two values to match. You can use more than one OR in a query. For example:

author:("bob" OR ("rose" OR "tom") AND "jones")

This example matches any document whose author field contains "bob", or contains "jones" together with either "rose" or "tom"; for example, "Bob", "Rose Jones", or "Tom Jones".
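The grouping of that query can be checked with composed java.util.function.Predicate objects over the author text. The sketch follows the reading given above, in which AND binds more tightly than OR, so the query behaves as "bob" OR (("rose" OR "tom") AND "jones"). OrQuery and has() are hypothetical, and the case-insensitive substring test is a simplification of word matching.

```java
import java.util.Locale;
import java.util.function.Predicate;

// Models author:("bob" OR ("rose" OR "tom") AND "jones") as composed predicates
// over the author field, reading it as "bob" OR (("rose" OR "tom") AND "jones").
class OrQuery {
  // Case-insensitive substring test; a simplification of the service's word matching.
  static Predicate<String> has(String word) {
    return text -> text.toLowerCase(Locale.ROOT).contains(word);
  }

  static final Predicate<String> AUTHOR_QUERY =
      has("bob").or(has("rose").or(has("tom")).and(has("jones")));
}
```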

For an example of Boolean NOT, see the following:

author:rose NOT body:filigree

This example matches any document whose author field contains "rose" but whose body field does not contain "filigree".

Use parentheses to create more complex queries combining supported Boolean operators.

For example:

(author:Thomas OR author:Jones) AND (NOT body:rose)

This example matches documents with author "Thomas" or "Jones" only if the body field of the comment does not contain "rose".
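The same query can be modeled with predicates over a whole document, where negate() plays the role of NOT and the method chaining mirrors the parentheses. NotQuery and fieldHas() are hypothetical helpers; fieldHas() performs a case-insensitive substring test on a String field, which simplifies the service's word matching.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Models (author:Thomas OR author:Jones) AND (NOT body:rose) as composed predicates.
class NotQuery {
  // Hypothetical helper: true if the named String field contains the word, ignoring case.
  static Predicate<Map<String, Object>> fieldHas(String field, String word) {
    return doc -> {
      Object v = doc.get(field);
      return v instanceof String
          && ((String) v).toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    };
  }

  static final Predicate<Map<String, Object>> QUERY =
      fieldHas("author", "Thomas").or(fieldHas("author", "Jones"))
          .and(fieldHas("body", "rose").negate());
}
```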

Numeric operators

Numeric operators only match against numeric fields. Supported numeric operators are as follows:

  • <  (less than)
  • >  (greater than)
  • <= (less than or equal to)
  • >= (greater than or equal to)
  • =  (equal to)

For "not equal to", apply the Boolean NOT to an = comparison on a numeric field such as length. For example:

NOT length = 15

This query matches documents whose length is not 15.

You can combine numeric operators with text and Boolean operators. For example:

author:"Rose Jones" length > 15

This query matches comments whose author field is "Rose Jones" and whose length field is greater than 15. In the Creating Documents example, length holds the number of characters in the body.
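Numeric conditions can be sketched as an IntPredicate applied to an integer property. NumericQuery is a hypothetical helper, not the service's evaluator; as with the query language's numeric operators, it only applies to a numeric field, so a missing or non-numeric value does not match.

```java
import java.util.Map;
import java.util.function.IntPredicate;

// Illustrative model of numeric operators on an integer property such as "length".
class NumericQuery {
  static boolean numericMatches(Map<String, Object> doc, String field, IntPredicate condition) {
    Object v = doc.get(field);
    // Only Integer values can satisfy a numeric condition in this model.
    return v instanceof Integer && condition.test((Integer) v);
  }
}
```

For example, `NOT length = 15` corresponds to `!numericMatches(doc, "length", n -> n == 15)`, and `author:"Rose Jones" length > 15` adds an author check joined with `&&`.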

Handling matches

Match notifications are delivered through the task queue specified in the match() call.

Each callback request contains repeated id parameters, one per matching subscription; the value of each is the subscription ID of a matching subscription.

If the number of matching subscriptions exceeds the batch size specified in the match request, multiple callbacks covering the entire set of subscription IDs will be enqueued. Two request parameters, results_offset and results_count, will be set to indicate which part of the result set each callback contains.
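The batching behavior can be pictured with a small model that cuts a list of matching IDs into batches and reassembles them on the receiving side. This is illustrative only, since the service builds and enqueues the callbacks itself. MatchBatches and Batch are hypothetical; resultsOffset is modeled as the position of a batch's first ID in the full result set (an assumption), and results_count is left out of the model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only. resultsOffset is assumed to be the position of a
// batch's first ID in the full result set; results_count is not modeled.
class MatchBatches {
  // Hypothetical container for one callback's worth of subscription IDs.
  static final class Batch {
    final int resultsOffset;
    final List<String> ids;

    Batch(int resultsOffset, List<String> ids) {
      this.resultsOffset = resultsOffset;
      this.ids = ids;
    }
  }

  // Sender side: cut the full list of matching IDs into batches of batchSize.
  static List<Batch> split(List<String> matchingIds, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Batch> batches = new ArrayList<>();
    for (int offset = 0; offset < matchingIds.size(); offset += batchSize) {
      int end = Math.min(offset + batchSize, matchingIds.size());
      batches.add(new Batch(offset, new ArrayList<>(matchingIds.subList(offset, end))));
    }
    return batches;
  }

  // Receiver side: tasks may run in any order, so place each batch's IDs at its
  // offset to rebuild the complete result set.
  static List<String> reassemble(List<Batch> batches, int total) {
    String[] all = new String[total];
    for (Batch b : batches) {
      for (int i = 0; i < b.ids.size(); i++) {
        all[b.resultsOffset + i] = b.ids.get(i);
      }
    }
    return Arrays.asList(all);
  }
}
```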

For an example of how to handle these callbacks, see the handler in the Prospective Search demo at: com.google.appengine.demos.prospectivesearch.MatchResponseServlet.

Receiving match responses

To receive the matching subscription IDs, declare a servlet in web.xml and map it to /_ah/prospective_search:

<servlet>
  <servlet-name>matches</servlet-name>
  <servlet-class>
    com.google.appengine.demos.prospectivesearch.MatchResponseServlet
  </servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>matches</servlet-name>
  <url-pattern>/_ah/prospective_search</url-pattern>
</servlet-mapping>

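In MatchResponseServlet you can read the parameters of the POST request, which include the matching subscription IDs and, if requested, the document that was matched:

package com.google.appengine.demos.prospectivesearch;

import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.prospectivesearch.ProspectiveSearchServiceFactory;
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class MatchResponseServlet extends HttpServlet {
  /**
   * Handle Prospective Search match callbacks.
   */
  @Override
  public void doPost(final HttpServletRequest req, final HttpServletResponse rsp)
      throws ServletException, IOException {
    // Which part of the full set of matches this callback covers.
    int resultsOffset = Integer.parseInt(req.getParameter("results_offset"));
    int resultsCount = Integer.parseInt(req.getParameter("results_count"));
    // One "id" parameter per matching subscription.
    String[] reqSubIDs = req.getParameterValues("id");
    // Optional inclusion of matched entity if requested in original match(...) request:
    Entity matchedEntity = null;
    if (req.getParameter("document") != null) {
      matchedEntity =
          ProspectiveSearchServiceFactory.getProspectiveSearchService().getDocument(req);
    }

    // Do something based on match...
  }
}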

Development server

The local Prospective Search implementation stores subscriptions in the local datastore. By default, changes to subscriptions are persisted immediately after each call to subscribe() or unsubscribe(), or when a subscription expires. You can disable and re-enable this behavior with the boolean system property prospectivesearch.autocommit. Disabling it is useful for quickly creating or modifying many subscriptions in a batch.
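A batch of subscription changes on the development server might be wrapped as follows. This is a sketch under assumptions: the property name comes from the text above, but AutocommitToggle and withAutocommitDisabled() are hypothetical, runBatch stands in for your own subscribe() and unsubscribe() calls, and it is assumed that the local implementation reads the property at the time of each call.

```java
// Sketch: temporarily turn off prospectivesearch.autocommit around a batch of
// subscription changes, then restore the previous setting. Assumes the local
// implementation rereads the property on each call.
class AutocommitToggle {
  static final String PROPERTY = "prospectivesearch.autocommit";

  static void withAutocommitDisabled(Runnable runBatch) {
    String previous = System.getProperty(PROPERTY);
    System.setProperty(PROPERTY, "false");
    try {
      runBatch.run(); // hypothetical: your subscribe()/unsubscribe() calls go here
    } finally {
      // Restore the earlier setting (or clear it) even if the batch throws.
      if (previous == null) {
        System.clearProperty(PROPERTY);
      } else {
        System.setProperty(PROPERTY, previous);
      }
    }
  }
}
```

Using try/finally keeps autocommit from staying disabled by accident if one of the batched calls fails.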
