Google App Engine

Prospective Search Python Overview


Experimental!

Prospective Search is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Prospective Search. We will inform the community when this feature is no longer experimental.
 


  1. Overview
  2. Using prospective search
  3. Topics and result keys
  4. Creating documents
  5. Supported types for property values
  6. Query language overview
  7. Receiving match responses

Overview

Prospective search is a querying service that allows your application to match search queries against real-time data streams. For every document presented, prospective search returns the ID of every registered query that matches the document.

Prospective search allows you to register a large set of queries and simultaneously match the queries against a single document. It is particularly useful for applications that process streaming data, for example:

  • Applications that match against all the updates on a social networking service, or against high-frequency comments in a chat room.
  • Applications that process data sources that provide notification, monitoring, or filtering services.

To understand prospective search, it's helpful to compare it to the conventional retrospective search model. In a retrospective search application, such as Google search, the application must build, or have access to, an index of the data to be searched. Needing to pre-index the data makes it difficult and expensive to create real-time applications, because each query must be executed separately against a potentially large index.

In a prospective search application, such as Google Alerts, you register search queries and match them against new documents in real time, as the documents are inserted into your application. This allows you to create applications that efficiently monitor incoming live data. You are not limited to using existing, indexed data.

Applications often use both retrospective and prospective search capabilities to get the best of both worlds. For example, an application can use retrospective search to find matching documents indexed in the past while using prospective search to find matching documents as soon as they arrive.
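The contrast can be illustrated with a toy, in-memory model. This is only a sketch of the idea, not how the service is implemented: the class name `ToyProspectiveIndex` and its single-word queries are assumptions made for illustration. The queries are stored ahead of time, and each arriving document is checked against all of them once.

```python
class ToyProspectiveIndex(object):
    """Toy model: keeps registered queries and matches new documents."""

    def __init__(self):
        # Maps subscription ID -> lowercase query word.
        self._queries = {}

    def subscribe(self, sub_id, word):
        """Registers a single-word query under the given subscription ID."""
        self._queries[sub_id] = word.lower()

    def match(self, text):
        """Returns the sorted IDs of every registered query found in text."""
        words = set(text.lower().split())
        return sorted(sub_id for sub_id, word in self._queries.items()
                      if word in words)


index = ToyProspectiveIndex()
index.subscribe("alert-rose", "rose")
index.subscribe("alert-tulip", "tulip")

# The document is never indexed; it is only compared with stored queries.
matches = index.match("A rose by any other name")
print(matches)
```

Note that the document itself is discarded after matching. Only the queries persist, which is the reverse of a retrospective index.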

The life of a typical prospective search application looks something like this:

  1. You decide on the appropriate document schema. Your choice will depend on the type of source data that the application is designed to handle.
  2. The application uses the subscribe() call to register query subscriptions with prospective search using the query language.
  3. The application converts items in the streaming source data into documents, which are instances of db.Model.
  4. The application uses the match() call to present documents to prospective search for matching against subscribed queries.
  5. Prospective search returns matching subscription IDs and documents in the task queue. These results are subject to the usual Quotas and Limits.
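The steps above can be sketched roughly as follows. This is a minimal outline under stated assumptions: `ps` is assumed to be the `google.appengine.api.prospective_search` module, passed in as an argument so the sketch can be exercised without the SDK, and the positional order `subscribe(document_class, query, sub_id)` should be checked against the API reference. The function name `run_lifecycle` and the shape of `queries` and `items` are hypothetical.

```python
def run_lifecycle(ps, document_class, queries, items):
    """Registers queries, then presents each source item for matching.

    ps: assumed to be the google.appengine.api.prospective_search module.
    document_class: a db.Model subclass (step 1, the document schema).
    queries: dict mapping subscription ID -> query string.
    items: iterable of dicts whose keys are property names.
    """
    # Step 2: register one subscription per query.
    for sub_id, query in sorted(queries.items()):
        ps.subscribe(document_class, query, sub_id)

    # Steps 3 and 4: convert each item into a document and present it.
    presented = 0
    for item in items:
        document = document_class(**item)
        ps.match(document)
        presented += 1

    # Step 5 happens elsewhere: results arrive later as Task Queue requests.
    return presented
```

Because a subscription can take a few seconds to become active, documents presented immediately after subscribing may not match it yet.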

Here's a summary of the essential function calls:

Function              Description
get_subscription()    Returns information about a single subscription, such as the state, the query, the expiration time, and the subscription ID.
list_subscriptions()  Returns information about a specified number of subscriptions, such as the state, the query, the expiration time, and the subscription ID.
list_topics()         Lists all topics currently in existence.
match()               Matches a document against all subscriptions within a topic. Returns results in the Task Queue rather than returning them directly, to ensure that the application can scale.
subscribe()           Registers subscriptions made up of a subscription ID and a query for a given topic. Expect a delay of a few seconds between when subscribe() returns successfully and when the subscription becomes registered and active. A successful subscribe() call guarantees that the subscription will eventually be registered.
unsubscribe()         Removes a subscription.

Topics and result keys

Prospective search applications may match queries against one or more streams of documents. Developers separate streams of documents by assigning a unique topic to documents they want grouped together and matched against a given set of queries. Generally, developers assign the same topic to documents of the same schema or format, but this convention is not enforced.

Topics are not defined as a separate step; instead, topics are created as a side effect of the subscribe() call. As soon as a new topic is passed to subscribe(), the topic exists. As soon as the last subscription using a given topic is deleted, the topic ceases to exist.

Documents are assigned to a particular topic when calling match(). The topic name can either be specified explicitly in the match() call or be taken from the class name of the document. See Creating documents.

Use list_topics() to list all topics that currently exist.
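As a rough sketch of working with an explicit topic, the function below subscribes and matches under the same topic name and then lists the topics. It assumes `ps` is the `google.appengine.api.prospective_search` module and that both subscribe() and match() accept a `topic` keyword argument; the function name and parameter names are hypothetical.

```python
def use_explicit_topic(ps, document_class, document, topic, sub_id, query):
    """Subscribes and matches under an explicit topic, then lists topics.

    ps: assumed to be the google.appengine.api.prospective_search module.
    """
    # Passing a new topic name to subscribe() is what creates the topic.
    ps.subscribe(document_class, query, sub_id, topic=topic)

    # Without topic=..., the document's class name would be used instead.
    ps.match(document, topic=topic)

    return ps.list_topics()
```

The subscription and the document must use the same topic name. A document matched under one topic is never compared with subscriptions registered under another.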

You may also specify a result_key argument in the match() call that is returned with the matching results. A result_key can be useful if you know, for example, that returned documents are too large for the task queue. In this case, you can choose to store the documents in a database and use the identifying result_key to retrieve them later.
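One way to apply this is sketched below. It assumes `ps` is the `google.appengine.api.prospective_search` module and that match() accepts the `result_key` and `result_return_document` keyword arguments; `store_document` is a hypothetical callable that saves the document and returns a string identifying it.

```python
def match_by_reference(ps, document, store_document):
    """Stores the document, then matches it without echoing it back.

    ps: assumed to be the google.appengine.api.prospective_search module.
    store_document: hypothetical callable that saves the document and
        returns a string identifying it.
    """
    result_key = store_document(document)

    # Send only the key with the results, not the possibly large document.
    ps.match(document, result_key=result_key, result_return_document=False)
    return result_key
```

For a db.Model document, `store_document` could plausibly be as simple as `lambda doc: str(doc.put())`, so that the handler can load the entity from the returned key.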

Creating documents

The document is a class derived from db.Model. It contains a set of properties which correspond to fields, and queries can match against these fields. For example, the following code sample creates a definition using db.Model.

from google.appengine.ext import db


class Comment(db.Model):
  author = db.StringProperty()
  body = db.TextProperty()
  length = db.IntegerProperty()

The example document will have the topic "Comment", derived from the class name, unless it is explicitly overridden in the match() call. The document defines a string field named author, a text field named body, and an integer field named length.

Here's how to populate the db.Model object with data from a data source and create an instance of the document:

comment = Comment()
comment.author = "Rose Jones"
comment.body = "A rose by any other name would smell as sweet."
comment.length = len(comment.body)

This example stores a string, text, and an integer in the appropriate fields.

Supported types for property values

Prospective search matches the following properties:

  • db.StringProperty
  • db.IntegerProperty
  • db.BooleanProperty
  • db.FloatProperty
  • db.TextProperty

Prospective search also supports list properties. A condition on a list property is checked against every value in the list and matches if any value matches. The following list properties are supported:

  • db.StringListProperty
  • db.ListProperty

For db.ListProperty, prospective search supports the following types:

  • str
  • unicode
  • bool
  • int (32-bit int range only)
  • float
  • db.Text
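Because integers are limited to the 32-bit range, it can help to check values before placing them in a document. The helper below is a small sketch that assumes the limit means the signed 32-bit range; the function names are hypothetical.

```python
import numbers

# Assumption: "32-bit int range" means the signed 32-bit range.
INT32_MIN = -2 ** 31
INT32_MAX = 2 ** 31 - 1


def fits_int32(value):
    """Returns True if value is an integer within the signed 32-bit range."""
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        return False
    return INT32_MIN <= value <= INT32_MAX


def out_of_range(values):
    """Returns the values that would fall outside the assumed range."""
    return [value for value in values if not fits_int32(value)]


rejected = out_of_range([1, 2 ** 40, -5])
print(rejected)
```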

Query language overview

Prospective search uses a simple query language allowing you to query the contents of a document's fields. This query language supports numeric and text expressions and uses a field:value syntax. The field identifies the name of a property defined as part of the Entity or derived document class. The value defines the query on the specified field—a string or numerical value. Text fields and queries can be unicode strings.

Prospective search supports all space-delimited languages. It also supports some languages that are not segmented by spaces (specifically, Chinese, Japanese, Korean, and Thai). For these languages, prospective search segments the text automatically.

Simple queries

The simplest type of query consists only of a string or text type value. The value can be a word or phrase to be matched against any supported string or text fields in the document. Queries are not case sensitive.

For example, to find all documents with the word "rose" (regardless of case) in any string or text field in the document, use a query like the following:

rose

This simple query matches against any supported string or text field in the document. If your documents are "Comments" as defined in the Creating Documents section, the query matches if the word "rose" appears in the author or body fields. If the schema defines additional string or text fields, such as a subject or email, rose also matches the contents of those fields.

To match a phrase, surround the query in quotes as follows:

"any other name"

Queries on fields

To create more complex queries that reference specific fields, use both the field and value in your query. Use a colon to delineate the two as follows:

field:value

This syntax allows you to reference any supported field defined in a schema by name. For example, to search for "Rose" only within the author field of a comment document as defined in Creating Documents, use the following query:

author:rose

To search the body field for the phrase "any other name", use the following query:

body:"any other name"

To match against multiple fields at the same time, list a series of field:value pairs together with a space between them as follows:

author:"Rose Jones" body:rose
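When queries are assembled from user input, a small helper can keep the field:value formatting consistent. The functions below are hypothetical and only format strings. They quote values that contain whitespace and do not handle values that themselves contain quotation marks, since escaping rules are not described here.

```python
def field_term(field, value):
    """Formats one field:value term, quoting values that contain spaces."""
    text = str(value)
    if any(ch.isspace() for ch in text):
        text = '"%s"' % text
    return '%s:%s' % (field, text)


def all_of(*terms):
    """Joins terms with spaces so that every term must match."""
    return ' '.join(terms)


query = all_of(field_term('author', 'Rose Jones'), field_term('body', 'rose'))
print(query)  # author:"Rose Jones" body:rose
```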

Search operators

The query language supports a number of Boolean operators as well as parentheses for grouping parts of the query together. The supported Boolean operators are AND, OR, and NOT. Always use uppercase for Boolean operators. Lowercase words are treated as part of the field or value portions of the query.

By default, when you create queries that match multiple fields at the same time, each value is combined with a Boolean AND. For the query as a whole to match, all the specified values must match.

You can also explicitly specify this by using the AND Boolean operator. The following two queries are equivalent:

author:rose body:"any other name"
author:rose AND body:"any other name"

Use the OR operator when a match on either of two values is sufficient. You can use more than one OR in a query. For example:

author:("bob" OR ("rose" OR "tom") AND "jones")

This example matches any document whose author field contains either "Rose Jones", "Tom Jones", or "Bob".

For an example of Boolean NOT, see the following:

author:rose NOT body:filigree

This example matches any document whose author field contains "rose" but whose body field does not contain "filigree".

Use parentheses to create more complex queries combining supported Boolean operators.

For example:

(author:Thomas OR author:Jones) AND (NOT body:rose)

This example matches documents with author "Thomas" or "Jones" only if the body field of the comment does not contain "rose".
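To make the Boolean logic concrete, the same condition can be written as a plain Python predicate over dictionaries. This only illustrates the semantics of the query. It is not how the service evaluates it, and the whitespace-based word test in `has_word` is an assumption that ignores punctuation.

```python
def has_word(text, word):
    """Case-insensitive test for a whole word in whitespace-split text."""
    return word.lower() in text.lower().split()


def matches(doc):
    """Mirrors (author:Thomas OR author:Jones) AND (NOT body:rose)."""
    author_ok = (has_word(doc['author'], 'Thomas') or
                 has_word(doc['author'], 'Jones'))
    body_ok = not has_word(doc['body'], 'rose')
    return author_ok and body_ok


docs = [
    {'author': 'Rose Jones', 'body': 'A rose by any other name'},
    {'author': 'Tom Jones', 'body': 'It is not unusual'},
    {'author': 'Bob', 'body': 'Hello'},
]
results = [matches(doc) for doc in docs]
print(results)  # [False, True, False]
```

The first document fails because its body contains "rose", and the third fails because its author is neither "Thomas" nor "Jones".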

Numeric operators

Numeric operators only match against numeric fields. Supported numeric operators are as follows:

<
>
<=
>=
=

For "not equal to", use the Boolean NOT with a numeric field name such as length. For example:

NOT length = 15

This example returns documents whose length is not 15.

You can combine numeric operators with text and Boolean operators. For example:

author:"Rose Jones" length > 15

This query matches comments whose length field is greater than 15 and whose author field contains "Rose Jones". In the Comment example, length holds the number of characters in the body field.
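Numeric conditions can be formatted with a small helper as well. The sketch below is hypothetical and only builds strings. It accepts the operators listed in this section and rewrites a "not equal to" request using the Boolean NOT.

```python
NUMERIC_OPERATORS = ('<', '>', '<=', '>=', '=')


def numeric_term(field, operator, number):
    """Formats a numeric condition such as 'length > 15'."""
    if operator == '!=':
        # There is no "not equal to" operator, so negate an equality test.
        return 'NOT %s = %s' % (field, number)
    if operator not in NUMERIC_OPERATORS:
        raise ValueError('unsupported numeric operator: %r' % (operator,))
    return '%s %s %s' % (field, operator, number)


longer_than_15 = numeric_term('length', '>', 15)
not_15 = numeric_term('length', '!=', 15)
print(longer_than_15)  # length > 15
print(not_15)          # NOT length = 15
```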

Receiving match responses

The Prospective Search API returns match results by creating events on the Task Queue. This section describes how to process the match events.

The match() call defines which task queue to use, how many subscription IDs to include per task, and what additional information to send (such as the document itself, or a key to identify the document).

To receive the matching subscription IDs, first map the URL that receives match results to your match response handler:

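import webapp2

# MainHandler and MatchResponseHandler must be defined before this point.
# Route the URL that receives match results to the match response handler.
app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/_ah/prospective_search', MatchResponseHandler)],
                              debug=True)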

In MatchResponseHandler, you can access the parameters of the POST request, which include the matching subscription IDs and the document sent for matching:

import webapp2
from google.appengine.api import prospective_search


class MatchResponseHandler(webapp2.RequestHandler):
  """Receives match results delivered through the Task Queue."""

  def post(self):
    # List of subscription IDs that matched in this batch.
    sub_ids = self.request.get_all('id')
    # Document from the match request, either a Python dict or a db.Model,
    # available if result_return_document=True in the match() call.
    doc = prospective_search.get_document(self.request)
    # Topic from the match request.
    topic = self.request.get('topic')
    # Key specified as result_key in the match() call (single value).
    key = self.request.get('key')
    # Total number of matching subscriptions for the match request
    # that generated this result event (arrives as a string).
    results_count = self.request.get('results_count')
    # Index of the first subscription in this match result batch.
    # 0 <= results_offset < results_count.
    results_offset = self.request.get('results_offset')
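Since one match() call can produce several result batches, a handler may want to know where a batch falls within the full result set. The helper below is a sketch that assumes the parameters arrive as decimal strings and that each batch covers a contiguous range starting at results_offset; the function name and the returned dictionary keys are hypothetical.

```python
def describe_batch(sub_ids, results_count, results_offset):
    """Summarizes where one result batch sits in the full result set.

    sub_ids: list of subscription IDs in this batch.
    results_count, results_offset: request parameter values, as strings.
    """
    count = int(results_count)
    offset = int(results_offset)
    end = offset + len(sub_ids)
    return {
        'first_index': offset,
        'last_index': end - 1,
        # True when this batch reaches the end of the result set.
        'covers_end': end >= count,
    }


summary = describe_batch(['sub-4', 'sub-5'], '5', '3')
print(summary['covers_end'])
```

Tasks are not guaranteed to run in order, so the batch that covers the end of the result set is not necessarily the last one to arrive.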
