Google App Engine

Python NDB Overview

The NDB API provides persistent storage in a schemaless object datastore. It supports automatic caching, sophisticated queries, and atomic transactions. NDB is well-suited to storing structured data records.

An application creates entities, objects with data values stored as properties of an entity. When the application reads an entity, that entity is automatically cached; this gives fast (and inexpensive) reads for frequently-read entities. The application can perform queries over entities.

  1. Introducing NDB
  2. Storing Data
  3. Queries and Indexes
  4. Understanding NDB Writes: Cache, Commit, Apply, and Data Visibility
  5. Django
  6. Quotas and Limits

Introducing NDB

NDB saves data objects, known as entities. An entity has one or more properties, named values of one of several supported data types. For example, a property can be a string, an integer, or a reference to another entity.

NDB can group multiple operations in a single transaction. The transaction cannot succeed unless every operation in the transaction succeeds; if any of the operations fail, the transaction is automatically rolled back. This is especially useful for distributed web applications, where multiple users may be accessing or manipulating the same data at the same time.

NDB uses Memcache as a cache service for "hot spots" in the data. If the application reads some entities often, NDB can read them quickly from cache.

An application uses NDB to define models. A model is a Python class that acts as a sort of database schema: it describes a kind of data in terms of the properties. The underlying Datastore is very flexible in how it stores data objects—two entities of the same kind can have different properties. An application can use NDB models to enforce type-checking but doesn't have to.

Each entity is identified by a key, an identifier unique within the application's datastore. The key can have a parent, another key. This parent can itself have a parent, and so on; at the top of this "chain" of parents is a key with no parent, called the root.

Entities whose keys have the same root form an entity group or group. If entities are in different groups, then changes to those entities might sometimes seem to occur "out of order". If the entities are unrelated in your application's semantics, that's fine. But if some entities' changes should be consistent, your application should make them part of the same group when creating them.

import cgi
import urllib

import webapp2

from google.appengine.ext import ndb


class Greeting(ndb.Model):
  """Models an individual Guestbook entry with content and date."""
  content = ndb.StringProperty()
  date = ndb.DateTimeProperty(auto_now_add=True)

  @classmethod
  def query_book(cls, ancestor_key):
    return cls.query(ancestor=ancestor_key).order(-cls.date)


class MainPage(webapp2.RequestHandler):
  def get(self):
    self.response.out.write('<html><body>')
    guestbook_name = self.request.get('guestbook_name')
    ancestor_key = ndb.Key("Book", guestbook_name or "*notitle*")
    greetings = Greeting.query_book(ancestor_key).fetch(20)

    for greeting in greetings:
      self.response.out.write('<blockquote>%s</blockquote>' %
                              cgi.escape(greeting.content))

    self.response.out.write("""
          <form action="/sign?%s" method="post">
            <div><textarea name="content" rows="3" cols="60"></textarea></div>
            <div><input type="submit" value="Sign Guestbook"></div>
          </form>
          <hr>
          <form>Guestbook name: <input value="%s" name="guestbook_name">
          <input type="submit" value="switch"></form>
        </body>
      </html>""" % (urllib.urlencode({'guestbook_name': guestbook_name}),
                    cgi.escape(guestbook_name)))

class SubmitForm(webapp2.RequestHandler):
  def post(self):
    # We set the parent key on each 'Greeting' to ensure each guestbook's
    # greetings are in the same entity group.
    guestbook_name = self.request.get('guestbook_name')
    greeting = Greeting(parent=ndb.Key("Book", guestbook_name or "*notitle*"),
                        content = self.request.get('content'))
    greeting.put()
    self.redirect('/?' + urllib.urlencode({'guestbook_name': guestbook_name}))


app = webapp2.WSGIApplication([
  ('/', MainPage),
  ('/sign', SubmitForm)
])

Storing Data

A model is a class that describes a type of entity, including the types and configuration for its properties. It's roughly analogous to a SQL Table. An entity can be created by calling the model's class constructor and then stored by calling the put() method.

class Greeting(ndb.Model):
  """Models an individual Guestbook entry with content and date."""
  content = ndb.StringProperty()
  date = ndb.DateTimeProperty(auto_now_add=True)

class SubmitForm(webapp2.RequestHandler):
  def post(self):
    # We set the parent key on each 'Greeting' to ensure each guestbook's
    # greetings are in the same entity group.
    guestbook_name = self.request.get('guestbook_name')
    greeting = Greeting(parent=ndb.Key("Book", guestbook_name or "*notitle*"),
                        content = self.request.get('content'))
    greeting.put()

This sample code defines the model class Greeting. Each Greeting entity has two properties: the text content of the greeting and the date the greeting was created.

To create and store a new greeting, the application creates a new Greeting object and calls its put() method.

To make sure that greetings in a guestbook don't appear "out of order" the application sets a parent key when creating a new Greeting. Thus, the new greeting will be in the same entity group as other greetings in the same guestbook. The application uses this fact when querying: it uses an ancestor query.

Queries and Indexes

An application can query to find entities that match some filters.

class Greeting(ndb.Model):
  ...

  @classmethod
  def query_book(cls, ancestor_key):
    return cls.query(ancestor=ancestor_key).order(-cls.date)

  def get(self):
    guestbook_name = self.request.get('guestbook_name')
    ancestor_key = ndb.Key("Book", guestbook_name or "*notitle*")
    greetings = Greeting.query_book(ancestor_key).fetch(20)

    for greeting in greetings:
      self.response.out.write('<blockquote>%s</blockquote>' %
                              cgi.escape(greeting.content))

A typical NDB query filters entities by kind. In this example, query_book generates a query that returns Greeting entities. A query can also specify filters on entity property values and keys. As in this example, a query can specify an ancestor, finding only entities that "belong to" some ancestor. A query can specify sort order. If a given entity has at least one (possibly null) value for every property in the filters and sort orders and all the filter criteria are met by the property values, then that entity is returned as a result.

Every query uses an index, a table that contains the results for the query in the desired order. The underlying Datastore automatically maintains simple indexes (indexes that use only one property).

It defines its complex indexes in a configuration file, index.yaml. The development web server automatically adds suggestions to this file when it encounters queries that do not yet have indexes configured.

You can tune indexes manually by editing the file before uploading the application. You can update the indexes separately from uploading the application (appcfg.py update_indexes). If your datastore has many entities, it takes a long time to create a new index for them; in this case, it's wise to update the index definitions before uploading code that uses the new index. You can use the Administration Console to find out when the indexes have finished building.

This index mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries common in other database technologies. In particular, joins aren't supported.

Understanding NDB Writes: Commit, Invalidate Cache, and Apply

Note: For a full discussion of writes to the underlying Datastore, see Life of a Datastore Write and Transaction Isolation in App Engine.

NDB writes data in steps:

  • In the Commit phase, the underlying Datastore service records the changes.
  • NDB invalidates its caches of the affected entity/entities. Thus, future reads will read from (and cache) the underlying Datastore instead of reading stale values from the cache.
  • Finally, perhaps seconds later, the underlying Datastore applies the change. It makes the change visible to global queries and eventually-consistent reads.

The NDB function that writes the data (for example, put()) returns after the cache invalidation; the Apply phase happens asynchronously.

If there is a failure during the Commit phase, there are automatic retries, but if failures continue, your application receives an exception. If the Commit phase succeeds but the Apply fails, the Apply is rolled forward to completion when one of the following occurs:

  • Periodic Datastore "sweeps" check for uncompleted Commit jobs and apply them.
  • The next write, transaction, or strongly-consistent read in the impacted entity group causes the not-yet-applied changes to be applied before the read, write, or transaction.

This behavior affects how and when data is visible to your application. The change may not be completely applied to the underlying Datastore a few hundred milliseconds or so after the NDB function returns. A non-ancestor query performed while a change is being applied may see an inconsistent state (i.e., part but not all of the change). For more information about the timing of writes and queries, see Transaction Isolation in App Engine.

Django

To use NDB with the Django web framework, add 'google.appengine.ext.ndb.django_middleware.NdbDjangoMiddleware', to the MIDDLEWARE_CLASSES entry in your Django settings.py file. It's best to insert it in front of any other middleware classes, since some other middleware may make datastore calls and those won't be handled properly if that middleware is invoked before this middleware. (You can learn more about Django middleware.)

Quotas and Limits

Various aspects of your application's Datastore usage are counted toward your resource quotas:

  • Each call to the Datastore API counts toward the Datastore API Calls quota. (Note that some library calls result in multiple calls to the API, and so use more of your quota.)
  • Data sent to the Datastore by the application counts toward the Data Sent to Datastore API quota.
  • Data received by the application from the Datastore counts toward the Data Received from Datastore API quota.
  • The total amount of data currently stored in the Datastore for the application cannot exceed the Stored Data (billable) quota. This includes all entity properties and keys, as well as the indexes needed to support querying those entities. See the article How Entities and Indexes Are Stored for a complete breakdown of the metadata required to store entities and indexes at the Bigtable level.

For information on systemwide safety limits, see the Quotas and Limits page and the Quota Details section of the Administration Console. In addition to such systemwide limits, the following limits apply specifically to the use of the Datastore:

Limit Amount
Maximum entity size 1 megabyte
Maximum transaction size 10 megabytes
Maximum number of index entries for an entity 20000
Maximum number of bytes in composite indexes for an entity 2 megabytes

Authentication required

You need to be signed in with Google+ to do that.

Signing you in...

Google Developers needs your permission to do that.