Structuring Data for Strong Consistency
The Google App Engine High Replication Datastore (HRD) provides high availability for your reads and writes by storing data synchronously in multiple datacenters. However, the delay from the time a write is committed until it becomes visible in all datacenters means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. Consequently, the results of such queries may sometimes fail to reflect recent changes to the underlying data.
To obtain strongly consistent query results, you need to use an ancestor query limiting the results to a single entity group. This works because entity groups are a unit of consistency as well as transactionality. All data operations are applied to the entire group; an ancestor query won't return its results until the entire entity group is up to date. If your application relies on strongly consistent results for certain queries, you may need to take this into consideration when designing your data model. This page discusses best practices for structuring your data to support strong consistency.
To understand how to structure your data for strong consistency, compare two different approaches for the guestbook example application from the App Engine Getting Started exercise. The first approach creates a new root entity for each greeting:
    import webapp2
    from google.appengine.ext import db

    class Guestbook(webapp2.RequestHandler):
        def post(self):
            greeting = Greeting()
            ...
It then queries on the entity kind Greeting for the ten most recent greetings:
    import webapp2
    from google.appengine.ext import db

    class MainPage(webapp2.RequestHandler):
        def get(self):
            self.response.out.write('<html><body>')
            greetings = db.GqlQuery("SELECT * "
                                    "FROM Greeting "
                                    "ORDER BY date DESC LIMIT 10")
However, because non-ancestor queries only guarantee eventually consistent results, the datacenter used to perform the query in this scheme may not have seen the new greeting by the time the query is executed. Even so, with eventual consistency nearly all of your writes are available for queries within a few seconds. A solution that shows query results together with the current user's own recent posts will usually be sufficient to make this delay completely acceptable.
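One way to picture "providing the data in the context of the current user's own posts" is to merge the greetings the user has just submitted into whatever the query returned. The following is a minimal sketch in plain Python, not Datastore code: the `Greeting` tuple, the `merge_own_posts` function, and the sample keys are all hypothetical names chosen for illustration.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a stored greeting; not the App Engine model class.
Greeting = namedtuple("Greeting", ["key", "content", "date"])


def merge_own_posts(query_results, own_recent, limit=10):
    """Combine possibly stale query results with the user's own recent posts."""
    by_key = {}
    # Later entries win, so the user's own copy replaces a duplicate query row.
    for greeting in list(query_results) + list(own_recent):
        by_key[greeting.key] = greeting
    # Newest first, mirroring ORDER BY date DESC, then apply the limit.
    newest_first = sorted(by_key.values(), key=lambda g: g.date, reverse=True)
    return newest_first[:limit]


# The query has not yet seen the greeting the user posted five seconds later.
stale = [Greeting("g1", "hello", datetime(2013, 1, 1, 12, 0, 0))]
mine = [Greeting("g2", "just posted", datetime(2013, 1, 1, 12, 0, 5))]
merged = merge_own_posts(stale, mine)
```

Because entries are deduplicated by key, the merge stays correct once the query catches up and starts returning the user's greeting itself.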
If strong consistency is important to your application, an alternate approach is to give each greeting a parent key, so that all subsequent greetings for a guestbook are saved in the entity group defined by that parent key:
    import webapp2
    from google.appengine.ext import db

    class Guestbook(webapp2.RequestHandler):
        def post(self):
            guestbook_name = self.request.get('guestbook_name')
            greeting = Greeting(parent=guestbook_key(guestbook_name))
            ...
Queries for these entities can then use the parent key to perform an ancestor query, which will find only those entities:
    import webapp2
    from google.appengine.ext import db

    class MainPage(webapp2.RequestHandler):
        def get(self):
            self.response.out.write('<html><body>')
            guestbook_name = self.request.get('guestbook_name')
            greetings = db.GqlQuery("SELECT * "
                                    "FROM Greeting "
                                    "WHERE ANCESTOR IS :1 "
                                    "ORDER BY date DESC LIMIT 10",
                                    guestbook_key(guestbook_name))
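To make the idea of an ancestor query concrete, the toy model below treats each key as a path of (kind, name) pairs and selects the entities whose path begins with the parent key's path. This is only an illustrative sketch in plain Python: the tuple representation and the `entity_group` and `ancestor_query` functions are assumptions made for this example, not the Datastore key API.

```python
# Key paths modeled as tuples of (kind, name) pairs; a toy model only.
book_a = (("Guestbook", "book_a"),)
book_b = (("Guestbook", "book_b"),)

greeting_keys = [
    book_a + (("Greeting", "g1"),),
    book_a + (("Greeting", "g2"),),
    book_b + (("Greeting", "g3"),),
]


def entity_group(key_path):
    """Return the root element of a key path, which identifies its group."""
    return key_path[0]


def ancestor_query(key_paths, ancestor_path):
    """Keep only the key paths that start with the given ancestor path."""
    depth = len(ancestor_path)
    return [path for path in key_paths if path[:depth] == ancestor_path]


in_book_a = ancestor_query(greeting_keys, book_a)
groups_touched = {entity_group(path) for path in in_book_a}
```

Every result shares the same root element, which is why such a query only ever needs one entity group to be up to date.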
This approach achieves strong consistency by writing to a single entity group per guestbook, but it also limits changes to the guestbook to no more than 1 write per second (the supported limit for entity groups). If your application is likely to encounter heavier write usage, you may need to consider other means. For example, you might put recent posts in memcache with an expiration and display a mix of recent posts from memcache and the Datastore, or you might cache them in a cookie, put some state in the URL, or something else entirely. The goal is to find a caching solution that provides the data for the current user for the period of time in which the user is posting to your application. Remember, if you do a get by key, an ancestor query, or any operation within a transaction, you will always see the most recently written data.
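The caching idea above can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the memcache API: `ExpiringPosts` is a hypothetical in-memory stand-in for a cache with expiration, `posts_to_display` is a hypothetical helper, and the clock is injected so the expiry behavior can be demonstrated without waiting.

```python
import time


class ExpiringPosts(object):
    """In-memory stand-in for a cache with expiration (not the memcache API)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._posts = {}  # user id -> list of (expires_at, post)

    def remember(self, user_id, post):
        """Record a post the user just made, with an expiration time."""
        expires_at = self._clock() + self._ttl
        self._posts.setdefault(user_id, []).append((expires_at, post))

    def recent(self, user_id):
        """Return the user's unexpired posts, newest first."""
        now = self._clock()
        live = [(exp, p) for exp, p in self._posts.get(user_id, []) if exp > now]
        self._posts[user_id] = live  # drop expired entries
        return [p for _, p in reversed(live)]


def posts_to_display(datastore_posts, cached_posts):
    """Show cached posts first, then Datastore posts not already shown."""
    shown = list(cached_posts)
    shown.extend(p for p in datastore_posts if p not in cached_posts)
    return shown


now = [1000.0]  # fake clock, advanced by hand below
cache = ExpiringPosts(ttl_seconds=30, clock=lambda: now[0])
cache.remember("alice", "my new greeting")

# The query has not caught up yet, so the cached post fills the gap.
page = posts_to_display(["older greeting"], cache.recent("alice"))

# Later the cache entry has expired and the query returns the post itself.
now[0] += 31
page_later = posts_to_display(["older greeting", "my new greeting"],
                              cache.recent("alice"))
```

The expiration only needs to cover the period in which a query might not yet reflect the write; after that, the Datastore results alone are enough.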