Google App Engine

Updating Your Model's Schema

Justin McWilliams and Mark Ivey, Google Engineers
December 2012

This is one of a series of in-depth articles discussing App Engine's datastore. To see the other articles in the series, see Related links.

If you are maintaining a successful app, you will eventually find a reason to change your schema. This article walks through an example showing the two basic steps needed to update an existing schema:

  1. Updating the Model class
  2. Updating existing Entities in the datastore (this step isn't always necessary; we discuss when to do it below).

Before We Start

While updating your schema, you may need to disable the ability for your users to edit data in your application. Whether or not this is necessary depends on your application, but there are a few situations (like trying to add a sequential index value to each entity) where it is much easier to correctly update existing entities if no other edits are happening.

Updating Your Models

Here's an example of a simple picture model:

class Picture(db.Model):
    author = db.StringProperty()
    png_data = db.BlobProperty()
    name = db.StringProperty(default='')  # Unique name.

Let's update this so each picture can have a rating. To store the ratings, we'll store the number of votes and the average value of the votes. Updating the model is fairly easy; we just add two new properties:

class Picture(db.Model):
    author = db.StringProperty()
    png_data = db.BlobProperty()
    name = db.StringProperty(default='')  # Unique name.
    num_votes = db.IntegerProperty(default=0)
    avg_rating = db.FloatProperty(default=0.0)

Now all new entities going into the datastore will get a default rating of 0. Note that existing entities in the datastore don't automatically get modified, so they won't have these properties.
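
As an illustration of how these two properties work together, here is a minimal sketch of folding one new vote into the stored count and average. The helper name add_vote is hypothetical and not part of the original example. It is plain Python with no datastore calls, and a real handler would also need to guard against concurrent votes, which this sketch does not attempt.

```python
def add_vote(num_votes, avg_rating, vote):
    """Return the (num_votes, avg_rating) pair after recording one more vote.

    Hypothetical helper for illustration only.
    """
    new_count = num_votes + 1
    # Rebuild the vote total from the stored average, add the new vote,
    # then divide by the new count. float() keeps the average a float.
    new_avg = (avg_rating * num_votes + float(vote)) / new_count
    return new_count, new_avg


count, avg = add_vote(0, 0.0, 4)      # first vote: a rating of 4
count, avg = add_vote(count, avg, 2)  # second vote: a rating of 2
```

The returned values would be assigned to num_votes and avg_rating on the Picture before saving it.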

Updating Existing Entities

The App Engine datastore doesn't require all entities to have the same set of properties. After you update your models to add new properties, existing entities continue to exist without them. In some situations this is fine, and you don't need to do any more work.

When would you want to go back and update existing entities so they also have the new properties? One situation is when you want to query on the new properties. In our Picture example, queries like "Most popular" or "Least popular" wouldn't return existing pictures, because entities that lack a property are left out of queries that filter or sort on it. To fix this, we'll need to update the existing entities in the datastore.
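
The effect on queries can be pictured with a toy model in plain Python. This is not datastore code: the function query_ordered_by and the sample pictures are hypothetical, and dictionaries stand in for entities. It only mimics the rule that an entity missing a property does not show up when a query sorts on that property.

```python
def query_ordered_by(entities, prop):
    """Toy stand-in for a query sorted on prop, highest value first.

    Only entities that actually have prop are considered, which mimics
    how entities lacking the property are skipped by such a query.
    """
    indexed = [e for e in entities if prop in e]
    return sorted(indexed, key=lambda e: e[prop], reverse=True)


pictures = [
    {'name': 'old_cat'},  # saved before the schema change: no rating yet
    {'name': 'new_dog', 'num_votes': 3, 'avg_rating': 4.5},
    {'name': 'new_bird', 'num_votes': 1, 'avg_rating': 2.0},
]

most_popular = query_ordered_by(pictures, 'avg_rating')
names = [p['name'] for p in most_popular]
```

Here old_cat is absent from the result until it is given an avg_rating value, which is what the migration described next takes care of.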

Conceptually, updating existing entities is easy. You just need to write a request handler to load all entities, set the value of the new property, and save them back to Datastore. However, if you need to update more than a couple thousand entities, you'll likely need to work around the short request deadline.

To do this, we can take advantage of the Task Queue API (Python, Java, Go) and Query Cursors, which let us update small batches of entities across multiple requests. We start with a small request handler that simply inserts a Task into the Task Queue. Each Task then performs the following:

  1. Initialize a query for entities to update.
  2. If not the first Task, position the query where the previous Task left off, using the passed Query Cursor.
  3. Perform schema updates on a batch of entities; save to Datastore.
  4. Insert a Task to continue with the next batch in a new request.

First, copy this quick implementation of UpdateSchema() into a new file named update_schema.py:

import logging
import models
from google.appengine.ext import deferred
from google.appengine.ext import db

BATCH_SIZE = 100  # ideal batch size may vary based on entity size.

def UpdateSchema(cursor=None, num_updated=0):
    query = models.Picture.all()
    if cursor:
        query.with_cursor(cursor)

    to_put = []
    for p in query.fetch(limit=BATCH_SIZE):
        # In this example, the default values of 0 for num_votes and avg_rating
        # are acceptable, so re-saving each entity unchanged would be enough.
        # If we wanted to set property values manually, it might go something
        # like this:
        p.num_votes = 17
        p.avg_rating = 4.0  # FloatProperty expects a float, not an int.
        to_put.append(p)

    if to_put:
        db.put(to_put)
        num_updated += len(to_put)
        logging.debug(
            'Put %d entities to Datastore for a total of %d',
            len(to_put), num_updated)
        deferred.defer(
            UpdateSchema, cursor=query.cursor(), num_updated=num_updated)
    else:
        logging.debug(
            'UpdateSchema complete with %d updates!', num_updated)
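
To see how this chain of tasks ends, here is a toy simulation in plain Python with no App Engine dependencies. It is a sketch under simplifying assumptions: the function simulate_migration is hypothetical, an integer offset plays the role of the query cursor, and a deque stands in for the task queue.

```python
from collections import deque

BATCH_SIZE = 100


def simulate_migration(num_entities, batch_size=BATCH_SIZE):
    """Toy model of the task chain; returns (tasks_run, entities_updated)."""
    # Each queued item stands in for one deferred task:
    # (cursor position, running total of updated entities).
    queue = deque([(0, 0)])
    tasks_run = 0
    total_updated = 0
    while queue:
        cursor, num_updated = queue.popleft()
        tasks_run += 1
        batch = min(batch_size, num_entities - cursor)
        if batch > 0:
            # Non-empty batch: "save" it and enqueue the follow-up task.
            queue.append((cursor + batch, num_updated + batch))
        else:
            # Empty batch: nothing left to update, so the chain stops.
            total_updated = num_updated
    return tasks_run, total_updated


result = simulate_migration(250)
```

With 250 entities and a batch size of 100, three tasks save entities and a fourth finds an empty batch and stops, so the chain always runs one task more than the number of non-empty batches.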

Next, create a request handler that uses deferred to kick off the new UpdateSchema() function. As the deferred documentation mentions, you can't call a method in the request handler module, so it's important that the request handler and the UpdateSchema() function above live in different modules. Therefore, copy the code below into a new file named update_schema_handler.py:

import webapp2
import update_schema
from google.appengine.ext import deferred

class UpdateHandler(webapp2.RequestHandler):
    def get(self):
        deferred.defer(update_schema.UpdateSchema)
        self.response.out.write('Schema migration successfully initiated.')

app = webapp2.WSGIApplication([('/update_schema', UpdateHandler)])

Finally, you'll need to enable the deferred builtin, and you should also add a URL mapping in app.yaml with "login: admin", to ensure only administrators of your app can perform the schema migration:

builtins:
- deferred: on

handlers:
- url: /update_schema
  script: update_schema_handler.app  # path to webapp2 application definition.
  login: admin
  secure: always

When you're ready to kick off the schema migration, simply upload the new source to your App Engine application using appcfg and visit the /update_schema handler using your favorite web browser.

Removing Deleted Properties from the Datastore

If you remove a property from your model, you will find that existing entities still have the property. It will still be shown in the admin console and will still be present in the datastore. To really clean out the old data, you need to cycle through your entities and remove the data from each one.

  1. Make sure you have removed the properties from the model definition.
  2. If your model class inherits from db.Model, temporarily switch it to inherit from db.Expando. (db.Model instances can't be modified dynamically, which is what we need to do in the next step.)
  3. Cycle through existing entities (as described above). For each entity, use delattr to delete the obsolete property and then save the entity.
  4. If your model originally inherited from db.Model, don't forget to change it back after updating all the data.
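
Step 3 above can be sketched as a small helper. The name strip_property is hypothetical, and the sketch assumes that hasattr and delattr work on dynamic properties of db.Expando instances the same way they do on ordinary Python objects. The FakeEntity class is only a stand-in so the example runs without the App Engine SDK.

```python
def strip_property(entities, prop_name):
    """Delete prop_name from each entity that still carries it.

    Returns the entities that were changed, so the caller can save just
    those (for example with db.put inside the batch loop shown earlier).
    """
    changed = []
    for entity in entities:
        if hasattr(entity, prop_name):
            delattr(entity, prop_name)
            changed.append(entity)
    return changed


class FakeEntity(object):
    """Stand-in for a db.Expando instance, used only for this demo."""

    def __init__(self, **props):
        self.__dict__.update(props)


old = FakeEntity(name='cat', caption='obsolete')  # still has the old property
new = FakeEntity(name='dog')                      # never had it
changed = strip_property([old, new], 'caption')
```

In the migration itself, a helper like this would take the place of the property assignments in the UpdateSchema() loop, with the changed entities passed to db.put.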
