Google App Engine

Scheduled Backups

Jacob Butcher, Doug Anderson
April 16, 2012

Experimental!

Datastore Administration is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Datastore Administration. We will inform the community when this feature is no longer experimental.
 


Introduction

Note: Use of this feature is limited to backups started from the application's cron or task queue.

You can run scheduled backups for your application using the App Engine Cron service. To do this for Python or Go apps, specify backup cron jobs in cron.yaml. For Java apps, specify the backup cron job in cron.xml. Currently there is no way to specify a scheduled backup programmatically.

Setting Up a Scheduled Backup

To set up a scheduled backup for your app:

  1. If you haven't already done so, enable Datastore Admin for your app.
  2. If you are using Google Cloud Storage for your backups, and have not yet done so, properly configure the bucket you are using for backups.
  3. In your application directory, if you don't already have one, create a cron.yaml file for a Python or Go app or a cron.xml file for a Java app.
  4. Add the backup cron entries. These specify the backup schedule, the set of entities to back up, and the storage to be used for the backups, as described in Specifying Backups in a Cron File. Here are some examples:

    Python

    Sample Python cron.yaml

    cron:
    - description: My Daily Backup
      url: /_ah/datastore_admin/backup.create?name=BackupToCloud&kind=LogTitle&kind=EventLog&filesystem=gs&gs_bucket_name=whitsend
      schedule: every 12 hours
      target: ah-builtin-python-bundle

    Java

    Sample Java cron.xml (note the use of "&amp;", because a bare "&" is interpreted by XML)

    <?xml version="1.0" encoding="UTF-8"?>
    <cronentries>
      <cron>
        <description>My Daily Backup</description>
        <url>/_ah/datastore_admin/backup.create?name=BackupToCloud&amp;kind=LogTitle&amp;kind=EventLog&amp;filesystem=gs&amp;gs_bucket_name=whitsend</url>
        <schedule>every 12 hours</schedule>
        <target>ah-builtin-python-bundle</target>
      </cron>
    </cronentries>
  5. Deploy this file with your app. (You can verify the cron job you just deployed by clicking Cron Jobs in the left navigation pane.)

Backups run on the schedule you specified. While a backup is running, it appears in the Pending Backups list. After the backup completes, you can view it and use it from the list of available backups in the Datastore Admin tab.
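For cron.xml, the ampersand escaping shown in the Java sample can be applied mechanically rather than by hand. The following is a minimal sketch, assuming you keep the unescaped url as a plain string; the helper name cron_xml_url_element is hypothetical and not part of App Engine. It relies only on the Python standard library's xml.sax.saxutils.escape.

```python
from xml.sax.saxutils import escape

# Unescaped url, written the way it would appear in cron.yaml.
RAW_URL = (
    "/_ah/datastore_admin/backup.create"
    "?name=BackupToCloud&kind=LogTitle&kind=EventLog"
    "&filesystem=gs&gs_bucket_name=whitsend"
)


def cron_xml_url_element(url):
    """Hypothetical helper: wrap a url in a <url> element for cron.xml.

    escape() replaces "&" with "&amp;" (and also "<" and ">"),
    so the result is safe to paste into the XML file.
    """
    return "<url>" + escape(url) + "</url>"


if __name__ == "__main__":
    print(cron_xml_url_element(RAW_URL))
```

The same string can be used unchanged in cron.yaml; only the XML form needs the entity.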

Specifying Backups in a Cron File

These are the fields to include in your cron file to perform scheduled backups:

description
This is the title that appears in the Cron Job list. It can be anything you wish.
url
The url is required and must be in this format:
/_ah/datastore_admin/backup.create?name=<backup-name-prefix>&kind=<kind-1>&kind=<kind-N>&queue=<task-queue>&filesystem=<filesystem-type>&gs_bucket_name=<bucket-name>&namespace=<namespace>

These fields can appear in the url query string:

  • name is an optional prefix that is prepended to the backup name. It helps you identify your backups. If not supplied, the default "cron-" will be used.
  • The kind field can appear one or more times. Each value specifies an entity kind that you wish to back up. You must specify at least one entity kind. In the Datastore Admin Console, the default is that all entity kinds are backed up. With a cron backup, there is no such default: if you don't specify a kind, it doesn't get backed up.
  • queue is optional. It specifies the task queue to be used. If not supplied, the default task queue is used.
  • filesystem specifies the kind of storage to be used. The value "blobstore" means that Blobstore is used to store the backups; the value "gs" means that Google Cloud Storage is used. If no value is supplied, blobstore is used by default.
  • gs_bucket_name is required if you use Google Cloud Storage for backups. It specifies the bucket name used for storage.
  • namespace is optional. When provided, only entities from the selected namespace are included in the backup.

Note: The url cannot be longer than 2000 characters. As shown in the cron.xml Java example above, in cron.xml you must separate fields with the XML entity "&amp;" rather than a bare ampersand ("&"), because XML treats a bare ampersand as the start of an entity reference.

schedule
This field is required. It defines the recurring schedule on which the backup runs. For complete details, see the Schedule Format documentation for Python or Java.
target
This field is required. It identifies the app version that the cron backup job runs on. You must use the value ah-builtin-python-bundle, because that is the version of your app that contains the Datastore Admin features the cron job needs. Keep in mind that the cron backup job runs against this version of your app, so you incur costs while it is running. (The ah-builtin-python-bundle version of your app is enabled when you enable Datastore Admin for your app.)
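The rules for the url field described above (at least one kind, gs_bucket_name required for Google Cloud Storage, and the 2000-character limit) can be checked before a cron file is deployed. The following is a minimal sketch under those stated rules; build_backup_url is a hypothetical helper, not an App Engine API, and it assumes standard query-string encoding via urllib.parse.urlencode is acceptable for the field values.

```python
from urllib.parse import urlencode

BACKUP_PATH = "/_ah/datastore_admin/backup.create"
MAX_URL_LENGTH = 2000  # limit stated in the note on the url field


def build_backup_url(kinds, name=None, queue=None, filesystem=None,
                     gs_bucket_name=None, namespace=None):
    """Hypothetical helper: assemble the cron url value from its fields."""
    if not kinds:
        # A cron backup has no "all kinds" default.
        raise ValueError("at least one entity kind is required")
    if filesystem == "gs" and not gs_bucket_name:
        raise ValueError("gs_bucket_name is required when filesystem is 'gs'")

    # Build the query in the same field order as the documented format.
    params = []
    if name:
        params.append(("name", name))
    params.extend(("kind", kind) for kind in kinds)
    if queue:
        params.append(("queue", queue))
    if filesystem:
        params.append(("filesystem", filesystem))
    if gs_bucket_name:
        params.append(("gs_bucket_name", gs_bucket_name))
    if namespace:
        params.append(("namespace", namespace))

    url = BACKUP_PATH + "?" + urlencode(params)
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("url exceeds %d characters" % MAX_URL_LENGTH)
    return url


if __name__ == "__main__":
    print(build_backup_url(["LogTitle", "EventLog"], name="BackupToCloud",
                           filesystem="gs", gs_bucket_name="whitsend"))
```

Called with the values from the sample cron files, the helper reproduces the url used there. Optional fields that are omitted are simply left out of the query string, so the documented defaults apply.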

Warning! Backup, restore, copy, and delete operations are executed within your application, and thus count against your quota.

Very frequent backups often lead to higher costs. When you run a Datastore Admin job, you are actually running underlying MapReduce jobs, which increase frontend instance hours in addition to Storage operations and Storage usage. To keep an eye on your resource usage, click the Dashboard link under Main in the left navigation pane. At the top of the page, select ah-builtin-python-bundle from the Version drop-down menu.

Troubleshooting

When the scheduled backup runs, App Engine performs a GET using the backup url. If the GET succeeds, it returns HTTP status code 200. If it fails, it returns HTTP status code 400. You can look at the logs to determine whether a backup succeeded or failed by doing the following:

  1. In the Admin Console for your application, click Logs in the left navigation pane, under Main.
  2. Locate the version pulldown menu, which is immediately to the right of the application pulldown. The application pulldown should show the name of your app; the version pulldown most likely shows the number 1.
  3. In the version pulldown, select ah-builtin-python-bundle to display the logs.
  4. Locate your backup job in the log to determine whether it succeeded or failed. If there was a failure, in addition to the status code 400, there will be an error message to help you determine the cause of the error.
