Google App Engine

Uploading and Downloading Data in Go

Note: Bulk upload and download is not supported for apps that use Federated (OpenID) authentication.

The bulk loader tool can upload and download data to and from your application's datastore. With just a little bit of setup, you can upload new datastore entities from CSV and XML files, and download entity data into CSV, XML, and text files. Most spreadsheet applications can export CSV files, making it easy for non-developers and other applications to produce data that can be imported into your app. You can customize the upload and download logic to use different kinds of files, or do other data processing.

You can use the bulk loader tool to download and upload all datastore entities in a special format suitable for backup and restore, without any additional code or configuration. You configure the bulk loader with a configuration file that specifies the format of uploaded and downloaded data. You can use the bulk loader itself to automatically generate a configuration file based on your app's datastore, and you can then edit that configuration file to suit your needs exactly.

The bulk loader is available via the appcfg.py command.

  1. Setting up remote_api
  2. Downloading and uploading all data
  3. Configuring the bulk loader

Setting up remote_api

The bulk loader tool communicates with your application running on App Engine using remote_api, a request handler included with the App Engine runtime environment that allows remote applications with the proper credentials to access the datastore remotely.

First, add the remote_api url handler to your app.yaml as follows:

- url: /_ah/remote_api
  script: _go_app
  login: admin

This maps the URL /_ah/remote_api to your Go app. Access to this URL is restricted to administrators for the application.

Then import the appengine/remote_api package in one of your project's packages. Add this line to any of your .go source files:

import _ "appengine/remote_api"

During program initialization, the remote_api package registers itself as an endpoint with the /_ah/remote_api path. The underscore in the import declaration means "import this package, but we won't use it directly." (Without the underscore you would receive an "imported but not used" error message on compilation.)

Finally, update your app:

goapp deploy <app-directory>

Downloading and uploading all data

If your app uses the master/slave datastore, you can download and upload every entity of a kind in a format suitable for backup and restore, all without writing any additional code or configuration. If your app uses the High Replication datastore, downloads are less straightforward: when you attempt to download data, you'll see a high_replication_warning error in the Admin Console, and the downloaded data might not include recently saved entities.

To download all entities of all kinds from an app's master/slave datastore, run the following command:

appcfg.py download_data --url=http://your_app_id.appspot.com/_ah/remote_api --filename=<data-filename>

You can also use the --kind=... argument to download all entities of a specific kind:

appcfg.py download_data --kind=<kind> --url=http://your_app_id.appspot.com/_ah/remote_api --filename=<data-filename>

To upload data to the app's datastore from a file created by appcfg.py download_data, run the following command:

appcfg.py upload_data --url=http://your_app_id.appspot.com/_ah/remote_api --kind=<kind> --filename=<data-filename>

When data is downloaded, the entities are stored along with their original keys. When the data is uploaded, the original keys are used. If an entity exists in the datastore with the same key as an entity being uploaded, the entity in the datastore is replaced.

You can use upload_data to replace the data in the app from which it was dumped, or you can use it to upload the data to a different application. Entities with numeric system IDs will be uploaded with the same IDs, and reference properties will be preserved.

Configuring the bulk loader

The bulk loader uses configuration files to describe the data you're uploading or downloading. You can use the bulk loader itself to automatically generate these configuration files. To generate a configuration file for an existing app, you call the bulk loader with the create_bulkloader_config action. After the configuration file is generated, you'll then edit some details in the file before using it.

Using automatic configuration

The bulk loader uses a bulkloader.yaml file to describe how your data should be transformed when uploaded or downloaded. This file includes a header, followed by a list of transforms. Each transform describes two stages of transformation: between external data and an intermediate format, and between the intermediate format and a datastore entity.

When you import data, one transform reads data from an external source, such as a CSV or XML file, and converts it to an intermediate format (a Python dictionary) that represents the contents of the file. A second transform converts the data from the intermediate format to App Engine datastore entities. When you export data, the process is reversed. First, entities are transformed to an intermediate format, then from that format to the export format.
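The two-stage import can be illustrated with plain Python, outside the bulk loader itself (the column names role and invite_nonce here are made up; the real pipeline is driven by bulkloader.yaml):

```python
import csv
import io

# Stage 1: external data (a CSV file) -> intermediate format,
# one Python dictionary per row.
csv_data = "role,invite_nonce\n3,abc123\n"
rows = list(csv.DictReader(io.StringIO(csv_data)))

# Stage 2: intermediate dict -> entity properties, applying import
# transforms such as converting the 'role' column from string to int.
def none_if_empty(fn):
    # A plain-Python stand-in for transform.none_if_empty.
    return lambda value: None if value == "" else fn(value)

entity = {
    "role": none_if_empty(int)(rows[0]["role"]),
    "invite_nonce": rows[0]["invite_nonce"],
}
```

On export the same two stages run in reverse: entity properties are placed into a dictionary, and the dictionary is written out as a CSV row.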

When you run the bulk loader to automatically generate the bulkloader.yaml file, the bulk loader examines your datastore statistics and creates transforms based on the kinds and properties of your app's data. Note that your datastore statistics can be up to 24 hours old, so if you change your schema, the generated file might not reflect the changes right away.

To automatically generate the bulkloader.yaml file based on your datastore statistics, run the bulk loader with the create_bulkloader_config action:

appcfg.py create_bulkloader_config --filename=bulkloader.yaml --url=http://your_app_id.appspot.com/_ah/remote_api

You'll use the generated file as input to the bulk loader tool when you run it again to perform an import or export. Below is an example of the output that appears when you run the bulk loader with the create_bulkloader_config action:

[INFO    ] Logging to bulkloader-log-20100516.144319
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 100
[INFO    ] Opening database: bulkloader-progress-20100516.144319.sql3
[INFO    ] Opening database: bulkloader-results-20100516.144319.sql3
[INFO    ] Connecting to your_app_id.appspot.com/_ah/remote_api
No handlers could be found for logger "google.appengine.tools.appengine_rpc"
[INFO    ] Downloading kinds: ['__Stat_PropertyType_PropertyName_Kind__']
.
[INFO    ] Have 64 entities, 0 previously transferred
[INFO    ] 64 entities (23986 bytes) transferred in 1.9 seconds

Now let's look at the generated bulkloader.yaml file, along with descriptions of each section. The first section:

# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property map
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object. If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

You'll need to edit the generated file before you can use it with your data. The instructions at the top of the file list the items you need to address.

The next section lists Python modules to be imported:

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.db
- import: re
- import: base64

You probably won't have to edit this section unless you want to import additional Python modules for use during the bulk loader import or export.
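For example, if your transforms call helpers from a module of your own (here a hypothetical my_transforms.py in your project), you would append it to the list:

```yaml
python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.db
- import: re
- import: base64
- import: my_transforms  # hypothetical module with custom transform functions
```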

The next section of the bulkloader.yaml file provides details on how the data should be transformed upon input and output:

transformers:
- kind: Permission
  connector: # TODO: Choose a connector here: csv, simplexml, etc...
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: account
      external_name: account
      # Type: Key Stats: 119 properties of this type in this kind.
      import_transform: transform.create_foreign_key('TODO: fill in Kind name')
      export_transform: transform.key_id_or_name_as_string

    - property: invite_nonce
      external_name: invite_nonce
      # Type: String Stats: 19 properties of this type in this kind.

    - property: role
      external_name: role
      # Type: Integer Stats: 119 properties of this type in this kind.
      import_transform: transform.none_if_empty(int)

    - property: user
      external_name: user
      # Type: Key Stats: 119 properties of this type in this kind.
      import_transform: transform.create_foreign_key('TODO: fill in Kind name')
      export_transform: transform.key_id_or_name_as_string

The bulkloader.yaml file contains one set of transforms for each kind you want to process; the generated file contains a set for each kind in your datastore. This example includes one kind, Permission; a connector, which you'll fill in when you edit the file, specifying the external format of the data; optional connector_options, also to be added, that specify settings and flags for the connector; and a property_map describing all the properties in the data.

Editing the configuration file

The first step in editing the bulkloader.yaml file is to specify the connector and connector options. The bulk loader supports CSV (csv) and XML (simplexml) connectors for data import and export, and a simple text connector (simpletext) for export only. In this sample, we'll use the CSV connector to read the data on import and write it on export, with the connector's default options: the column names are read from the first row of the CSV file on import, and written there on export.

- kind: Permission
  connector: csv

The next section describes the properties of the data. Each property entry specifies how to transform a particular property on import and export. The auto-generated file includes the four properties identified by the bulk loader, plus the __key__ pseudo-property. Each property has an external name, plus optional import and export transforms that specify how to convert data between the datastore and the external representation. We must also add the kind names for the two reference properties, replacing the TODO strings in the generated file. Here's the edited property map:

property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: account
      external_name: account
      import_transform: transform.create_foreign_key('Account')
      export_transform: transform.key_id_or_name_as_string

    - property: invite_nonce
      external_name: invite_nonce

    - property: role
      external_name: role
      import_transform: transform.none_if_empty(int)

    - property: user
      external_name: user
      import_transform: transform.create_foreign_key('User')
      export_transform: transform.key_id_or_name_as_string

With the bulkloader.yaml file complete, we can now import data from an external CSV file to the datastore:

appcfg.py upload_data --config_file=bulkloader.yaml --filename=users.csv --kind=Permission --url=http://your_app_id.appspot.com/_ah/remote_api

You can use the same bulkloader.yaml file to export data, as in the following example command line:

appcfg.py download_data --config_file=bulkloader.yaml --filename=users.csv --kind=Permission --url=http://your_app_id.appspot.com/_ah/remote_api

In this invocation of appcfg.py, data from your app's datastore is exported to a CSV file named users.csv.

Configuration file reference details

This section contains details of the format of bulkloader.yaml files and options for the appcfg.py tool.

The file begins with a header containing information that applies to the entire file. This section is used for specifying Python modules to be imported.

The transformers section lists entity kinds and their transform information. Each entry begins by specifying a kind (you can specify a model class instead of a kind). For each kind, the entry specifies a connector (csv, simplexml, or simpletext, the last for export only), optional connector options, and a property map. Within the property map, each property is listed along with an external name and, if required, transforms for import and export.

The connector options are as follows:

csv connector
encoding
Any Python standard encoding format, such as utf-8 (the default) or windows-1252.
column_list
Use the sequence of names specified here for the columns on import and export. If not specified, the first row of the data is used to determine the external_name of each column, and data is read or written starting with the second row.
skip_import_header_row
If true, the header line is ignored on import.
print_export_header_row
If true, a header line is printed on export.
import_options
Additional keyword arguments for the Python CSV module on import. Use dialect: excel-tab for a TSV file.
export_options
Additional keyword arguments for the Python CSV module on export.
simplexml connector
xpath_to_nodes
An XPath expression that specifies the nodes to read. Basic paths of the form /node1/node2 are supported; more complex expressions may not be. If an alternate form is specified, export is disabled. Namespaces are not well supported.
style
Possible values are element_centric and attribute_centric. The children of the nodes found with xpath_to_nodes will be converted into the intermediate format. The style argument determines whether the attributes of the found node (attribute_centric) or the child nodes (element_centric) are used. The entire node is also passed in as __node__.
simpletext connector
template
A Python dict interpolation string used for each exported record.
prolog (optional)
Written before the per-record output.
epilog (optional)
Written after the per-record output.
mode (optional)
text (default)
Text file mode. Newlines are written between records.
nonewline
Text file mode. No newlines are added.
binary
Binary file mode. No newlines are added.
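Putting a few of the csv connector options together, a transformer for a tab-separated file encoded as windows-1252 and with no header row might look like this (a sketch; the option names are those listed above, the column names are from the earlier Permission example):

```yaml
- kind: Permission
  connector: csv
  connector_options:
    encoding: windows-1252
    column_list: [key, account, invite_nonce, role, user]
    import_options:
      dialect: excel-tab
    export_options:
      dialect: excel-tab
```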

The property map section defines the details of the transform between the entity and the intermediate format. The elements of the property map:

property
The name of the property, as defined in the entity or model.
external_name
Maps a single property, such as a single CSV column, to a single entry in the intermediate dictionary.
import_template
Specifies multiple dictionary items for a single property, using Python string interpolation.
import_transform
A single-argument function that converts the external_name or import_template string into a value of the correct type. This can be a built-in Python conversion operator (such as float), one of the helper functions provided in transform (such as get_date_time or create_foreign_key), a function from your own library, or an in-line lambda function. Alternatively, it can be a two-argument function taking the keyword argument bulkload_state, which carries useful information about the entity being processed: bulkload_state.current_entity, the current entity; bulkload_state.current_dictionary, the current dictionary; and bulkload_state.filename, the --filename argument that was passed to appcfg.py.
export_transform
Like import_transform, except performed on export.
export
Like import_template, except performed on export and specified as a sequence of external_name/export_transform values.
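An import_transform is ultimately just a Python callable. As a plain-Python illustration (outside the bulk loader, with made-up values), here is what a helper equivalent to transform.none_if_empty(int) and an in-line lambda transform do:

```python
def none_if_empty(fn):
    """Return a transform mapping '' to None and otherwise applying fn.

    A plain-Python sketch of what transform.none_if_empty provides.
    """
    def apply(value):
        return None if value == "" else fn(value)
    return apply

# A transform built from a Python conversion operator:
to_role = none_if_empty(int)

# An in-line lambda transform that normalizes a string property:
normalize = lambda value: value.strip().lower()
```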

Each entity created on import has a key. If you don't specify a key, it will be generated automatically by the datastore. If you want to use or calculate a key from the import data, specify a key using the same syntax as the property map; that is, external_name, import_template, and so on.

If you want to do additional processing on data that can't be easily described in a property map, you can specify a function to modify the entity in arbitrary ways, or even return multiple entities on import. To use this feature, add one or both of the following to your transform entry:

post_import_function(input_dict, instance, bulkload_state_copy)

Your function must return one of the following: None, which means to skip importing this record; a single entity (usually the instance argument that was passed in); or a list of multiple entities to be imported.

post_export_function(instance, export_dict, bulkload_state)

Your function must return one of the following: None, which means this result should be skipped, or a dict (typically the export_dict argument that was passed in) containing the data to export.
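As a sketch of both hooks (the role and invite_nonce fields and the filtering rules are made up for illustration; instance stands in for the entity the bulk loader passes you):

```python
def post_import_function(input_dict, instance, bulkload_state_copy):
    """Skip records with an empty role; otherwise import the entity as-is."""
    if input_dict.get("role") in (None, ""):
        return None  # returning None skips this record
    return instance  # a single entity; a list would import multiple entities

def post_export_function(instance, export_dict, bulkload_state):
    """Drop the invite_nonce column from exported rows."""
    export_dict.pop("invite_nonce", None)
    return export_dict  # returning None would skip this result
```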
