Google BigQuery

Codelab: Creating a BigQuery Dashboard

Author: Rufus, Date: 01/13/2011, Tags: Codelab, App Engine, Google Charts

Contents

  1. Objective
  2. Requirements
  3. Introduction
  4. Setting Up
    1. Create a New App Engine Project: hellodashboard
    2. Download and Install Libraries
    3. Enable BigQuery API
    4. Create and Record Client Secret
    5. Add OAuth2 Support
  5. Create a Class for BigQuery
    1. A Small Optimization
    2. Understanding the Service Object
    3. [Optional Reading] Under the Covers with the OAuth2 Decorator
  6. BigQueryClient Constructor
    1. Which Project ID Should You Use?
  7. Make Our First Authenticated Calls
  8. Building the Dashboard
    1. Picking the Right Question for a Query
    2. Viewing the Data
  9. Conclusion

Objective

This codelab describes how to build a dashboard application on top of Google BigQuery. Our specific implementation uses Google App Engine with Python and Google Charts with JavaScript. The general strategy and techniques should be applicable to any platform and visualization package.

This is an intermediate lab and we assume you have already completed Part 1 of the Python Codelab or the Python App Engine Getting Started guide. That means you already have your basic App Engine development environment configured and that you've successfully run a basic app on your system using the development server.

Requirements

The requirements from the Python Codelab also apply to this lab:

In addition, this BigQuery Dashboard Codelab requires:

Introduction

This codelab is based on the online tutorial found at http://code.google.com/p/google-api-python-client/wiki/OAuth2. Like other Google APIs, the BigQuery service requires strong authentication, and OAuth2 is the simplest solution. We will take that existing tutorial, add calls to BigQuery that aggregate across a large public data set, and then display the results in a visual fashion suitable for a dashboard.


Setting Up

In the Python Codelab for App Engine, you focused on using the services available within App Engine itself, like the Datastore, email, and XMPP. If you've already gotten things running by using the Google App Engine Getting Started guide, that is fine as an alternative to the Python Codelab.

BigQuery is a service that runs outside of App Engine, so accessing it requires additional libraries for authentication and access. We can add them in the following steps:

  • Create a new App Engine project
  • Download and install libraries.
  • Enable the BigQuery API
  • Create and Record Client Secrets in client_secrets.json
  • Add OAuth2 Support to main.py and app.yaml

After the setup, we will create a basic class for BigQuery and make our first authenticated query. Once we have the query results, we will reformat them to fit Google Charts and display them as a dashboard.

  1. Create New App Engine Project: hellodashboard

    As you did in the Python Codelab, create a new App Engine application containing app.yaml and main.py. Remember that if you are going to deploy your application to the internet then you must pick a unique name and register a new project at http://appengine.google.com first. I'll call mine 'hellodashboard'. Using the launcher to create the base application, files and directory is encouraged. Your application must exist in its own directory so that we can install the supporting libraries. Let's assume this project is in 'source/hellodashboard'. We'll refer to that later when we install the libraries.

    If you like, you can make the 'Minor Refactoring' optimizations suggested in the Python Codelab, moving the Web Server Gateway Interface out into the global namespace. As stated there:

    The reason for this is because App Engine caches imports and handlers scripts for better performance. If we left the instantiation of the webapp.WSGIApplication class within main(), we would be creating a new instance for every request! We also reposition the call (and indentation) so that it’s easier to add new handlers in a more list-oriented fashion.

    As a result, main() should now only contain the call to run_wsgi_app()... and while we’re at it, we might as well do a minor piece of cleanup by importing run_wsgi_app() directly from the google.appengine.ext.webapp.util subpackage. Since we aren’t using anything else from util, this makes sense to do and avoids additional clutter in the code which uses it.

    Here is the fully updated second generation of our main.py handler script:

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    
    class MainHandler(webapp.RequestHandler):
        def get(self):
            self.response.out.write('Hello world!')
    
    application = webapp.WSGIApplication([
        ('/', MainHandler),
    ], debug=True)
    
    def main():
        run_wsgi_app(application)
    
    if __name__ == '__main__':
        main()

    At this point you could try running the sample in your development App Engine server and access it through your browser. You should see a friendly 'Hello world!' That's nice. What we really wanted was that directory, 'source/hellodashboard', so we have a place to install our libraries.

  2. Download and Install Libraries

    1. Download the Google APIs Client Library for Python from http://code.google.com/p/google-api-python-client

      You can do this either by checking out a copy of the Mercurial repository or by downloading the zipped or tar'd google-api-python-client-* archive. You can download it anywhere on your system that is conveniently reachable from the command line. Let's assume you have downloaded this to 'source/'.

      There are two client libraries in this google-api-python-client-* package: the Google APIs client library and the oauth2client library. You will need both of them for this codelab.

    2. Open a command line shell and change to the directory where you downloaded the client library (e.g. 'source/') and use Python's easy_install utility to complete the installation.

      If you don't have easy_install, you can get it by installing the setuptools package. Point easy_install to either the Mercurial repository directory or to the tarball, like this:

      cd source/
      easy_install --upgrade google-api-python-client-1.0beta7.tar.gz

      You might need to add a sudo to the beginning of that command depending on your platform. If you would like to avoid using sudo, you could consider using virtualenv, but using that tool is beyond the scope of this codelab.

    You have now completed installing the libraries to your system, but there is one more step to getting them installed for your App Engine application (hellodashboard). You must copy the libraries to your App Engine project directory so that App Engine knows to include them. Fortunately, the client library installation provides a helpful tool to do just that: enable-app-engine-project. Here is how to use it, assuming hellodashboard is under source/:

    cd source/
    enable-app-engine-project hellodashboard/
    Copied: hellodashboard/gflags.py
    Copied: hellodashboard/gflags_validators.py
    Copied: hellodashboard/httplib2/__init__.py
    …
    Copied: hellodashboard/oauth2client/__init__.py
    …
    Copied: hellodashboard/oauth2/__init__.py
    …
    Copied: hellodashboard/apiclient/__init__.py
    …
    Copied: hellodashboard/uritemplate/__init__.py

    You will see that the enable-app-engine-project tool copies over 60 files, including all the libraries you'll need for authentication and easy access to the Google APIs like BigQuery.

    The API and authentication libraries are now ready for use.

  3. Enable the BigQuery API

    Before we can call the BigQuery API, we must first enable it in a Google API project. Let's create a new project for hellodashboard:

    1. Open the Google APIs Console.
    2. Create a new Google API project by clicking the project dropdown button on the top left of the page.
    3. Click Create.
    4. Name your project. Let's call it 'Hello Dashboard Project'.
    5. Activate the Google BigQuery API in the Services pane by clicking its Status button to toggle it to On. (During the Limited Availability period you can skip this step: the switch may not be visible on your project, but if you were invited to participate then your project is already enabled for BigQuery, as if you had toggled Status to On.)
  4. Create and Record Client Secrets

    Why do we need secrets? A shared secret establishes a trust relationship between Google, who authenticates the user, and your application, which the user authorizes to access data. This is where authentication and authorization via OAuth2 come into play. You have already installed the OAuth2 libraries (step 2, above) and these libraries, combined with the secret and auth services at Google, will enforce the security and trust between Google, your application, and the user. If you are interested in the authentication process then there is a very informative presentation on the Google Code site: How Authentication Works. For now, we'll stick to the practical matters of setting up your application and let the libraries do all the hard work.

    The first step is to generate some secrets and to tell the API Console the internet location where your application is running. The secrets are for OAuth, and the location is so that the Google servers can redirect the user back to your application after they have signed in.

    1. Go to the Google APIs Console and open your 'Hello Dashboard Project'.
    2. Click API Access.
    3. Click Create an OAuth 2.0 client ID.
    4. In the Product Name field enter Hello Dashboard. Click Next.
    5. Choose application type Web Application.
    6. Choose the http protocol.
    7. Fill in the hostname with localhost:8080 (assuming that is the port where your App Engine development server is running; change it if necessary). The Redirect URI should be http://localhost:8080/oauth2callback.
    8. Click Create client ID.
    9. Go to API Access and you should see a box titled Client ID for web applications.

    There! You've told Google where your development server will be running and you've generated the client id and secret. Now it is time to give this information to hellodashboard.

    Let's create a new file: source/hellodashboard/client_secrets.json

    1. Open the client_secrets.json file for editing.
    2. Copy the values for Client ID, Client secret, and Redirect URIs from the Google APIs Console's Client ID for web applications box into client_secrets.json.
      The results should look like this:
      {
       "web": {
         "client_id": "999999999.apps.googleusercontent.com",
         "client_secret": "SOME_SECRET_STRING",
         "redirect_uris": ["http://localhost:8080/oauth2callback"],
         "auth_uri": "https://accounts.google.com/o/oauth2/auth",
         "token_uri": "https://accounts.google.com/o/oauth2/token"
       }
      }

      where 999999999 is your Client ID from the API Console > API Access pane, and SOME_SECRET_STRING is the Client Secret. Keep your numeric Client ID nearby because you will need it later on as the value of PROJECT_ID.
    3. Save client_secrets.json.
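Before moving on, you can sanity-check that the file parses and contains the fields the OAuth2 libraries will look for. This quick check is not part of the codelab proper, and the values below are the placeholders from above:

```python
import json

# Parse a placeholder client_secrets.json payload and check the fields
# the OAuth2 decorator will read. Values here are the dummies from above.
secrets = json.loads("""
{
 "web": {
   "client_id": "999999999.apps.googleusercontent.com",
   "client_secret": "SOME_SECRET_STRING",
   "redirect_uris": ["http://localhost:8080/oauth2callback"],
   "auth_uri": "https://accounts.google.com/o/oauth2/auth",
   "token_uri": "https://accounts.google.com/o/oauth2/token"
 }
}
""")
web = secrets['web']
print(sorted(web.keys()))
```

If json.loads() raises an error here, your real client_secrets.json likely has a stray comma or quote.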
  5. Add OAuth2 Support to main.py and app.yaml

    Let's take a look at main.py again and consider the code we need to add to make BigQuery API calls. This file includes the handler for the get() request which is usually the first interaction a user has with your dashboard. Since the user wants to see information from BigQuery and all BigQuery calls require authorization, that first get() must be authenticated and authorized. That means we must add OAuth2 support.

    There are several ways to add OAuth2 to an application, but for App Engine apps the simplest approach is to use a special App Engine-specific OAuth2 decorator. First we need to import the right library:

    from apiclient.discovery import build
    from oauth2client.appengine import oauth2decorator_from_clientsecrets
    from google.appengine.api import memcache

    That's a long name for a constructor, but it tells us exactly what it does: create an OAuth2 Python decorator from the client_secrets.json file we created earlier. Here is how we instantiate the decorator in main.py (partial file):

    import os
    CLIENT_SECRETS = os.path.join(os.path.dirname(__file__),
       'client_secrets.json')
    decorator = oauth2decorator_from_clientsecrets(
        CLIENT_SECRETS,
        'https://www.googleapis.com/auth/bigquery')

    The decorator needs the OAuth2 flow information we stored in client_secrets.json as well as the scope of permissions to request. In this codelab we are just interested in BigQuery, so we use the scope listed under "Authentication" in the reference manual. If we want to add authentication scopes for other APIs, we include them in the same string, separated by spaces. With all the scopes in the same string, each service can use the same credentials and the user only authorizes the application once.
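For example, if we later added an API that needs the user's email address, we could request both scopes at once. A small illustration (the second scope is purely an example, not something this codelab uses):

```python
# Join multiple OAuth2 scopes into the single space-separated string
# that oauth2decorator_from_clientsecrets expects.
BIGQUERY_SCOPE = 'https://www.googleapis.com/auth/bigquery'
EMAIL_SCOPE = 'https://www.googleapis.com/auth/userinfo.email'  # illustrative
SCOPES = ' '.join([BIGQUERY_SCOPE, EMAIL_SCOPE])
print(SCOPES)
```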

    What is this OAuth2 decorator going to do for us? It will handle the OAuth2 flow, checking that the user is logged in and that we have valid credentials for the requested scopes. If anything is missing, the decorator automatically intercepts the request and redirects the user to the proper pages to log in and/or grant the application permission, acquiring credentials along the way. You can do all of these steps manually, but the decorator is a great way to simplify your App Engine code, and so for clarity and brevity we will use it in this codelab.

    Now let's "decorate" the get() method in main.py so that any call to get() prepares the decorator for the OAuth2 flow (partial file):

    class MainHandler(webapp.RequestHandler):
        @decorator.oauth_required
        def get(self):
            self.response.out.write("Hello Dashboard!\n")

    You could try running the code now, but you won't see any difference in behavior because we're not making an authenticated API call yet; there will be no prompts to log in or authorize APIs. To get ready for those prompts, we must add an OAuth callback handler.

    Many people call the OAuth authorization and authentication flow a "dance" because the user moves back and forth between the application and the OAuth servers on the internet. During this dance, your application must handle a callback from the authentication servers. Fortunately, this is standard enough that it can be part of an App Engine library, and we've already copied it into our project! We don't have to add any code, but we do need to let App Engine know how to map the callback to the library code. We can do that in app.yaml (complete file):

    application: hellodashboard
    version: 1
    runtime: python
    api_version: 1
    
    handlers:
    - url: /favicon\.ico
      static_files: favicon.ico
      upload: favicon\.ico
    
    - url: /oauth2callback
      script: oauth2client/appengine.py
    
    - url: .*
      script: main.py

    When you construct the OAuth2 decorator using the oauth2decorator_from_clientsecrets constructor, you should use the default callback address for your application, /oauth2callback. This is the same callback we configured when we created the API credentials in "Hello Dashboard Project" above and specified in the redirect_uris of client_secrets.json. Now map the OAuth 2 callback url in app.yaml to the library script oauth2client/appengine.py which does all the tedious work of obtaining the user credentials and storing them for future use. If you're curious about those credentials, you can find them stored in the App Engine datastore as a CredentialsModel.

    You've now completed the changes necessary to add OAuth2 to your main get() handler and we can think about what BigQuery API calls we would like to make, confident that the user will be prompted for any necessary permissions. Your main.py should look like this (complete file):

    #!/usr/bin/env python
    import httplib2
    import os
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    from oauth2client.appengine import oauth2decorator_from_clientsecrets
    
    CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')
    
    decorator = oauth2decorator_from_clientsecrets(CLIENT_SECRETS,
        'https://www.googleapis.com/auth/bigquery')
    
    class MainHandler(webapp.RequestHandler):
        @decorator.oauth_required
        def get(self):
            self.response.out.write("Hello Dashboard!\n")
    
    application = webapp.WSGIApplication([
       ('/', MainHandler),
    ], debug=True)
    
    def main():
       run_wsgi_app(application)
    
    if __name__ == '__main__':
        main()

Create a Class for BigQuery

Since we are building a dashboard, we're going to be interacting with BigQuery quite a bit, making queries and reformatting the results for display. We should wrap BigQuery into a class so that we can encapsulate some default values and present a clean interface. To keep the main.py code clean, let's create this class in a new file, bqclient.py.

The BigQuery API is part of a set of Google APIs available through the API Discovery Service. For Python, that means you don't need to download an API library specifically for BigQuery. Instead, you can download one client library and generate an interface to any API hosted by the API Discovery Service.

Let's take a look at the function we need to construct those interfaces to BigQuery API calls, our first line of bqclient.py:

from apiclient.discovery import build

The build() function requires two parameters, serviceName and version, which, in our case, have the values bigquery and v2, specifying the BigQuery service with version 2 of the BigQuery API (the current version in December 2012). You can find these parameters as part of the REST URI in the reference manual. We could now construct the interface like this:

service = build('bigquery', 'v2')

But that would miss an opportunity for optimization.

A Small Optimization

The build() function accepts an httplib2 object; if we construct that object with a cache, the client library can take advantage of HTTP caching and ETag support in the Google APIs. In fact, to get the maximum benefit, we should share this httplib2 object with any other Google APIs we call, so that a single object can reuse connections and cache efficiently.

We could hide the httplib2 usage within bqclient.py, but then we cannot reuse the httplib2 object for other API communication. So let's include httplib2 directly in main.py and instantiate it as a global variable. We'll make memcache available to the httplib2 object to use as an HTTP cache. So let's add this to main.py (partial file):

import httplib2
from google.appengine.api import memcache

http = httplib2.Http(memcache)

So that leads us to this first version of a constructor in bqclient.py:

class BigQueryClient(object):
    def __init__(self, http):
        """Creates the BigQuery client connection"""
        self.service = build('bigquery', 'v2', http=http)

After we learn more about the service object we've constructed, we'll add a bit more to the BigQueryClient constructor to help it complete API calls successfully.

Understanding the Service Object

What is this BigQueryClient.service object that we built in the constructor? This is our primary interface to the Google BigQuery REST API, constructed dynamically from the API Discovery Service. All of the functionality of a service is available in a set of collections. In the case of BigQuery, you can see the collections listed in the REST API documentation: datasets, jobs, projects, tabledata, and tables. Each collection contains one or more methods, and as you may have guessed from the name, most of the work gets done in the jobs collection. The other collections help read and sometimes write metadata for the project, dataset, and tables. In contrast, the jobs collection gives us methods for making queries, importing data, and controlling the jobs themselves, all the longer-running tasks.

Here is an example of building up a BigQueryClient request for a table's metadata:

def getTableData(self, decorator, project, dataset, table):
    decorated_http = decorator.http()
    tablesCollection = self.service.tables()
    request = tablesCollection.get(
        projectId=project,
        datasetId=dataset,
        tableId=table)
    return request.execute(decorated_http)

How did I know that tables().get() would require those three parameters? You can discover the parameters either by looking at the REST API documentation:

GET https://www.googleapis.com/bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}

or by using Python's interactive mode to generate documentation with help():

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from apiclient.discovery import build
>>> service = build("bigquery", "v2")
>>> help(service.tables())
Help on Resource in module apiclient.discovery object:

class Resource(__builtin__.object)
 |  A class for interacting with a resource.
 |
 |  Methods defined here:
...
 |  get = method(self, **kwargs)
 |      Gets the specified table resource by table ID. This
 |      method does not return the data in the table, it only
 |      returns the table resource, which describes the
 |      structure of this table.
 |
 |      Args:
 |        projectId: string, Project ID of the requested table
 |                  (required)
 |        tableId: string, Table ID of the requested table
 |                  (required)
 |        datasetId: string, Dataset ID of the requested table
 |                  (required)
 |
 |      Returns:
 |        An object of the form
 |          {
 |            "kind": "bigquery#table",
...

The request builds up an HttpRequest based on the collection and method parameters, but it does not actually contact the BigQuery REST API until the execute() call. Since BigQuery requires authorization for all API requests, we must provide access to the credentials. We can do this by using the same OAuth2 decorator we used to wrap the get() request handler in main.py. Without this decorator, execute() would fail because it would try to use the httplib2 object we passed in to the BigQueryClient constructor, and that original object, by itself, has no idea how to get the credentials needed for OAuth2 authentication. It needs help from the same decorator that wraps the get() handler in main.py. All you need to do is call the decorator's http() method within whatever function it is wrapping (in this case, the get() request handler). If you call the decorator's http() method before the decorated handler runs, the decorator won't have access to the right HttpRequest object and authorization will fail.

[Optional Reading] Under the Covers with the OAuth2 Decorator

The OAuth2 decorator wrapping the get() request handler in main.py wraps the whole get() method and thereby gets access to the HttpRequest object passed in. Since it has access to the request object, it can intercept the call flow and redirect the user immediately to log in or to authorize the application for access to BigQuery. This happens at the moment you call execute(decorated_http). If the user is logged in and the application has the authorization credentials it needs, then execute() makes the REST API call you've built up. If the user is not logged in or the application lacks valid credentials, then the decorator immediately redirects the user to the proper login and authorization pages, and then resumes the original request. For more details, please see the comments in oauth2client/appengine.py.

BigQueryClient Constructor

Now that we know how important the OAuth2 decorator is, let's include that in the BigQueryClient constructor and use it in a method (complete bqclient.py file):

from apiclient.discovery import build

class BigQueryClient(object):
    def __init__(self, http, decorator):
        """Creates the BigQuery client connection"""
        self.service = build('bigquery', 'v2', http=http)
        self.decorator = decorator

    def getTableData(self, project, dataset, table):
        decorated = self.decorator.http()
        return self.service.tables().get(projectId=project,
            datasetId=dataset, tableId=table).execute(decorated)

    def getLastModTime(self, project, dataset, table):
        data = self.getTableData(project, dataset, table)
        if data is not None and 'lastModifiedTime' in data:
            return data['lastModifiedTime']
        else:
            return None

    def Query(self, query, project, timeout_ms=0):
        query_config = {
            'query': query,
            'timeoutMs': timeout_ms
        }
        decorated = self.decorator.http()
        result_json = (self.service.jobs()
                       .query(projectId=project,
                       body=query_config)
                      .execute(decorated))
        return result_json

You may have noticed that getTableData() calls service.tables().get(...).execute(decorated) all in one line. This is the easy way to do it when you want the REST request to be issued immediately. Most of the metadata requests return quickly, so it makes sense to issue these immediately. For requests that may take a few seconds, like jobs(), sometimes it makes more sense to create the request and then execute() it from a longer-lived App Engine task. This codelab focuses on synchronous calls for simplicity, even for the longer-lived query job.
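One consequence of the synchronous style: Query() passes timeoutMs to jobs().query(), and if the timeout elapses before the job finishes, BigQuery returns a response with jobComplete set to false and no rows. A small helper for handling that case, sketched under that assumption (the helper name is our own):

```python
def rows_or_none(query_response):
    """Return result rows, or None if the query job hasn't finished.

    When jobs().query() hits its timeoutMs before the job completes,
    the response carries jobComplete=False and no 'rows' field.
    """
    if not query_response.get('jobComplete', False):
        return None
    return query_response.get('rows', [])

# Usage with canned responses:
print(rows_or_none({'jobComplete': False}))             # None
print(rows_or_none({'jobComplete': True, 'rows': []}))  # []
```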

Which project ID should you use?

It is important to know that the project ID used in the query is different from the project ID used when getting the last modification time. Your query lives in a "job", and jobs live in billable projects. When BigQuery is launched for general availability, we will charge for jobs like queries and ingestion, so the project for your job must be one where you are authorized as a viewer and can bill to. Let's call that project the BILLING_PROJECT_ID; you can use the same numeric value from the client ID that you added to client_secrets.json.

The other kind of project ID identifies where the data lives. When you ask for simple information about a table, like its last modification date, that request does not run in a job. Since there is no job, your API call is a request directly to the table resource, so the project ID for these calls is the project ID of the data. Let's call this the DATA_PROJECT_ID. For this sample, we will use the "publicdata" project.

The Python API library unfortunately hides the distinction between "jobs" and basic "get" requests, but they are pretty easy to see in the REST API documentation. If you find yourself getting confused about which project id to send to the function, take a look at the original REST documentation to see if it is a "job" or not.
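In code, the distinction might look like this (the billing value is a placeholder; 'publicdata' is the project that owns the sample tables used in this codelab):

```python
# Billing project: jobs such as queries run (and would bill) here.
# Placeholder value; use the numeric Client ID from the API Console.
BILLING_PROJECT_ID = '999999999'

# Data project: plain resource gets, like a table's metadata, are
# addressed to the project that owns the data.
DATA_PROJECT_ID = 'publicdata'

# Hypothetical calls against the BigQueryClient defined above:
#   bq.Query(query, BILLING_PROJECT_ID)                        # a job
#   bq.getLastModTime(DATA_PROJECT_ID, 'samples', 'natality')  # not a job
print(DATA_PROJECT_ID)
```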

We've imported our libraries and constructed everything we need for OAuth2 authentication on App Engine. We've even built our interface to the BigQuery service. It is time to make our first BigQuery call.

Make our first authenticated calls

Here is the complete code to the updated main.py with the changes needed to construct and call the BigQueryClient:

#!/usr/bin/env python
import bqclient
import httplib2
import os
from google.appengine.api import memcache
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from oauth2client.appengine import oauth2decorator_from_clientsecrets

# Project ID for project to receive bill.
# During limited availability preview, there is no bill.
# The value should be your quoted Client ID number 
# which you previously recorded from code.google.com/apis/console

# REPLACE THIS NUMBER WITH YOUR CLIENT ID
PROJECT_ID = "9999999"
# Project that owns the public sample data.
DATA_PROJECT_ID = "publicdata"
DATASET = "samples"
TABLE = "natality"

# CLIENT_SECRETS, name of a file containing the OAuth 2.0
# information for this application.
CLIENT_SECRETS = os.path.join(os.path.dirname(__file__),
    'client_secrets.json')

http = httplib2.Http(memcache)
decorator = oauth2decorator_from_clientsecrets(CLIENT_SECRETS,
    'https://www.googleapis.com/auth/bigquery')

bq = bqclient.BigQueryClient(http, decorator)

class MainHandler(webapp.RequestHandler):
    @decorator.oauth_required
    def get(self):
        self.response.out.write("Hello Dashboard!\n")
        modTime = bq.getLastModTime(DATA_PROJECT_ID, DATASET, TABLE)
        if modTime is not None:
            msg = 'Last mod time = ' + modTime
        else:
            msg = "Could not find last modification time.\n"
        self.response.out.write(msg)

application = webapp.WSGIApplication([
    ('/', MainHandler),
], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()

Go ahead and run it in the development App Engine server!

When you visit localhost:8080 with your browser, the application tries to perform an HTTP GET. Unlike the first time you ran this application, get() is now decorated with @decorator.oauth_required and so this time you have to log in. Make sure you log in with your real Google email address as your user name, the one with access to BigQuery! Although you are running the development App Engine server, your application is talking to the real BigQuery service, so you need to provide a real account for authentication and authorization.

When you complete the OAuth dance, you should see:

Hello Dashboard! Last mod time = 1318622687171

That timestamp is the number of milliseconds since January 1, 1970, also known as "epoch time" (traditionally given in seconds rather than milliseconds). It converts to 10/14/2011 8:04pm GMT, when we last updated this public dataset on U.S. natality.
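You can check the conversion yourself with a couple of lines of Python:

```python
import datetime

# The lastModifiedTime value shown above, in milliseconds since epoch.
last_mod_ms = 1318622687171
last_mod = datetime.datetime.utcfromtimestamp(last_mod_ms / 1000.0)
print(last_mod.strftime('%m/%d/%Y %H:%M GMT'))  # 10/14/2011 20:04 GMT
```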

We've now successfully made an authorized call to the BigQuery API. We're ready to make a query, but what should we ask BigQuery about the natality data set, and how should we display it?

Building the Dashboard

Google's BigQuery is an ideal back-end for strategic dashboards because it excels at aggregation across massive data sets. The natality dataset we are accessing in this codelab is pretty big, but not massive, containing 137,826,763 rows, just about one for each live birth between 1969 and 2008. It doesn't quite equal one row per birth because between 1969 and 1985 many records are only a 50% sample and should be counted twice. What kind of questions can we ask, and how would we like to display it?

Picking the Right Question for a Query

The description included with the publicdata:samples.natality dataset has a nice pointer to additional information about the data and some example queries. You can see this in the data returned from bigquery.tables().get(), or by clicking on the table in the Web UI:

One of the example queries asks this question: "Do gestation periods vary by state?"

SELECT state,
       SUM(gestation_weeks) / COUNT(gestation_weeks) AS weeks
FROM publicdata:samples.natality
WHERE year > 1990 AND year < 2005
  AND IS_EXPLICITLY_DEFINED(gestation_weeks)
GROUP BY state
ORDER BY weeks

Grouping by state is nice because it limits the size of the return set to 51 rows (the 50 states plus Washington, DC) and it includes geographic information, which always makes a compelling dashboard.

Viewing the Data

How can we present this data in Google Charts? There are some very nice geocharts to choose from, and options to allow us to focus on just the United States. This is going to require our output to be more sophisticated than the simple text strings we've been creating with response.out.write(). We're going to need to include both Javascript and HTML, so let's take the same strategy as the basic Python Codelab and make use of Web Templates.

Let's create the HTML template in a new file, index.html (complete file):

<html>
<head>
  <title>hellodashboard</title>
  <script type='text/javascript'
     src='https://www.google.com/jsapi'></script>
  <script type='text/javascript'>
     google.load('visualization', '1', {'packages':
       ['geochart']});
     google.setOnLoadCallback(drawMarkersMap);

     geodata = {{ data }}

     function drawMarkersMap() {
       if (! geodata) return;
       var data = new google.visualization.DataTable(geodata);

       var options = {
         region: 'US',
         displayMode: 'region',
       };

       var chart = new google.visualization.GeoChart(
         document.getElementById('chart_div'));
       chart.draw(data, options);
     }
  </script>
</head>
<body>
  <div id='chart_div'></div>
       <p>A simple example of a graphical dashboard, backed by
          Google's BigQuery and running in Python on App Engine.</p>
       <p>Click on the map to highlight the value of each state, as
          generated by this query:</p>
       <pre> {{ query }} </pre>
</body>
</html>

This should look almost exactly like the examples given in Google Charts, with the exception of geodata = {{ data }}. As you recall from the basic Python Codelab, a Web Template gets processed before it is sent to the requesting web browser, and template variables are enclosed in double curly-braces, like {{ data }} and {{ query }} above. We will be able to substitute our geographic data as a JSON object and replace {{ data }}, so we don't need to construct the data row by row as they do in the Google Chart examples.

Using the Web Template requires adding new imports to main.py, but both modules are included with App Engine, so there are no new libraries to install or copy:

from django.utils import simplejson as json
from google.appengine.ext.webapp.template import render

We include the JSON library to translate from Python data to JSON for inclusion in the template, and we need the render method to interpret the template and insert the data value, a JSON string.
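As a quick illustration of that translation (the column names and values here are invented for illustration, not real query output), json.dumps turns a nested Python structure into the single string that the template slot receives:

```python
import json  # the codelab imports django.utils.simplejson as json;
             # the standard json module behaves the same for our purposes

# A Python dict of columns and rows becomes one JSON string,
# ready to drop into the {{ data }} template variable.
geodata = {'cols': [{'id': 'state', 'type': 'string'},
                    {'id': 'weeks', 'type': 'number'}],
           'rows': [{'c': [{'v': 'US-AL'}, {'v': 38.9}]}]}
print(json.dumps(geodata))
```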

Let's create a utility method called _bq2geo to convert the BigQuery query response into the format Google geocharts expect. A BigQuery query response comes back as JSON, but the Python API library interprets it into a Python data structure for you. We need to loop through that structure and convert it into the two columns of data that a Google Charts geochart expects: region and value. One change we have to make along the way is converting the two-letter state abbreviations used in the natality table into the 'US-' prefixed region codes that the Google Charts data format expects:

def _bq2geo(self, bqdata):
    logging.info(bqdata)
    columnNameGeo = bqdata['schema']['fields'][0]['name']
    columnNameVal = bqdata['schema']['fields'][1]['name']

    geodata = { 'cols': (
        {'id':columnNameGeo, 'label':columnNameGeo,
            'type':'string'},
        {'id':columnNameVal, 'label':columnNameVal,
            'type':'number'})}
    geodata['rows'] = []
    for row in bqdata['rows']:
        newrow = {'c':[]}
        newrow['c'].append({'v': 'US-'+row['f'][0]['v']})
        newrow['c'].append({'v': row['f'][1]['v']})
        geodata['rows'].append(newrow)
    return json.dumps(geodata)

This method assumes that the query is structured to put the geographic location in the first column and the numeric value in the second column; any other values in the row are ignored. At the end we convert geodata to a JSON string so that the template renderer can substitute it directly into the template.
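To make the format mapping concrete, here is the same conversion run as a standalone function on a hand-built two-row response. The state and weeks values are invented for illustration, and a real response holds 51 rows, but the nesting matches what the API library hands back:

```python
import json

def bq2geo(bqdata):
    """Standalone copy of the _bq2geo conversion, for illustration."""
    columnNameGeo = bqdata['schema']['fields'][0]['name']
    columnNameVal = bqdata['schema']['fields'][1]['name']
    geodata = {'cols': [{'id': columnNameGeo, 'label': columnNameGeo,
                         'type': 'string'},
                        {'id': columnNameVal, 'label': columnNameVal,
                         'type': 'number'}],
               'rows': []}
    for row in bqdata['rows']:
        # Prefix the two-letter state code with 'US-' for the geochart.
        geodata['rows'].append({'c': [{'v': 'US-' + row['f'][0]['v']},
                                      {'v': row['f'][1]['v']}]})
    return json.dumps(geodata)

# A trimmed-down response in the shape the Python API library returns:
# column names under schema.fields, cell values under rows[n].f[m].v.
sample = {'schema': {'fields': [{'name': 'state'}, {'name': 'weeks'}]},
          'rows': [{'f': [{'v': 'MS'}, {'v': 38.6}]},
                   {'f': [{'v': 'UT'}, {'v': 39.3}]}]}

print(bq2geo(sample))
```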

Let's add a global definition for the query using the SQL we decided on above, and we'll make a few minor changes to the get() request handler to use the template in the response.out.write() (main.py, complete file):

import bqclient
import httplib2
import logging
import os
from django.utils import simplejson as json
from google.appengine.api import memcache
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext.webapp.template import render
from oauth2client.appengine import oauth2decorator_from_clientsecrets

# CLIENT_SECRETS, name of a file containing
# the OAuth 2.0 information for this application,
# including client_id and client_secret, which are found
# on the API Access tab on the Google APIs Console
CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')

# Project ID for a project of which you and your users
#   are viewing members.  This is where the bill will be sent.
#   During the limited availability preview, there is no bill.
# Replace this value with the numeric project ID from your project,
#   the same number that appears in your client_secrets.json
BILLING_PROJECT_ID = "99999999"
DATA_PROJECT_ID = "publicdata"
DATASET = "samples"
TABLE = "natality"
QUERY = """
select state, SUM(gestation_weeks) / COUNT(gestation_weeks) as weeks 
from publicdata:samples.natality 
where year > 1990 and year < 2005 and IS_EXPLICITLY_DEFINED(gestation_weeks) 
group by state order by weeks
"""
decorator = oauth2decorator_from_clientsecrets(CLIENT_SECRETS,
    'https://www.googleapis.com/auth/bigquery')

http = httplib2.Http(memcache)

bq = bqclient.BigQueryClient(http, decorator)

class MainHandler(webapp.RequestHandler):
    def _bq2geo(self, bqdata):
        """geodata output for region maps must be in the format region, value.
           Assume the BigQuery query output is in this format and get names from schema.
        """
        logging.info(bqdata)
        columnNameGeo = bqdata['schema']['fields'][0]['name']
        columnNameVal = bqdata['schema']['fields'][1]['name']
        logging.info("Column Names=%s, %s" % (columnNameGeo, columnNameVal))
        geodata = { 'cols': ({'id':columnNameGeo,'label':columnNameGeo,'type':'string'},
          {'id':columnNameVal, 'label':columnNameVal, 'type':'number'})}
        geodata['rows'] = []
        logging.info(geodata)
        for row in bqdata['rows']:
            newrow = {'c':[]}
            newrow['c'].append({'v': 'US-'+row['f'][0]['v']})
            newrow['c'].append({'v':row['f'][1]['v']})
            geodata['rows'].append(newrow)
        logging.info('FINAL GEODATA---')
        logging.info(geodata)
        return json.dumps(geodata)
        
    @decorator.oauth_required
    def get(self):
        data = { 'data': self._bq2geo(bq.Query(QUERY, BILLING_PROJECT_ID)),
                 'query': QUERY }
        template = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(render(template, data))

application = webapp.WSGIApplication([
   ('/', MainHandler),
], debug=True)

def main():
   run_wsgi_app(application)

if __name__ == '__main__':
    main()

And we're done! You should be able to run this on your development App Engine server and see the following:

Congratulations! You've just written your first basic dashboard using BigQuery, Google Charts, and App Engine.

Conclusion

Now that you've finished your first dashboard, you've learned the basics of using OAuth2 in Python on App Engine, creating credentials, and wrapping functions that require authorization with decorators. You should be able to add your own queries over your own data using the same pattern of code.

If you'd like the complete source code for this sample, please see the Python Hello Dashboard sample. You'll still need to add in the libraries as in the Download and Install Libraries section above.
