Google BigQuery

Managing Jobs, Datasets, and Projects

This document describes how to manage jobs, datasets, and projects.


Jobs

Jobs are used for all potentially long-running actions, such as queries, table imports, and export requests. Shorter actions, such as list or get requests, are not managed by a job resource.

To perform a job-managed action, create a job of the appropriate type, then periodically request the job resource and examine its status property to learn when the job is complete. Once it is complete, check whether it finished successfully. Some wrapper functions manage the status requests for you: for example, jobs.query creates the job and periodically polls for DONE status for a specified period of time.

Jobs in BigQuery persist forever. This includes jobs that are running or completed, whether they succeeded or failed. You can list or get information only about jobs that you started, unless you are a project owner; project owners can perform all actions on any job associated with their project.

Every job is associated with a specific project that you specify; this project is billed for any usage incurred by the job. To run a job of any kind, you must have READ permission on the job's project.

Here is how to run a standard job:

  1. Start the job by calling the generic jobs.insert method; the method call returns immediately with the job resource, which includes a jobId that is used to identify this job later.
  2. Check the job status by calling jobs.get with the job ID returned by the initial request and examining the status.state value. When status.state=DONE, the job has stopped running; however, a DONE status does not mean that the job completed successfully, only that it is no longer running.
  3. Check for job success. If the job has a status.errorResult property, the job has failed; this property holds information describing what went wrong in a failed job. If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors, such as problems importing a few rows in an import request. Non-fatal errors are listed in the returned job's status.errors list.
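The three steps above can be sketched as a single polling helper. This is a minimal sketch, assuming the job resource is available as a plain dict (as returned by the Python client's execute() call); wait_for_job, JobFailedError, and the get_job callable are hypothetical names introduced for illustration, not part of any BigQuery library.

```python
import time


class JobFailedError(Exception):
    """Hypothetical exception raised when a DONE job has status.errorResult."""


def wait_for_job(get_job, poll_interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll until status.state == 'DONE', then check whether the job succeeded.

    get_job is a hypothetical zero-argument callable returning the job
    resource as a dict, e.g. a wrapper around jobs.get for one job ID.
    Returns (job, non_fatal_errors) when the job finished successfully.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_job()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError("job still running after %.0f seconds" % timeout)
        sleep(poll_interval)

    # DONE only means the job stopped running; errorResult means it failed.
    if "errorResult" in status:
        raise JobFailedError(status["errorResult"])
    # Non-fatal problems (e.g. a few bad rows in an import) are listed here.
    return job, status.get("errors", [])
```

With the Google APIs Client Library for Python, get_job could be something like `lambda: service.jobs().get(projectId=project, jobId=job_id).execute()`. Passing sleep as a parameter keeps the loop testable without real delays.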

See the asynchronous query example for a demonstration of starting and polling a job.

There is no single-call method to re-run a job; if you want to re-run a specific job:

  1. Call jobs.get to retrieve the resource for the job to re-run.
  2. Remove the id, jobId, status, and statistics fields. Change any other fields as necessary.
  3. Call jobs.insert with the modified resource to start the new job.
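The re-run steps can be sketched as a small function that prepares the resource for jobs.insert. This is a sketch under stated assumptions: the resource is a plain dict as returned by jobs.get, prepare_rerun is a hypothetical helper, and, as an additional assumption, the job ID may be nested inside a jobReference object, so the sketch removes it from there as well.

```python
import copy

# Fields the steps above say to remove before resubmitting the job.
FIELDS_TO_REMOVE = ("id", "jobId", "status", "statistics")


def prepare_rerun(job, **overrides):
    """Return a copy of a jobs.get resource that is ready for jobs.insert.

    The original dict is left untouched. Keyword arguments replace
    top-level fields ("change any other fields as necessary").
    """
    new_job = copy.deepcopy(job)
    for field in FIELDS_TO_REMOVE:
        new_job.pop(field, None)
    # Assumption: the job ID may also live inside a nested jobReference.
    job_ref = new_job.get("jobReference")
    if isinstance(job_ref, dict):
        job_ref.pop("jobId", None)
    new_job.update(overrides)
    return new_job
```

The deep copy keeps the retrieved resource intact, so it can still be inspected or logged after the new job is submitted.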

See jobs in the reference section for more information.


Datasets

A dataset is a grouping mechanism that holds zero or more tables, and it is contained within a specific project. Datasets are the lowest-level unit of access control; you cannot control access at the table level. You can list the datasets to which you have access by calling bigquery.datasets.list.

See datasets in the reference section for more information.

Example

Java

This sample uses the Google APIs Client Library for Java.

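import java.io.IOException;
import java.util.List;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.Bigquery.Datasets;
import com.google.api.services.bigquery.model.DatasetList;

// Prints the ID of every dataset in projectId that the caller can access.
public static void listDatasets(Bigquery bigquery, String projectId) throws IOException {
  Datasets.List datasetRequest = bigquery.datasets().list(projectId);
  DatasetList datasetList = datasetRequest.execute();

  // getDatasets() returns null when the project has no visible datasets.
  if (datasetList.getDatasets() != null) {
    List<DatasetList.Datasets> datasets = datasetList.getDatasets();
    System.out.println("Dataset list:");

    for (DatasetList.Datasets dataset : datasets) {
      System.out.format("%s%n", dataset.getDatasetReference().getDatasetId());
    }
  }
}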
  

Python

This sample uses the Google APIs Client Library for Python.

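import pprint

from googleapiclient.errors import HttpError


def ListDatasets(service, project):
    """Print the datasets in project that the caller can access."""
    try:
        datasets = service.datasets()
        list_reply = datasets.list(projectId=project).execute()
        print('Dataset list:')
        pprint.pprint(list_reply)
    except HttpError as err:
        # err.content holds the raw error body returned by the API.
        print('Error in ListDatasets:')
        pprint.pprint(err.content)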
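The Python sample prints the whole reply, while the Java sample prints only each dataset ID. A minimal sketch of the Java behavior in Python, assuming the reply is the dict returned by execute() and uses the same field names the Java accessors expose (datasets, datasetReference, datasetId); dataset_ids is a hypothetical helper name.

```python
def dataset_ids(list_reply):
    """Return the dataset IDs from a datasets.list reply dict.

    Like the Java sample's null check, a reply without a "datasets"
    entry is treated as an empty list.
    """
    return [
        dataset["datasetReference"]["datasetId"]
        for dataset in list_reply.get("datasets", [])
    ]
```

For example, `dataset_ids({})` returns an empty list, so callers can iterate over the result without a separate null check.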
  

Projects

A project holds a group of datasets. Projects are created and managed in the APIs Console. Jobs are billed to the project to which they are assigned. You can list the projects to which you have access by calling bigquery.projects.list.

See projects in the reference section and Managing Projects in the APIs Console help for more information.

Example

Java

This sample uses the Google APIs Client Library for Java.

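import java.io.IOException;
import java.util.List;

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.ProjectList;

// Prints the friendly name of every project that the caller can access.
public static void listProjects(Bigquery bigquery) throws IOException {
  Bigquery.Projects.List projectListRequest = bigquery.projects().list();
  ProjectList projectList = projectListRequest.execute();

  // getProjects() returns null when no projects are visible to the caller.
  if (projectList.getProjects() != null) {
    List<ProjectList.Projects> projects = projectList.getProjects();
    System.out.println("Project list:");

    for (ProjectList.Projects project : projects) {
      System.out.format("%s%n", project.getFriendlyName());
    }
  }
}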

Python

This sample uses the Google APIs Client Library for Python.

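import pprint

from googleapiclient.errors import HttpError


def ListProjects(service):
    """Print the projects that the caller can access."""
    try:
        projects = service.projects()
        list_reply = projects.list().execute()
        print('Project list:')
        pprint.pprint(list_reply)
    except HttpError as err:
        # err.content holds the raw error body returned by the API.
        print('Error in ListProjects:')
        pprint.pprint(err.content)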
  
