Google BigQuery

Managing Tables

This document describes how to manage tables.

Overview

A BigQuery table is a standard, two-dimensional table with individual records organized in rows, and a datatype assigned to each column (also called a field). Individual fields within a record may contain nested and repeated children fields.

Every table is described by a schema that specifies field names, types, and other information. Once a table's schema has been specified, you cannot change it without first deleting all of the data associated with it. If you want to change a table's schema, you must specify a writeDisposition of WRITE_TRUNCATE. For more information, see the Jobs resource.

You can specify the schema of a table during the initial table creation request, or you can create a table without a schema and declare the schema in the query or load job that first populates the table.

Each field has the following properties:

  • name - Field names are any combination of uppercase and/or lowercase letters (A-Z, a-z), digits (0-9) and underscores, but no spaces. The first character must be a letter.
  • type - The following data types are supported; see Data Formats for details on each data type:
    • STRING
    • INTEGER
    • FLOAT
    • BOOLEAN
    • RECORD - A JSON object, used when importing nested records. This type is only available when using JSON source files.
  • mode - Whether a field can be null. The following values are supported:
    • NULLABLE - The cell can be null.
    • REQUIRED - The cell cannot be null.
    • REPEATED - Zero or more repeated simple or nested subfields. This mode is only supported when using JSON source files.
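Taken together, a table schema is a list of field descriptions. The following sketch builds one as a Python dictionary (the table and field names are hypothetical); this is the structure you would supply as the schema property when creating or loading a table:

```python
# A hypothetical schema for a table of people. Each field sets
# name, type, and mode as described above; a RECORD field nests
# its own list of subfields.
person_schema = {
    'fields': [
        {'name': 'full_name', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'},
        # A nested, repeated RECORD field (JSON source files only).
        {'name': 'children', 'type': 'RECORD', 'mode': 'REPEATED',
         'fields': [
             {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
             {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'},
         ]},
    ]
}
```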

Listing Tables

List tables in a dataset by calling bigquery.tables.list. You must have access to the specified dataset to list its tables.

Example

Java

This sample uses the Google APIs Client Library for Java.

// List all tables
private static void listTables(Bigquery service, String project, String dataset) throws IOException {

  Bigquery.Tables.List listTablesReply = service.tables().list(project, dataset);
  TableList tableList = listTablesReply.execute();

  if (tableList.getTables() != null) {

    List<TableList.Tables> tables = tableList.getTables();

    System.out.println("Tables list:");

    for (TableList.Tables table : tables) {
      System.out.format("%s\n", table.getId());
    }

  }
}

// Retrieve the specified table resource
private static void getTable(Bigquery service, String project, String dataset, String tableId) throws IOException {

  Bigquery.Tables tableRequest = service.tables();
  Table table = tableRequest.get(project, dataset, tableId).execute();

  System.out.format("Printing table resource: %s:%s.%s\n", project, dataset, tableId);
  System.out.println(table.toPrettyString());

}
  

Python

This sample uses the Google APIs Client Library for Python.

# List all tables
def listTables(service, project, dataset):
  try:
    tables = service.tables()
    listReply = tables.list(projectId=project, datasetId=dataset).execute()
    print 'Tables list:'
    pprint.pprint(listReply)

  except HttpError as err:
    print 'Error in listTables:', pprint.pprint(err.content)

# Retrieve the specified table resource
def getTable(service, projectId, datasetId, tableId):
  tableCollection = service.tables()
  try:
    tableReply = tableCollection.get(projectId=projectId,
                                   datasetId=datasetId,
                                   tableId=tableId).execute()
    print 'Printing table resource %s:%s.%s' % (projectId, datasetId, tableId)
    pprint.pprint(tableReply)

  except HttpError as err:
    print 'Error in getTable: ', pprint.pprint(err.content)
  


Creating a Table

BigQuery offers two ways to create a new table:

  • Run a load job that imports data from a set of source files.
  • Run a query and save the results to a destination table.

Both methods support creating a new table based on the data, populating an existing, empty table with an appropriate schema, or appending data to a table that already has data.
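As a sketch of the load-job path, the job configuration is a JSON object passed as the body of a bigquery.jobs.insert call; the dictionary below uses placeholder project, dataset, table, and bucket names, and writeDisposition controls how the target table is treated:

```python
# Hypothetical load-job configuration; all names and paths are placeholders.
load_job = {
    'configuration': {
        'load': {
            'sourceUris': ['gs://my_bucket/mydata.csv'],  # placeholder path
            'schema': {'fields': [
                {'name': 'full_name', 'type': 'STRING'},
                {'name': 'age', 'type': 'INTEGER'},
            ]},
            'destinationTable': {
                'projectId': 'myProject',   # placeholders
                'datasetId': 'myDataset',
                'tableId': 'myTable',
            },
            'createDisposition': 'CREATE_IF_NEEDED',
            'writeDisposition': 'WRITE_EMPTY',  # fail if the table has data
        }
    }
}
# With an authorized client, the job would be submitted as:
# service.jobs().insert(projectId='myProject', body=load_job).execute()
```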


Exporting a Table

You can export a table to a Google Cloud Storage bucket in CSV or JSON format. For more information, see Exporting Data.
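An export is expressed as an extract job submitted through bigquery.jobs.insert. A minimal sketch of the job configuration follows, with placeholder project, dataset, table, and bucket names:

```python
# Hypothetical extract-job configuration for exporting a table
# to Cloud Storage; names and the bucket path are placeholders.
extract_job = {
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': 'myProject',   # placeholders
                'datasetId': 'myDataset',
                'tableId': 'myTable',
            },
            # One or more Cloud Storage URIs to write to.
            'destinationUris': ['gs://my_bucket/myTable.csv'],
            'destinationFormat': 'CSV',  # or 'NEWLINE_DELIMITED_JSON'
        }
    }
}
# service.jobs().insert(projectId='myProject', body=extract_job).execute()
```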


Appending Data

You can import additional data into a table either from source files or by appending query results. Note that the schema of the imported data must match the schema of the existing table.

To append data from a source file:

  1. Create additional source files,
  2. Upload the files, using either code or the browser tool, and
  3. Import them to your existing table by referencing the existing table. In code, set writeDisposition=WRITE_APPEND.

To append data from a query result:

  1. Run an asynchronous query, pass in the name of your existing table, and set writeDisposition=WRITE_APPEND.
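The file-based append in the steps above can be sketched as a load-job configuration whose destination is the existing table (names and paths below are placeholders):

```python
# Hypothetical load-job configuration that appends to an existing table.
append_job = {
    'configuration': {
        'load': {
            'sourceUris': ['gs://my_bucket/more_data.csv'],  # placeholder
            'destinationTable': {
                'projectId': 'myProject',   # placeholders
                'datasetId': 'myDataset',
                'tableId': 'myTable',       # the existing table
            },
            # WRITE_APPEND adds the new rows to the existing data;
            # the source schema must match the table's schema.
            'writeDisposition': 'WRITE_APPEND',
        }
    }
}
```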


Browsing Through Table Data

Browse through a table's data by calling bigquery.tabledata.list, specifying the name of the table and an optional row offset. This method lets you specify the maximum number of rows to return per page; if you don't specify a maximum, each page holds as many rows as fit within the permitted response size. If you request a row index beyond the last row, the method returns successfully, but without a rows property.

Values are returned wrapped in a JSON object that you must parse, as described in the reference documentation.

Tip: You can find out how many rows are in your table either by querying for one page of results and examining the totalRows property, or by running the query SELECT COUNT(*) FROM myProject:mydataset.myTable.
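The COUNT(*) approach from the tip can be sketched with the synchronous bigquery.jobs.query method; the project and table names below are placeholders:

```python
# Hypothetical request body for a synchronous query via bigquery.jobs.query.
count_request = {
    'query': 'SELECT COUNT(*) FROM myProject:mydataset.myTable'
}
# With an authorized client:
# reply = service.jobs().query(projectId='myProject', body=count_request).execute()
# The count is the single value in the first row:
# rowCount = reply['rows'][0]['f'][0]['v']
```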

Browsing Through Table Data Examples

Java

This sample uses the Google APIs Client Library for Java.

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableRow;

import java.io.IOException;
import java.util.List;

// ...

public class MyClass {

  // ...

  /**
   * Prints the first page of rows in the specified table.
   *
   * @param service An authorized BigQuery client.
   * @param projectNumber The current Project number.
   * @param datasetId The BigQuery dataset id.
   * @param tableId The BigQuery table id.
   * @throws IOException
   */
  private static void listTableData(Bigquery service, String projectNumber,
      String datasetId, String tableId) throws IOException {

    TableDataList dataList = service.tabledata()
        .list(projectNumber, datasetId, tableId)
        .execute();

    if (dataList.getRows() != null) {

      List<TableRow> rows = dataList.getRows();

      System.out.println("Table data:");

      for (TableRow row : rows) {
        System.out.println(row.toPrettyString());
      }

    }

  }

}
  

Python

This sample uses the Google APIs Client Library for Python.

# Query for table data, with an optional start index.
def queryTableData(service, project, dataset, table, startIndex=0):
  tableDataJob = service.tabledata()
  try:
    queryReply = tableDataJob.list(projectId=project,
                                   datasetId=dataset,
                                   tableId=table,
                                   startIndex=startIndex).execute()
    print 'Printing table %s:%s.%s' % (project, dataset, table)

    # When we've printed the last page of results, the next
    # page does not have a rows[] array.
    while 'rows' in queryReply:
      printTableData(queryReply, startIndex)
      startIndex += len(queryReply['rows'])
      queryReply = tableDataJob.list(projectId=project,
                                     datasetId=dataset,
                                     tableId=table,
                                     startIndex=startIndex).execute()
  except HttpError as err:
    print 'Error in querytableData: ', pprint.pprint(err.content)

def printTableData(data, startIndex):
  for row in data['rows']:
    rowVal = []
    for cell in row['f']:
      rowVal.append(cell['v'])
    print 'Row %d: %s' % (startIndex, rowVal)
    startIndex += 1
  


Modifying or Deleting Rows or Data

BigQuery tables are append-only. The query language does not currently support either updating or deleting data. To update or delete data, you must delete the table and then recreate it with new data. Alternatively, you can write a query that transforms the data and specify a new results table.
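The query-based alternative can be sketched as a query job whose results replace a destination table. The query, project, dataset, and table names below are hypothetical; the filter stands in for whatever transformation "deletes" the unwanted rows:

```python
# Hypothetical query-job configuration that rewrites a table's data
# by selecting only the rows to keep into a results table.
transform_job = {
    'configuration': {
        'query': {
            # Placeholder query that filters out the rows to drop.
            'query': 'SELECT * FROM myDataset.myTable WHERE age IS NOT NULL',
            'destinationTable': {
                'projectId': 'myProject',   # placeholders
                'datasetId': 'myDataset',
                'tableId': 'myFilteredTable',
            },
            # WRITE_TRUNCATE overwrites the destination table's contents.
            'writeDisposition': 'WRITE_TRUNCATE',
        }
    }
}
```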


Deleting a Table

When you delete a table, the table and all its data are removed immediately.

Example

Java

This sample uses the Google APIs Client Library for Java.

private static void deleteTable(Bigquery service, String projectId, String datasetId, String tableId)
  throws IOException {
    service.tables().delete(projectId,
                            datasetId,
                            tableId).execute();
}
  

Python

This sample uses the Google APIs Client Library for Python.

def deleteTable(service, projectId, datasetId, tableId):
  service.tables().delete(projectId=projectId,
                          datasetId=datasetId,
                          tableId=tableId).execute()
  


Copying an Existing Table

There are three ways to copy a table: using the browser tool, using the command-line tool, or programmatically through the API.

To copy a table using the API:

You can copy an existing table through the API by calling the bigquery.jobs.insert method, and configuring a copy job. You must specify the following in your job configuration:

"copy": {
      "sourceTable": {       // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "destinationTable": {  // Required
        "projectId": string, // Required
        "datasetId": string, // Required
        "tableId": string    // Required
      },
      "createDisposition": string,  // Optional
      "writeDisposition": string,   // Optional
    },

where sourceTable provides information about the table to be copied, destinationTable provides information about the new table, createDisposition specifies whether to create the table if it doesn't exist, and writeDisposition specifies whether to overwrite or append to an existing table.

Python Sample

The following Python sample describes how to copy a table using the API. If you run this script on the command line, it prompts you for information about the source table and the destination table and performs the copy job:

def copyTable(service):
   try:
    sourceProjectId = raw_input("What is your source project? ")
    sourceDatasetId = raw_input("What is your source dataset? ")
    sourceTableId = raw_input("What is your source table? ")

    targetProjectId = raw_input("What is your target project? ")
    targetDatasetId = raw_input("What is your target dataset? ")
    targetTableId = raw_input("What is your target table? ")

    jobCollection = service.jobs()
    jobData = {
      "projectId": sourceProjectId,
      "configuration": {
          "copy": {
              "sourceTable": {
                  "projectId": sourceProjectId,
                  "datasetId": sourceDatasetId,
                  "tableId": sourceTableId,
              },
              "destinationTable": {
                  "projectId": targetProjectId,
                  "datasetId": targetDatasetId,
                  "tableId": targetTableId,
              },
          "createDisposition": "CREATE_IF_NEEDED",
          "writeDisposition": "WRITE_TRUNCATE"
          }
        }
      }

    insertResponse = jobCollection.insert(projectId=targetProjectId, body=jobData).execute()

    # Ping for status until it is done, with a short pause between calls.
    import time
    while True:
      status = jobCollection.get(projectId=targetProjectId,
                                 jobId=insertResponse['jobReference']['jobId']).execute()
      if 'DONE' == status['status']['state']:
          break
      print 'Waiting for the import to complete...'
      time.sleep(10)

    if 'errors' in status['status']:
      print 'Error loading table: ', pprint.pprint(status)
      return

    print 'Loaded the table:', pprint.pprint(status)

    # Now query and print out the generated results table.
    queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

   except HttpError as err:
    print 'Error in copyTable: ', pprint.pprint(err.content)
