Load and query data with the bq tool

Learn how to create a dataset, load sample data, and query tables with the bq command-line tool.


To follow step-by-step guidance for this task directly in the Google Cloud console, click Guide me:

Guide me


Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  4. Make sure that billing is enabled for your Google Cloud project.

  5. If you do not enable billing for the Google Cloud project that you use in this tutorial, then you will work with data in the BigQuery sandbox. The BigQuery sandbox lets you learn BigQuery with a limited set of BigQuery features at no charge.

  6. Ensure that the BigQuery API is enabled.

    Enable the API

    If you created a new project, the BigQuery API is automatically enabled.

  7. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

Download the source public data file

  1. Download the baby names zip file.
  2. Extract the zip file. It contains a file named NationalReadMe.pdf that describes the dataset schema. Learn more about the baby names dataset.
  3. Open the yob2010.txt file. It's a comma-separated value (CSV) file that contains three columns: name, assigned sex at birth, and number of children with that name. The file has no header row.
  4. Move the file to your working directory.
    • If you're working in Cloud Shell, click More > Upload, click Choose Files, choose the yob2010.txt file, and then click Upload.
    • If you're working in a local shell, copy or move the file yob2010.txt into the directory where you're running the bq tool.

Create a dataset

  1. Create a dataset named babynames:

    bq mk babynames
    

    The output is similar to the following:

    Dataset 'myproject:babynames' successfully created.
    

    A dataset name can be up to 1,024 characters long and consist of A-Z, a-z, 0-9, and the underscore. The name cannot start with a number or underscore, and it cannot have spaces.

  2. Confirm that the dataset babynames now appears in your project:

    bq ls
    

    The output is similar to the following:

      datasetId
    -------------
      babynames
    

Load data into a table

  1. In the babynames dataset, load the source file yob2010.txt into a new table that's named names2010:

    bq load babynames.names2010 yob2010.txt name:string,assigned_sex_at_birth:string,count:integer
    

    The output is similar to the following:

    Upload complete.
    Waiting on bqjob_r3c045d7cbe5ca6d2_0000018292f0815f_1 ... (1s) Current status: DONE
    

    By default, when you load data, BigQuery expects UTF-8 encoded data. If you have data in ISO-8859-1 (or Latin-1) encoding and you have problems with it, instruct BigQuery to treat your data as Latin-1 using bq load -E=ISO-8859-1. For more information, see Encoding.

  2. Confirm that the table names2010 now appears in the babynames dataset:

    bq ls babynames
    

    The output is similar to the following. Some columns are omitted to simplify the output.

      tableId     Type
    ----------- ---------
     names2010    TABLE
    
  3. Confirm that the table schema of your new names2010 table is name: string, assigned_sex_at_birth: string, and count: integer:

    bq show babynames.names2010
    

    The output is similar to the following. Some columns are omitted to simplify the output.

      Last modified        Schema                      Total Rows   Total Bytes
    ----------------- ------------------------------- ------------ ------------
    14 Mar 17:16:45   |- name: string                    34089       654791
                      |- assigned_sex_at_birth: string
                      |- count: integer
    

Query table data

  1. Determine the most popular girls' names in the data:

    bq query --use_legacy_sql=false \
        'SELECT
          name,
          count
        FROM
          `babynames.names2010`
        WHERE
          assigned_sex_at_birth = "F"
        ORDER BY
          count DESC
        LIMIT 5;'
    

    The output is similar to the following:

    +----------+-------+
    |   name   | count |
    +----------+-------+
    | Isabella | 22925 |
    | Sophia   | 20648 |
    | Emma     | 17354 |
    | Olivia   | 17030 |
    | Ava      | 15436 |
    +----------+-------+
    
  2. Determine the least popular boys' names in the data:

    bq query --use_legacy_sql=false \
        'SELECT
          name,
          count
        FROM
          `babynames.names2010`
        WHERE
          assigned_sex_at_birth = "M"
        ORDER BY
          count ASC
        LIMIT 5;'
    

    The output is similar to the following:

    +----------+-------+
    |   name   | count |
    +----------+-------+
    | Aamarion |     5 |
    | Aarian   |     5 |
    | Aaqib    |     5 |
    | Aaidan   |     5 |
    | Aadhavan |     5 |
    +----------+-------+
    

    The minimum count is 5 because the source data omits names with fewer than 5 occurrences.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, delete the Google Cloud project with the resources.

Delete the project

If you used the BigQuery sandbox to query the public dataset, then billing is not enabled for your project.

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you used an existing project, delete the resources that you created:

  1. Delete the babynames dataset:

    bq rm --recursive=true babynames
    

    The --recursive flag deletes all tables in the dataset, including the names2010 table.

    The output is similar to the following:

    rm: remove dataset 'myproject:babynames'? (y/N)
    
  2. To confirm the delete command, enter y.

What's next