BigQuery external connections

An external data source is a data source that you can query directly from BigQuery, even though the data is not stored in BigQuery storage. For example, you might have data in a different Google Cloud database, in files in Cloud Storage, or in a different cloud product altogether that you would like to analyze in BigQuery, but that you aren't prepared to migrate.

Use cases for external data sources include the following:

  • Loading and cleaning your data in one pass for extract-load-transform (ELT) workloads, and writing the cleaned result into BigQuery storage by using a CREATE TABLE ... AS SELECT query.
  • Joining BigQuery tables with frequently changing data from an external data source. By querying the external data source directly, you don't need to reload the data into BigQuery storage every time it changes.
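The ELT use case above can be sketched as a single CREATE TABLE ... AS SELECT statement that reads from an external table and materializes the cleaned rows in BigQuery storage. The dataset, table, and column names below are hypothetical placeholders:

```sql
-- One-pass ELT: read from an external table backed by Amazon S3 or
-- Blob Storage, clean the rows, and write the result into BigQuery
-- storage. `mydataset.s3_events` is a hypothetical external table.
CREATE TABLE mydataset.events_clean AS
SELECT
  user_id,
  LOWER(TRIM(email)) AS email,
  SAFE_CAST(event_ts AS TIMESTAMP) AS event_ts
FROM mydataset.s3_events
WHERE user_id IS NOT NULL;
```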

As an Ads Data Hub customer, you can use this BigQuery feature to bring in first-party data from other sources, such as Amazon S3 and Azure Blob Storage, and join it with Google advertising data in your queries.

For complete details on connecting external data sources to BigQuery, see Introduction to external data sources.

Limitations

  • Only the following locations are supported. If your AWS or Azure data is in an unsupported region, consider using the BigQuery Data Transfer Service instead.
    • AWS - US East (N. Virginia) (aws-us-east-1)
    • Azure - East US 2 (azure-eastus2)
  • Jobs that are run on data from BigQuery connections:

Amazon S3

The following is a high-level overview of the steps required to export data from Amazon S3 to BigQuery for use in Ads Data Hub. Refer to Connect to Amazon S3 for full details.

  1. Create an AWS IAM policy for BigQuery. After the policy is created, the Amazon Resource Name (ARN) can be found in the Policy details page.
  2. Create an AWS IAM role for BigQuery, using the policy created in the previous step.
  3. Create a connection in BigQuery. Create a connection in a BigQuery project that Ads Data Hub has access to—for example, your admin project. The BigQuery Google identity, which will be used in the next step, is shown in the Connection info page.
  4. Add a trust relationship to the AWS role. In the AWS IAM page, edit the role created in the earlier step:
    1. Set the maximum session duration to 12 hours.
    2. Add a trust policy to the AWS role, using the BigQuery Google identity from the previous step.
  5. Load data into the BigQuery dataset.
  6. Query the data in Ads Data Hub. Learn about joining first-party data.
  7. Optional: Schedule continuous data load in BigQuery.
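The AWS-side setup in steps 1–4 can be sketched with the AWS and bq command-line tools. Everything below is a placeholder-filled sketch, not a ready-to-run script: the bucket, role, and policy names, the account ID, and BIGQUERY_GOOGLE_IDENTITY are assumptions you must replace. Note that in practice you create the connection (step 3) before adding the trust policy, because the trust policy references the Google identity shown on the connection's details page.

```
# 1. Policy granting BigQuery read access to the bucket.
aws iam create-policy --policy-name bq-s3-read --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:ListBucket"],
     "Resource": "arn:aws:s3:::my-bucket"},
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::my-bucket/*"}
  ]}'

# 2 and 4. Role that BigQuery assumes, with a 12-hour maximum session
#    duration and a trust policy whose condition matches the BigQuery
#    Google identity from the connection's details page.
aws iam create-role --role-name bq-s3-role --max-session-duration 43200 \
  --assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Principal": {"Federated": "accounts.google.com"},
     "Action": "sts:AssumeRoleWithWebIdentity",
     "Condition": {"StringEquals":
       {"accounts.google.com:sub": "BIGQUERY_GOOGLE_IDENTITY"}}}
  ]}'
aws iam attach-role-policy --role-name bq-s3-role \
  --policy-arn arn:aws:iam::123456789012:policy/bq-s3-read

# 3. Connection in BigQuery, in the supported AWS location.
bq mk --connection --connection_type='AWS' --location='aws-us-east-1' \
  --iam_role_arn='arn:aws:iam::123456789012:role/bq-s3-role' \
  my_s3_connection
```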

Azure Blob Storage

The following is a high-level overview of the steps required to export data from Azure Blob Storage to BigQuery for use in Ads Data Hub. Refer to Connect to Blob Storage for full details.

  1. Create an application in your Azure tenant.
  2. Create a connection in BigQuery.
    • Tenant ID is the directory ID from the previous step.
    • Federated Application (Client) ID is the Application (client) ID from the previous step.
    • The BigQuery Google identity will be used in the next step.
  3. Add a federated credential in Azure.
    • For Subject identifier, use the BigQuery Google identity from the previous step.
  4. Assign a role to BigQuery's Azure application, granting Storage Blob Data Reader access.
  5. Load data into the BigQuery dataset.
  6. Query the data in Ads Data Hub. Learn about joining first-party data.
  7. Optional: Schedule continuous data load in BigQuery.
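The Azure-side setup in steps 1–4 can be sketched with the az and bq command-line tools. All IDs, names, and scopes below are placeholders, and the Azure-specific bq mk flags are assumptions based on the federated-identity flow — verify them with `bq mk --connection --help` before use.

```
# 1. App registration in the Azure tenant.
az ad app create --display-name bq-blob-reader

# 2. Connection in BigQuery, using the tenant ID and the application
#    (client) ID from step 1. Flag spellings are assumptions.
bq mk --connection --connection_type='Azure' --location='azure-eastus2' \
  --tenant_id='TENANT_ID' \
  --federated_azure=true \
  --federated_app_client_id='APPLICATION_CLIENT_ID' \
  my_blob_connection

# 3. Federated credential whose subject is the BigQuery Google identity
#    shown on the connection's details page.
az ad app federated-credential create --id APP_OBJECT_ID --parameters '{
  "name": "bq-federation",
  "issuer": "https://accounts.google.com",
  "subject": "BIGQUERY_GOOGLE_IDENTITY",
  "audiences": ["api://AzureADTokenExchange"]
}'

# 4. Grant the application read access to the storage account.
az role assignment create --role "Storage Blob Data Reader" \
  --assignee APPLICATION_CLIENT_ID \
  --scope "/subscriptions/SUB_ID/resourceGroups/RG/providers/Microsoft.Storage/storageAccounts/ACCOUNT"
```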