This is legacy documentation, and may not be complete. To see the latest documentation, if you are a marketer, refer to the Marketers site. If you are a measurement partner, refer to the Measurement Partners site.
BigQuery external connections
An external data source is a data source that you can query directly from
BigQuery, even though the data is not stored in
BigQuery storage. For example, you might have data in a
different Google Cloud database, in files in Cloud Storage, or in a
different cloud product altogether that you would like to analyze in
BigQuery, but that you aren't prepared to migrate.
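For illustration, an external table can be queried with ordinary SQL as if it lived in BigQuery storage. The sketch below uses the `bq` CLI; `mydataset.s3_events` is a hypothetical BigLake table backed by Amazon S3, and no data is loaded into BigQuery storage by this query.

```shell
# Query a hypothetical external (BigLake) table directly. The data stays
# in S3; BigQuery reads it at query time.
bq query --use_legacy_sql=false \
  'SELECT event_date, COUNT(*) AS events
   FROM mydataset.s3_events
   GROUP BY event_date
   ORDER BY event_date'
```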
Use cases for external data sources include the following:
For extract-load-transform (ELT) workloads, loading and cleaning your data
in one pass and writing the cleaned result into BigQuery
storage, by using a CREATE TABLE ... AS SELECT query.
Joining BigQuery tables with frequently changing data from
an external data source. By querying the external data source directly, you
don't need to reload the data into BigQuery storage every
time it changes.
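The ELT pattern above can be sketched as a single `CREATE TABLE ... AS SELECT` statement that reads from an external table, cleans the rows in one pass, and writes the result into BigQuery storage. Table and column names here are hypothetical.

```shell
# One-pass ELT: read from a hypothetical external table, clean, and
# materialize the result as a native BigQuery table.
bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.clean_events AS
   SELECT
     LOWER(user_id) AS user_id,
     SAFE.PARSE_DATE("%Y%m%d", event_date_raw) AS event_date
   FROM mydataset.raw_external_events
   WHERE user_id IS NOT NULL'
```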
As an Ads Data Hub customer, you can leverage this BigQuery feature
to easily bring in first-party data from other sources, such as S3 and Azure,
and join it to Google advertising data in your queries.
For complete details on connecting external data sources to BigQuery, see
Introduction to external data sources.
Limitations
The following locations are supported. If your AWS or Azure data is in an
unsupported region, you could also consider using BigQuery Data Transfer
Service.
AWS - US East (N. Virginia) (aws-us-east-1)
Azure - East US 2 (azure-eastus2)
Jobs that are run on data from BigQuery connections are subject to the same
aggregation requirements as other jobs in Ads Data Hub, and must adhere to
Google's policies.
Amazon S3
The following is a high-level overview of the steps required to export data from
Amazon S3 to BigQuery for use in Ads Data Hub. Refer to
Connect to Amazon S3
for full details.
Create an AWS IAM policy for BigQuery. After the policy is created, the
Amazon Resource Name (ARN) can be found in the Policy details page.
Create an AWS IAM role for BigQuery, using the policy created in the
previous step.
Create a connection in BigQuery. Create a connection in a BigQuery project
that Ads Data Hub has access to—for example, your admin project. The
BigQuery Google identity, which will be used in the next step, is shown in
the Connection info page.
Add a trust relationship to the AWS role. In the AWS IAM page, edit the
role created in the earlier step:
Modify the maximum session duration to 12 hours.
Add a trust policy to the AWS role using the BigQuery Google identity
created in the previous step.
Load data into the BigQuery dataset.
Query the data in Ads Data Hub. Learn about joining first-party data.
Optional: Schedule continuous data load in BigQuery.
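The AWS-side steps above can be sketched with the `aws` and `bq` CLIs. This is a simplified outline, not the full console flow: all names (`bigquery-s3-policy`, `bigquery-s3-role`, `my_admin_project`, `s3_conn`, the account ID, and the two JSON files) are hypothetical placeholders, and in practice the trust policy is edited after you read the Google identity from the connection's details.

```shell
# 1. IAM policy granting BigQuery read access to the S3 bucket
#    (policy document written separately).
aws iam create-policy \
  --policy-name bigquery-s3-policy \
  --policy-document file://bigquery-s3-policy.json

# 2. IAM role using that policy; 43200 seconds = the 12-hour maximum
#    session duration required above. trust-policy.json is where the
#    BigQuery Google identity from step 3 is added as a trusted principal.
aws iam create-role \
  --role-name bigquery-s3-role \
  --max-session-duration 43200 \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
  --role-name bigquery-s3-role \
  --policy-arn arn:aws:iam::123456789012:policy/bigquery-s3-policy

# 3. BigQuery connection in a project Ads Data Hub can access.
bq mk --connection \
  --connection_type=AWS \
  --iam_role_id=arn:aws:iam::123456789012:role/bigquery-s3-role \
  --location=aws-us-east-1 \
  --project_id=my_admin_project s3_conn
```

`bq show --connection` on the new connection prints the Google identity needed for the trust policy.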
Azure Blob Storage
The following is a high-level overview of the steps required to export data from
Azure Blob Storage to BigQuery for use in Ads Data Hub. Refer to
Connect to Blob Storage
for full details.
Create an application in your Azure tenant.
Create a connection in BigQuery.
Tenant ID is the directory ID from the previous step.
Federated Application (Client) ID is the Application (client) ID
from the previous step.
BigQuery Google identity will be used in the next step.
Add a federated credential in Azure.
For Subject identifier, use the BigQuery Google identity from the
previous step.
Assign a role to BigQuery's Azure applications, granting Storage Blob Data
Reader access.
Load data into the BigQuery dataset.
Query the data in Ads Data Hub. Learn about joining first-party data.
Optional: Schedule continuous data load in BigQuery.
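The Azure-side steps above can be sketched with the `az` and `bq` CLIs. This is a hedged outline: the app name, project, connection ID, tenant and client IDs, and all `<...>` placeholders are hypothetical, and `credential.json` stands in for the federated-credential parameters you would author with the connection's Google identity as the subject.

```shell
# 1. Application in the Azure tenant.
az ad app create --display-name adh-bigquery-app

# 2. BigQuery connection; tenant ID and application (client) ID come from
#    the app created above. `bq show --connection` prints the Google
#    identity used in step 3.
bq mk --connection \
  --connection_type=Azure \
  --location=azure-eastus2 \
  --tenant_id=00000000-0000-0000-0000-000000000000 \
  --federated_azure=true \
  --federated_app_client_id=11111111-1111-1111-1111-111111111111 \
  --project_id=my_admin_project blob_conn

# 3. Federated credential whose subject is the connection's Google identity.
az ad app federated-credential create \
  --id <app-object-id> \
  --parameters credential.json

# 4. Grant Storage Blob Data Reader on the storage account.
az role assignment create \
  --assignee 11111111-1111-1111-1111-111111111111 \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
```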
Last updated 2024-09-18 UTC.