Page Summary

- This guide explains how to integrate Google Earth Engine and BigQuery for geospatial analysis, covering data transfer, performance, limitations, and costs.
- BigQuery is a scalable data warehouse for structured and geospatial vector data, while Earth Engine is optimized for raster data and advanced geospatial processing.
- Earth Engine data can be exported to BigQuery using the Export.table.toBigQuery() function for further analysis and integration.
- Data can be loaded directly from BigQuery into Earth Engine as a FeatureCollection using ee.FeatureCollection.loadBigQueryTable() (currently in private preview).
- SQL queries can be executed directly against BigQuery tables from Earth Engine, with results returned as a FeatureCollection, using ee.FeatureCollection.runBigQuery() (currently in private preview).
This document provides guidance on integrating Google Earth Engine (EE) with BigQuery (BQ) for efficient geospatial analysis. It covers data transfer between the two platforms, performance considerations, limitations, and cost implications.
What is BigQuery?
BigQuery is Google's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It excels at high-speed SQL queries and supports geospatial data.
When should I use BigQuery or Earth Engine?
| | BigQuery | Earth Engine |
|---|---|---|
| Data type | Primarily structured, analytical vector data. Supports geospatial operations on GEOGRAPHY data. | Primarily raster data, but also supports vector data. |
| Processing | SQL-based queries, optimized for large-scale aggregations and joins. Deep integration with BQ Machine Learning. | JavaScript and Python APIs for raster processing and analysis, including advanced geospatial algorithms and machine learning. |
| Scale | Petabyte-scale storage and analysis. Focus on tabular data processing. | Petabyte-scale storage and analysis. Focus on geospatial analysis, with limitations on vector data processing scale. |
| Use cases | Data warehousing, business intelligence, geospatial analytics on large vector datasets. | Geospatial analysis, remote sensing, environmental monitoring, machine learning on raster data. |
Vector and raster data
Vector data are points, lines, and polygons on the surface of the earth. In
BigQuery, vector data is stored using the GEOGRAPHY data type; in Earth
Engine, these are ee.Geometry objects.
Raster data are projected grids of pixels. Earth Engine is optimized for handling and processing large raster datasets.
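As a concrete illustration, the same point can be written both as the WKT literal stored in a BigQuery GEOGRAPHY column and as the GeoJSON that backs an Earth Engine ee.Geometry. This is a minimal sketch; the coordinates are arbitrary:

```python
# The same point as a BigQuery GEOGRAPHY literal (WKT) and as the
# GeoJSON backing an Earth Engine ee.Geometry. Coordinates are arbitrary.
lon, lat = 30.0, 10.0

# BigQuery: ST_GEOGFROMTEXT('POINT(30.0 10.0)') parses this literal.
wkt = f"POINT({lon} {lat})"

# Earth Engine: ee.Geometry(geojson) or ee.Geometry.Point([lon, lat]).
geojson = {"type": "Point", "coordinates": [lon, lat]}
```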
Benefits of using both systems
BigQuery's scalability lets you work with massive tabular datasets that may be challenging to process solely within Earth Engine, and Earth Engine can power large-scale enrichment and vector-to-raster processing that aren't possible in BigQuery.
Earth Engine supports a wider range of geospatial functions and data than BigQuery does, but BigQuery has much richer integration with other tools and services.
Move Earth Engine data to BigQuery
Earth Engine can export data directly to BigQuery for further analysis and integration with other datasets.
Export.table.toBigQuery()
Use the Export.table.toBigQuery() function to trigger asynchronous export jobs
that write the results of Earth Engine computation to BigQuery. These Earth
Engine tasks can be viewed and controlled from the Task view in the Cloud
Console or in the Earth
Engine Code Editor.
JavaScript
    // Define an Earth Engine feature collection.
    var features = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017');

    // Export the feature collection to BigQuery.
    Export.table.toBigQuery({
      collection: features,
      description: 'export_to_bigquery',
      table: 'my_project.my_dataset.my_table',
      append: true,
      overwrite: false
    });
Python
    # Define an Earth Engine feature collection.
    features = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017')

    # Export the feature collection to BigQuery.
    task = ee.batch.Export.table.toBigQuery(
        collection=features,
        description='export_to_bigquery',
        table='my_project.my_dataset.my_table',
        append=True,
        overwrite=False
    )
    task.start()
See the complete function documentation for more information.
Synchronous APIs
No direct connector exists within Earth Engine for synchronously writing data to BigQuery. Instead, you can use the BigQuery client library for your preferred language (Python, Java, Go, etc.) to stream data into BigQuery, or use the RPC-based BigQuery Storage Write API for real-time or near real-time data transfer.
| Topic | Details |
|---|---|
| Prerequisites | The project must have the BigQuery API and BigQuery Storage API enabled. |
| Permissions | You need to have write access on the target BigQuery dataset and permission to create jobs in the target project. See the list of necessary permissions for specifics. Refer to the BigQuery access control documentation for detailed information on managing permissions. |
| Pricing | You will incur charges for your usage of BigQuery, including storage and analysis of any Earth Engine data that you export to BigQuery. For details, see the Earth Engine to BigQuery export pricing. |
| Limits | The resulting data must fit BigQuery's table model with an additional limit of 8 MB per row. See also the set of known issues for exporting Earth Engine data to BigQuery. |
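As a sketch of the streaming path mentioned under Synchronous APIs, the helper below builds a request body in the shape of BigQuery's tabledata.insertAll REST method; the field names and row values are illustrative, and in practice you would send this body via the BigQuery client library or an authenticated HTTP POST:

```python
# A sketch of preparing rows for BigQuery's streaming insert API
# (tabledata.insertAll). Field names and values are illustrative; in
# practice, send this body with the BigQuery client library or an
# authenticated HTTP POST.
import json
import uuid

def insert_all_body(rows):
    """Build a tabledata.insertAll request body, attaching a unique
    insertId to each row so BigQuery can deduplicate retried inserts."""
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [{"insertId": str(uuid.uuid4()), "json": row} for row in rows],
    }

body = insert_all_body([
    {"name": "site_a", "area_km2": 12.5, "geo": "POINT(30 10)"},
    {"name": "site_b", "area_km2": 7.1, "geo": "POINT(31 11)"},
])
print(json.dumps(body, indent=2))
```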
Load data from BigQuery directly
The ee.FeatureCollection.loadBigQueryTable() function loads data directly from
a BigQuery table without transforming it within BigQuery.
JavaScript
    // Load the BigQuery table with a specified geometry column.
    var features = ee.FeatureCollection.loadBigQueryTable({
      tablePath: 'my_project.my_dataset.my_table',
      geometryColumn: 'geo'
    });

    // Display the features on the map.
    Map.addLayer(features);
Python
    # Load the BigQuery table with a specified geometry column.
    features = ee.FeatureCollection.loadBigQueryTable(
        tablePath='my_project.my_dataset.my_table',
        geometryColumn='geo')
    print(features.first())
| Topic | Details |
|---|---|
| Prerequisites | The project must have the BigQuery API and BigQuery Storage API enabled. |
| Permissions | In addition to the standard roles and permissions, you need to have read access on the target BigQuery table and permission to create read sessions in the target project. The specific BigQuery permissions required are: - bigquery.tables.get - bigquery.tables.getData - bigquery.readSession.create - bigquery.jobs.create Refer to the BigQuery access control documentation for detailed information on managing permissions. |
| Pricing | If you're using a project registered for commercial Earth Engine use, there are no additional Earth Engine costs associated with reading a BigQuery table, but you will incur EECU-time costs related to processing the data in Earth Engine. The exact pricing scheme is determined by your Earth Engine plan. If you're using a project registered for noncommercial use, you won't incur any charges for reading data from BigQuery into Earth Engine. Even though you won't pay for BigQuery usage, BigQuery as a commercial product requires your project to have an associated billing account. To learn more about billing accounts refer to the documentation to enable, disable, or change billing. NOTE: In Private Preview you may also be charged for BigQuery data streaming. |
| Limits | Feature collection filters are sent to BigQuery and applied there. BigQuery has a size limit for received filter clauses. If you see a "filter too big" error message, consider simplifying the filter. One common source of hitting this limit may be a complicated geometry used in the .filterBounds() call in Earth Engine. |
Run BigQuery queries from Earth Engine
The ee.FeatureCollection.runBigQuery() function triggers BigQuery computation
to evaluate a SQL query directly against BigQuery tables and retrieve the
results as an Earth Engine FeatureCollection.
JavaScript
    // Construct a BigQuery query.
    var query = 'SELECT * FROM my_project.my_dataset.my_table WHERE area > 1000';

    // Run the query and retrieve the results as a FeatureCollection.
    var features = ee.FeatureCollection.runBigQuery(query);

    // Print the first feature.
    print(features.first());
Python
    # Construct a BigQuery query.
    query = 'SELECT * FROM my_project.my_dataset.my_table WHERE area > 1000'

    # Run the query and retrieve the results as a FeatureCollection.
    features = ee.FeatureCollection.runBigQuery(query)

    # Print the first feature.
    print(features.first())
Costs
A billing account is necessary to use this function.
Earth Engine
If you're using a project registered for commercial Earth Engine use, there are no additional Earth Engine costs associated with running a BigQuery query, but you will incur EECU-time costs related to processing the data in Earth Engine. The exact pricing scheme is determined by your Earth Engine plan.
If you're using a project registered for noncommercial use, you won't incur any charges for reading data from BigQuery into Earth Engine.
BigQuery
This method incurs BigQuery charges based on your pricing plan (on-demand or with slots). If you have a BigQuery reservation, using this method won't generate additional on-demand charges.
| Topic | Details |
|---|---|
| Prerequisites | A billing account is necessary to use this function, and the BigQuery API must be enabled. |
| Permissions | Ensure the Earth Engine service account has the necessary permissions to use both BigQuery and Earth Engine. The user credentials used to authenticate need to have the bigquery.jobs.create and bigquery.jobs.get permissions. Refer to the BigQuery access control documentation for detailed information on managing permissions. |
| Pricing | This method incurs BigQuery charges based on your pricing plan (on-demand or with slots). If you have a BigQuery reservation, using this method won't generate additional on-demand charges. If you're using a project registered for commercial Earth Engine use, there are no additional Earth Engine costs associated with running a BigQuery query, but you will incur EECU-time costs related to processing the data in Earth Engine. The exact pricing scheme is determined by your Earth Engine plan. If you're using a project registered for noncommercial use, you won't incur any charges for reading data from BigQuery into Earth Engine. |
Performance
In addition to query optimization in BigQuery, the performance of returning the results to Earth Engine is a function of the size of the results (that is, the amount of bytes read from BigQuery) and the complexity of the resulting geometries.
Filters
Best practices for filtering include:
- Filter early and often: Apply filters as early in the data processing pipeline as possible, ideally within your BigQuery SQL queries. This reduces the amount of data transferred and processed by Earth Engine.
- Increase selectivity: Craft filters that select a smaller, more specific subset of data. Avoid overly broad filters that retrieve unnecessary data.
- Combine filters: Use multiple filter conditions in conjunction to narrow down the results effectively.
- Simplify: When possible, break down complex filters into simpler conditions to avoid exceeding the 1 MB limit on filter string size.
- Use clustering: Geospatial queries can be more performant over clustered columns.
- Prefer filtering in BigQuery: To reduce data size, it's best to perform initial filtering in BigQuery before further processing in Earth Engine. When that's not possible, apply additional filters as post-filters within your Earth Engine script after loading data from BigQuery.
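The first few practices can be sketched as a small helper that builds a selective query string before handing it to BigQuery. The table path, column names, and the commented-out runBigQuery call are placeholders, not values from the source:

```python
# A sketch of "filter early, select only what you need": build a
# BigQuery SQL string that reads only the required columns and rows.
# Table path and columns are placeholders.
def selective_query(table, columns, where=None, limit=None):
    """Assemble a selective SELECT statement for BigQuery."""
    sql = f"SELECT {', '.join(columns)} FROM `{table}`"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql

query = selective_query(
    table="my_project.my_dataset.my_table",
    columns=["name", "geo"],
    where="area > 1000",
    limit=10000,
)
# features = ee.FeatureCollection.runBigQuery(query)  # requires EE auth
```

Selecting only the needed columns and rows in SQL keeps the result size, and therefore the transfer and Earth Engine processing cost, small.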
System-wide limits
- 400 GB table size limit
- Earth Engine doesn't allow reading tables larger than 400 GB.
- 10 GB intermediate table sizes
- Each query result is limited to 10 GB. Make your query more selective by choosing only the necessary columns, for example by adding LIMIT and WHERE clauses. Earth Engine also imposes a limit on the amount of data scanned in BigQuery.
- 1 MB limit for filter string size after transformation
- Complex filters can result in large filter strings exceeding the limit.
- Timeouts
- Earth Engine applies timeouts to queries, which can vary depending on the realm (e.g., batch, online).
- BigQuery limits
- Refer to the BigQuery quota documentation for details on query job limitations.
Cost controls
Moving data between BigQuery and Earth Engine typically does not incur direct costs. However, the size of the data transferred can influence processing costs in both BigQuery and Earth Engine.
If a query involves processing on the BigQuery side, it will incur BigQuery charges according to your BigQuery configuration.
If a query involves processing within Earth Engine and your project is registered for commercial use, you will incur Earth Engine EECU-time charges according to your Earth Engine billing configuration.
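As a rough illustration of the on-demand BigQuery side, the sketch below converts bytes scanned (for example, as reported by a BigQuery dry run) into an estimated charge. The $/TiB rate here is an assumption; check current regional BigQuery pricing before relying on it:

```python
# A back-of-the-envelope estimate of on-demand BigQuery query cost
# from bytes scanned. The $/TiB rate is an ASSUMPTION; check current
# regional BigQuery pricing.
ASSUMED_PRICE_PER_TIB_USD = 6.25

def estimate_on_demand_cost(bytes_scanned: int) -> float:
    """Estimate on-demand cost in USD for a query that scans
    bytes_scanned bytes (e.g., from a BigQuery dry run)."""
    tib_scanned = bytes_scanned / 2**40
    return tib_scanned * ASSUMED_PRICE_PER_TIB_USD

cost = estimate_on_demand_cost(50 * 2**30)  # a 50 GiB scan
```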
Logs and debugging
Read operations
When reading data from BigQuery using
ee.FeatureCollection.loadBigQueryTable(), the corresponding operation is not
explicitly logged as a job in BigQuery. This means that you may see other
logging details (like Cloud audit logs) that have no corresponding BigQuery read
jobs.
Query history
Queries executed using ee.FeatureCollection.runBigQuery() are logged in your
project's BigQuery query history. You can access your query history through the
BigQuery UI in the Cloud console.