Earth Engine and BigQuery

  • This guide explains how to integrate Google Earth Engine and BigQuery for geospatial analysis, covering data transfer, performance, limitations, and costs.

  • BigQuery is a scalable data warehouse for structured and geospatial vector data, while Earth Engine is optimized for raster data and advanced geospatial processing.

  • Earth Engine data can be exported to BigQuery using the Export.table.toBigQuery() function for further analysis and integration.

  • Data can be loaded directly from BigQuery into Earth Engine as a FeatureCollection using ee.FeatureCollection.loadBigQueryTable() (currently in private preview).

  • SQL queries can be executed directly against BigQuery tables from Earth Engine, with results returned as a FeatureCollection, using ee.FeatureCollection.runBigQuery() (currently in private preview).

This document provides guidance on integrating Google Earth Engine (EE) with BigQuery (BQ) for efficient geospatial analysis. It covers data transfer between the two platforms, performance considerations, limitations, and cost implications.

What is BigQuery?

BigQuery is Google's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It excels at high-speed SQL queries and supports geospatial data.

When should I use BigQuery or Earth Engine?

Data type
  • BigQuery: Primarily structured, analytical vector data. Supports geospatial operations on GEOGRAPHY data.
  • Earth Engine: Primarily raster data, but also supports vector data.

Processing
  • BigQuery: SQL-based queries, optimized for large-scale aggregations and joins. Deep integration with BigQuery ML.
  • Earth Engine: JavaScript and Python APIs for raster processing and analysis, including advanced geospatial algorithms and machine learning.

Scale
  • BigQuery: Petabyte-scale storage and analysis, focused on tabular data processing.
  • Earth Engine: Petabyte-scale storage and analysis, focused on geospatial analysis, with limits on the scale of vector data processing.

Use cases
  • BigQuery: Data warehousing, business intelligence, geospatial analytics on large vector datasets.
  • Earth Engine: Geospatial analysis, remote sensing, environmental monitoring, machine learning on raster data.

Vector and raster data

Vector data are points, lines, and polygons on the surface of the earth. In BigQuery, vector data is stored using the GEOGRAPHY data type; in Earth Engine, these are ee.Geometry objects.
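
For example, a point is an ee.Geometry object in Earth Engine, while the equivalent value in BigQuery lives in a GEOGRAPHY column. The coordinates in this minimal Python sketch are arbitrary.

Python

# Earth Engine represents vector features with ee.Geometry objects.
import ee

ee.Initialize()

point = ee.Geometry.Point([-122.08, 37.42])
print(point.getInfo())

# The equivalent value in BigQuery would be stored in a GEOGRAPHY column,
# produced in SQL with, for example:
#   ST_GEOGPOINT(-122.08, 37.42)
#   ST_GEOGFROMTEXT('POINT(-122.08 37.42)')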

Raster data are projected grids of pixels. Earth Engine is optimized for handling and processing large raster datasets.

Benefits of using both systems

BigQuery's scalability lets you work with massive tabular datasets that may be challenging to process solely within Earth Engine, and Earth Engine can power large-scale enrichment and vector-to-raster processing that aren't possible in BigQuery.

Earth Engine supports a wider range of geospatial functions and data than BigQuery does, but BigQuery has much richer integration with other tools and services.

Move Earth Engine data to BigQuery

Earth Engine can export data directly to BigQuery for further analysis and integration with other datasets.

Export.table.toBigQuery()

Use the Export.table.toBigQuery() function to trigger asynchronous export jobs that write the results of Earth Engine computation to BigQuery. These Earth Engine tasks can be viewed and controlled from the Task view in the Cloud Console or in the Earth Engine Code Editor.

JavaScript

// Define an Earth Engine feature collection.
var features = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017');

// Export the feature collection to BigQuery.
Export.table.toBigQuery({
  collection: features,
  description: 'export_to_bigquery',
  table: 'my_project.my_dataset.my_table',
  append: true,
  overwrite: false
});
      

Python

import ee

ee.Initialize()

# Define an Earth Engine feature collection.
features = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017')

# Export the feature collection to BigQuery.
task = ee.batch.Export.table.toBigQuery(
  collection=features,
  description='export_to_bigquery',
  table='my_project.my_dataset.my_table',
  append=True,
  overwrite=False
)
task.start()
      

See the complete function documentation for more information.
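
In the Python client you can also poll the export task's status programmatically. This is a minimal sketch that assumes the task object created in the snippet above.

Python

import time

# Poll the export task started above until it finishes.
while task.active():
  print('Polling, current state:', task.status()['state'])
  time.sleep(30)

# Final state is COMPLETED, FAILED, or CANCELLED.
print('Final state:', task.status()['state'])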

Synchronous APIs

Earth Engine has no connector for synchronously writing data to BigQuery. Instead, you can use the BigQuery client library for your preferred language (Python, Java, Go, etc.) to stream data into BigQuery, or use the BigQuery Storage Write API for real-time or near real-time data transfer.
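
As an illustration, the sketch below uses the google-cloud-bigquery client library to stream a handful of rows computed in Earth Engine into an existing table. The table ID is a placeholder, and pulling results with getInfo() only makes sense for small collections.

Python

# Stream small Earth Engine results into BigQuery via the client library.
import ee
from google.cloud import bigquery

ee.Initialize()

# Compute a small result in Earth Engine and pull it to the client.
features = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017').limit(10)
rows = [f['properties'] for f in features.getInfo()['features']]

# Stream the rows into an existing BigQuery table (hypothetical table ID;
# the table schema must match the row fields).
client = bigquery.Client()
errors = client.insert_rows_json('my_project.my_dataset.my_table', rows)
print('Streaming errors:', errors)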

Prerequisites: The project must have the BigQuery API and BigQuery Storage API enabled.

Permissions: You need write access on the target BigQuery dataset and permission to create jobs in the target project. See the list of necessary permissions for specifics, and refer to the BigQuery access control documentation for detailed information on managing permissions.

Pricing: You will incur charges for your usage of BigQuery, including storage and analysis of any Earth Engine data that you export to BigQuery. For details, see the Earth Engine to BigQuery export pricing.

Limits: The resulting data must fit BigQuery's table model, with an additional limit of 8 MB per row. See also the set of known issues for exporting Earth Engine data to BigQuery.

Load data from BigQuery directly

The ee.FeatureCollection.loadBigQueryTable() function loads data directly from a BigQuery table without transforming it within BigQuery.

JavaScript

// Load the BigQuery table with a specified geometry column.
var features = ee.FeatureCollection.loadBigQueryTable({
  tablePath: 'my_project.my_dataset.my_table',
  geometryColumn: 'geo'
});

// Display the features on the map.
Map.addLayer(features);
      

Python

import ee

ee.Initialize()

# Load the BigQuery table with a specified geometry column.
features = ee.FeatureCollection.loadBigQueryTable(
    tablePath='my_project.my_dataset.my_table',
    geometryColumn='geo')

print(features.first())
      
Prerequisites: The project must have the BigQuery API and BigQuery Storage API enabled.

Permissions: In addition to the standard roles and permissions, you need read access on the target BigQuery table and permission to create read sessions in the target project. The specific BigQuery permissions required are:
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.readsessions.create
  • bigquery.jobs.create
Refer to the BigQuery access control documentation for detailed information on managing permissions.

Pricing: If you're using a project registered for commercial Earth Engine use, there are no additional Earth Engine costs associated with reading a BigQuery table, but you will incur EECU-time costs related to processing the data in Earth Engine. The exact pricing scheme is determined by your Earth Engine plan. If you're using a project registered for noncommercial use, you won't incur any charges for reading data from BigQuery into Earth Engine. Even though you won't pay for BigQuery usage, BigQuery as a commercial product requires your project to have an associated billing account; to learn more, refer to the documentation on how to enable, disable, or change billing. Note: during the private preview, you may also be charged for BigQuery data streaming.

Limits: Feature collection filters are sent to BigQuery and applied there. BigQuery has a size limit for received filter clauses; if you see a "filter too big" error message, consider simplifying the filter. A common cause is a complicated geometry used in a .filterBounds() call in Earth Engine, as illustrated in the sketch after this table.
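
The sketch below shows filters applied to a loaded table being pushed down to BigQuery; the table, column names, and region are assumptions, and the simple rectangle keeps the generated filter clause small.

Python

# Filters applied to a loaded BigQuery table are sent to BigQuery where possible.
import ee

ee.Initialize()

features = ee.FeatureCollection.loadBigQueryTable(
    tablePath='my_project.my_dataset.my_table',
    geometryColumn='geo')

# Attribute filter, applied in BigQuery.
large = features.filter(ee.Filter.gt('area', 1000))

# Spatial filter; a simple geometry keeps the filter clause small.
region = ee.Geometry.Rectangle([-10.0, 35.0, 30.0, 60.0])
print(large.filterBounds(region).size().getInfo())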

Run BigQuery queries from Earth Engine

The ee.FeatureCollection.runBigQuery() function triggers BigQuery computation to evaluate a SQL query directly against BigQuery tables and retrieve the results as an Earth Engine FeatureCollection.

JavaScript

// Construct a BigQuery query.
var query = 'SELECT * FROM my_project.my_dataset.my_table WHERE area > 1000';

// Run the query and retrieve the results as a FeatureCollection.
var features = ee.FeatureCollection.runBigQuery(query);

// Print the first feature.
print(features.first());
      

Python

import ee

ee.Initialize()

# Construct a BigQuery query.
query = 'SELECT * FROM my_project.my_dataset.my_table WHERE area > 1000'

# Run the query and retrieve the results as a FeatureCollection.
features = ee.FeatureCollection.runBigQuery(query)

# Print the first feature.
print(features.first())
      

Costs

A billing account is necessary to use this function.

Earth Engine

If you're using a project registered for commercial Earth Engine use, there are no additional Earth Engine costs associated with running a BigQuery query, but you will incur EECU-time costs related to processing the data in Earth Engine. The exact pricing scheme is determined by your Earth Engine plan.

If you're using a project registered for noncommercial use, you won't incur any charges for reading data from BigQuery into Earth Engine.

BigQuery

This method incurs BigQuery charges based on your pricing plan (on-demand or with slots). If you have a BigQuery reservation, using this method won't generate additional on-demand charges.

Prerequisites: A billing account is necessary to use this function, and the BigQuery API must be enabled.

Permissions: Ensure the Earth Engine service account has the necessary permissions to use both BigQuery and Earth Engine. The user credentials used to authenticate need the bigquery.jobs.create and bigquery.jobs.get permissions. Refer to the BigQuery access control documentation for detailed information on managing permissions.

Pricing: This method incurs BigQuery charges based on your pricing plan (on-demand or with slots). If you have a BigQuery reservation, using this method won't generate additional on-demand charges. If you're using a project registered for commercial Earth Engine use, there are no additional Earth Engine costs associated with running a BigQuery query, but you will incur EECU-time costs related to processing the data in Earth Engine. The exact pricing scheme is determined by your Earth Engine plan. If you're using a project registered for noncommercial use, you won't incur any charges for reading data from BigQuery into Earth Engine.

Performance

In addition to query optimization in BigQuery, the performance of returning the results to Earth Engine is a function of the size of the results (that is, the number of bytes read from BigQuery) and the complexity of the resulting geometries.
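
One way to keep both the returned bytes and the geometry complexity down is to trim the result in SQL before it reaches Earth Engine. This sketch assumes hypothetical column names and a 100 m simplification tolerance.

Python

# Select only the needed columns and simplify geometries in BigQuery
# before the results are returned to Earth Engine.
import ee

ee.Initialize()

query = '''
  SELECT id, area, ST_SIMPLIFY(geo, 100) AS geo
  FROM my_project.my_dataset.my_table
  WHERE area > 1000
'''
features = ee.FeatureCollection.runBigQuery(query)
print(features.first().getInfo())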

Filters

Best practices for filtering include:

  • Filter early and often: Apply filters as early in the data processing pipeline as possible, ideally within your BigQuery SQL queries. This reduces the amount of data transferred to and processed by Earth Engine.
  • Increase selectivity: Craft filters that select a smaller, more specific subset of data. Avoid overly broad filters that retrieve unnecessary data.
  • Combine filters: Use multiple filter conditions in conjunction to narrow down the results effectively.
  • Simplify: When possible, break down complex filters into simpler conditions to avoid exceeding the 1 MB limit for filter string size.
  • Clustering: Geospatial queries can be more performant over clustered columns, as in the sketch after this list.
  • Prefer filtering in BigQuery: To reduce data size, it's best to perform initial filtering in BigQuery before further processing in Earth Engine. When that's not possible, apply additional filters as post-filters within your Earth Engine script after loading data from BigQuery.
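
As a sketch of the clustering suggestion, the statement below (issued here through the google-cloud-bigquery client library; all names are hypothetical) creates a copy of a table clustered on its geography column so spatial filters can prune data more effectively.

Python

# Create a copy of the table clustered on its GEOGRAPHY column.
from google.cloud import bigquery

client = bigquery.Client()
client.query('''
  CREATE TABLE my_project.my_dataset.my_table_clustered
  CLUSTER BY geo
  AS SELECT * FROM my_project.my_dataset.my_table
''').result()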

System-wide limits

  • 400 GB table size limit
    • Earth Engine doesn't allow reading tables larger than 400 GB.
  • 10 GB intermediate table sizes
    • Each query result is limited to 10 GB. Make your query more selective, for example by selecting only the necessary columns or adding WHERE and LIMIT clauses.
    • Earth Engine imposes a limit on the amount of data scanned in BigQuery.
  • 1 MB limit for filter string size after transformation
    • Complex filters can result in large filter strings exceeding the limit.
  • Timeouts
    • Earth Engine applies timeouts to queries, which can vary depending on the realm (e.g., batch, online).
  • BigQuery limits
    • BigQuery's own quotas and limits also apply to queries and reads issued from Earth Engine.

Cost controls

Moving data between BigQuery and Earth Engine typically does not incur direct costs. However, the size of the data transferred can influence processing costs in both BigQuery and Earth Engine.

If a query involves processing on the BigQuery side, it will incur BigQuery charges according to your BigQuery configuration.

If a query involves processing within Earth Engine and your project is registered for commercial use, you will incur Earth Engine EECU-time charges according to your Earth Engine billing configuration.

Logs and debugging

Read operations

When reading data from BigQuery using ee.FeatureCollection.loadBigQueryTable(), the read is not logged as a job in BigQuery. You may therefore see related logging details (for example, in Cloud Audit Logs) that have no corresponding BigQuery read jobs.

Query history

Queries executed using ee.FeatureCollection.runBigQuery() are logged in your project's BigQuery query history. You can access your query history through the BigQuery UI in the Cloud console.
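
If you prefer to inspect this history programmatically rather than in the Cloud console, one option is to query the BigQuery INFORMATION_SCHEMA jobs view with the client library, as in the sketch below. The region qualifier is an assumption and should match your dataset's location.

Python

# List recent query jobs in the project, including those issued by runBigQuery().
from google.cloud import bigquery

client = bigquery.Client()
jobs = client.query('''
  SELECT creation_time, user_email, query
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE job_type = 'QUERY'
  ORDER BY creation_time DESC
  LIMIT 10
''').result()

for row in jobs:
  print(row.creation_time, row.user_email, row.query)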