CrUX on BigQuery

Learn how CrUX data is structured on BigQuery.

Introduction

The raw data behind the Chrome UX Report (CrUX) is available on BigQuery, a database hosted on the Google Cloud Platform (GCP).

CrUX on BigQuery allows users to directly query the full dataset going back to 2017, for example to analyze trends, compare web technologies and benchmark domains.

The data is organized into one table per monthly release, plus a number of summary tables that provide simpler access for querying. These are documented further below.

The BigQuery data is the basis of the CrUX Dashboard, which allows you to visualize this data without writing SQL queries.

Accessing the dataset in GCP

Using BigQuery requires a GCP project and basic knowledge of SQL. The CrUX dataset on BigQuery is free to access and explore up to the limits of the BigQuery free tier, which is renewed monthly. Additionally, new GCP users may be eligible for a signup credit to cover expenses beyond the free tier. Note that a credit card must be provided for the GCP project; see Why do I need to provide a credit card?.

If this is your first time using BigQuery, follow the steps below to set up a project:

  1. Navigate to Google Cloud Platform.
  2. Click Create a Project.
  3. Give your new project a name like "My Chrome UX Report" and click Create.
  4. Provide your billing information if prompted.
  5. Navigate to the CrUX dataset on BigQuery.

Now you're ready to start querying the dataset.

Project organization

Each month of CrUX data is released on BigQuery on the second Tuesday of the following month. Each month is released as a new table under chrome-ux-report.all. There are also a number of materialized tables which provide summary statistics for each month.
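The release schedule and table naming described above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the helper names crux_release_date and crux_monthly_table are hypothetical, and the date is derived only from the "second Tuesday of the following month" rule, so an actual release may differ.

```python
import datetime


def crux_release_date(yyyymm: int) -> datetime.date:
    """Return the second Tuesday of the month after the given data month."""
    year, month = divmod(yyyymm, 100)
    # Step to the first day of the following month, rolling over in December.
    if month == 12:
        year, month = year + 1, 1
    else:
        month += 1
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)


def crux_monthly_table(yyyymm: int) -> str:
    """Return the path of a monthly table under chrome-ux-report.all."""
    return f"chrome-ux-report.all.{yyyymm}"
```

For example, crux_release_date(202204) gives 10 May 2022, and crux_monthly_table(202204) gives chrome-ux-report.all.202204.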

Detailed table schema

Raw tables

The raw tables for each country and the all dataset have the following schema:

  • origin
  • effective_connection_type
  • form_factor
  • first_paint
  • first_contentful_paint
  • largest_contentful_paint
  • dom_content_loaded
  • onload
  • first_input
    • delay
  • layout_instability
    • cumulative_layout_shift
  • interaction_to_next_paint
  • experimental
    • permission
      • notifications
    • time_to_first_byte
    • popularity
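As a sketch of how a raw table might be queried, the Python helper below builds a BigQuery SQL string that sums first_contentful_paint densities below a threshold for one origin. It assumes each metric record exposes a histogram.bin array with start and density fields, which the schema list above does not spell out. The function name fast_fcp_query and the 1800 ms default are illustrative choices.

```python
def fast_fcp_query(origin: str, yyyymm: int, threshold_ms: int = 1800) -> str:
    """Build a query for the fraction of FCP experiences under a threshold."""
    # The origin is interpolated into the SQL text, so reject quote characters.
    # Code that runs against real input should use query parameters instead.
    if "'" in origin or "\\" in origin:
        raise ValueError("origin must not contain quotes or backslashes")
    # Assumption: metric records expose histogram.bin with start and density.
    return (
        "SELECT SUM(fcp.density) AS fast_fcp\n"
        f"FROM `chrome-ux-report.all.{yyyymm}`,\n"
        "  UNNEST(first_contentful_paint.histogram.bin) AS fcp\n"
        f"WHERE origin = '{origin}'\n"
        f"  AND fcp.start < {threshold_ms}"
    )


if __name__ == "__main__":
    print(fast_fcp_query("https://example.com", 202204))
```

The printed string can be pasted into the BigQuery console or passed to a BigQuery client library.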

Materialized table schema

Materialized tables are provided for easy access to summary data by a number of key dimensions. No histograms are provided; instead, performance data is aggregated into fractions by performance assessment, along with the 75th percentile value. A set of example rows from the metrics_summary table is shown below:

yyyymm origin fast_lcp avg_lcp slow_lcp p75_lcp
202204 https://example.com 0.9056 0.0635 0.0301 1600
202203 https://example.com 0.9209 0.052 0.0274 1400
202202 https://example.com 0.9169 0.0545 0.0284 1500
202201 https://example.com 0.9072 0.0626 0.0298 1500

This shows that in the 202204 dataset, 90.56% of real-user experiences on https://example.com met the criteria for good LCP, and that the coarse 75th percentile LCP value was 1,600ms. This is slightly slower than previous months.
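The reading of the example rows above can be reproduced with a short Python sketch. The values are copied from the example table; the function names summarize_lcp and fraction_total are hypothetical, and the script only illustrates how the fraction and p75 columns relate.

```python
# Rows copied from the example table: (yyyymm, fast, avg, slow, p75 in ms).
ROWS = [
    (202204, 0.9056, 0.0635, 0.0301, 1600),
    (202203, 0.9209, 0.052, 0.0274, 1400),
    (202202, 0.9169, 0.0545, 0.0284, 1500),
    (202201, 0.9072, 0.0626, 0.0298, 1500),
]


def fraction_total(row):
    """Sum the fast, avg and slow fractions; rounding keeps it close to 1."""
    _, fast, avg, slow, _ = row
    return round(fast + avg + slow, 4)


def summarize_lcp(rows):
    """Describe each month, oldest first, with the change in p75 LCP."""
    lines = []
    previous_p75 = None
    for yyyymm, fast, _avg, _slow, p75 in sorted(rows):
        change = ""
        if previous_p75 is not None:
            change = f", p75 change {p75 - previous_p75:+d} ms"
        lines.append(f"{yyyymm}: {fast:.2%} good LCP, p75 {p75} ms{change}")
        previous_p75 = p75
    return lines


if __name__ == "__main__":
    print("\n".join(summarize_lcp(ROWS)))
```

The three fractions in each row add up to roughly 1 rather than exactly 1 because the published values are rounded.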

Four materialized tables are provided:

metrics_summary
key metrics by month and origin
device_summary
key metrics by month, origin and device type
country_summary
key metrics by month, origin, device type and country
origin_summary
a list of all origins included in the dataset

metrics_summary

The metrics_summary table contains summary statistics for each origin and each monthly dataset:

yyyymm
Month of the data collection period
origin
URL of the site origin
rank
Coarse popularity ranking (as of March 2021)
[small|medium|large]_cls
fraction of traffic by CLS thresholds
[fast|avg|slow]_<metric>
fraction of traffic by performance thresholds
p75_<metric>
75th percentile value of performance metrics (milliseconds)
notification_permission_[accept|deny|ignore|dismiss]
fraction of notification permission behaviors
[desktop|phone|tablet]Density
fraction of traffic by form factor
[_4G|_3G|_2G|slow2G|offline]Density
fraction of traffic by effective connection type
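A minimal sketch of querying these columns follows, again as Python that builds the SQL text. It assumes the table lives at chrome-ux-report.materialized.metrics_summary (the dataset name is not stated above) and that yyyymm is stored as an integer. The helper name lcp_trend_query is hypothetical.

```python
def lcp_trend_query(origin: str, first_yyyymm: int, last_yyyymm: int) -> str:
    """Build a query for one origin's monthly LCP summary, newest first."""
    # The origin is interpolated into the SQL text, so reject quote characters.
    if "'" in origin or "\\" in origin:
        raise ValueError("origin must not contain quotes or backslashes")
    if first_yyyymm > last_yyyymm:
        raise ValueError("first_yyyymm must not be after last_yyyymm")
    # Assumption: the summary table sits in a dataset named "materialized"
    # and yyyymm is an integer column.
    return (
        "SELECT yyyymm, fast_lcp, avg_lcp, slow_lcp, p75_lcp\n"
        "FROM `chrome-ux-report.materialized.metrics_summary`\n"
        f"WHERE origin = '{origin}'\n"
        f"  AND yyyymm BETWEEN {first_yyyymm} AND {last_yyyymm}\n"
        "ORDER BY yyyymm DESC"
    )


if __name__ == "__main__":
    print(lcp_trend_query("https://example.com", 202201, 202204))
```

Run against the real table, a query of this shape would return rows like the four-month example shown earlier for https://example.com.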

device_summary

The device_summary table contains aggregated statistics by month, origin and device. In addition to the metrics_summary columns there is:

device
Device form factor

country_summary

The country_summary table contains aggregated statistics by month, origin, country and device. In addition to the metrics_summary columns there is:

country_code
Two-letter country code
device
Device form factor

origin_summary

The origin_summary table contains a list of all origins in the CrUX dataset; it is updated monthly with the latest list of origins in the dataset and has a single column: origin.

Experimental dataset

Tables in the experimental dataset are exact copies of the default YYYYMM tables, but they make use of newer and more advanced BigQuery features like partitioning and clustering that enable you to write faster, simpler, and cheaper queries.

country

The experimental.country table contains aggregated data from the country_CC datasets with an additional yyyymm column for the dataset date. The schema is identical to the raw tables with the addition of the yyyymm and country_code columns, so country-level comparisons over time can be queried without joining the monthly tables.

global

The experimental.global table contains aggregated data from the all dataset with an additional yyyymm column for the dataset date. The schema is identical to the raw tables with the addition of the yyyymm column, so comparisons over time can be queried without joining the monthly tables.
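To illustrate querying across months without joins, the sketch below builds a query that counts distinct origins per month with a single yyyymm filter. It assumes the table path chrome-ux-report.experimental.global and an integer yyyymm column. The helper name origins_per_month_query is hypothetical.

```python
def origins_per_month_query(first_yyyymm: int, last_yyyymm: int) -> str:
    """Build a query counting distinct origins for each month in a range."""
    if first_yyyymm > last_yyyymm:
        raise ValueError("first_yyyymm must not be after last_yyyymm")
    # Assumption: yyyymm is an integer column, so a BETWEEN filter selects
    # the months without combining the separate monthly tables.
    return (
        "SELECT yyyymm, COUNT(DISTINCT origin) AS origins\n"
        "FROM `chrome-ux-report.experimental.global`\n"
        f"WHERE yyyymm BETWEEN {first_yyyymm} AND {last_yyyymm}\n"
        "GROUP BY yyyymm\n"
        "ORDER BY yyyymm"
    )


if __name__ == "__main__":
    print(origins_per_month_query(202201, 202204))
```

The same result from the monthly tables would need one subquery per month combined together, which is the overhead this table is meant to avoid.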