This guide explains why the data in reports generated with the Google
Analytics Data API can differ from the data shown in the Google Analytics UI.
Users often encounter discrepancies between the data returned by the API
(specifically, the runReport method) and the data shown in the UI.
Sampling and aggregation
The Google Analytics Data API runReport method can return sampled data,
especially for large datasets or complex queries. While the
Google Analytics UI also applies sampling, the specific thresholds and
algorithms can differ, leading to minor differences in reported values.
To understand whether the report's results are based on a subset of the
available data, inspect the samplingMetadatas field of the ResponseMetaData
object. If the report results are sampled, this field reports how many events
were read and the size of the sampling space, from which you can derive the
percentage of events used in the report.
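As a minimal sketch, the percentage can be computed from the parsed JSON body of a runReport response. This assumes the REST JSON shape, where each samplingMetadatas entry carries samplesReadCount and samplingSpaceSize, int64 values arrive as strings, and the field is absent when the results are not sampled. The function name sampled_percentages is hypothetical.

```python
from typing import Any


def sampled_percentages(response: dict[str, Any]) -> list[float]:
    """Return the percentage of events read per sampling entry, or [] if unsampled.

    Assumes `response` is the parsed JSON body of a runReport call.
    """
    metadatas = response.get("metadata", {}).get("samplingMetadatas", [])
    percentages = []
    for entry in metadatas:
        # int64 fields are serialized as JSON strings, so convert them explicitly.
        read = int(entry["samplesReadCount"])
        space = int(entry["samplingSpaceSize"])
        percentages.append(100.0 * read / space)
    return percentages


response = {
    "metadata": {
        "samplingMetadatas": [
            {"samplesReadCount": "2500000", "samplingSpaceSize": "10000000"}
        ]
    }
}
print(sampled_percentages(response))  # [25.0]
```

An empty list means no sampling metadata was returned, which you can treat as an unsampled result.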
Some reporting methods of the Google Analytics Data API let you specify the
sampling level. You can use the samplingLevel field of the
properties.reportTasks.create method to control the report's sampling level.
This feature gives Google Analytics 360 properties access to a higher sampling
limit of 1 billion events. You can also set the sampling level to UNSAMPLED
to get unsampled results for large event counts.
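The following sketch builds a request for properties.reportTasks.create that asks for unsampled results. It assumes the v1alpha REST endpoint and that samplingLevel sits inside the task's reportDefinition; verify both against the API reference before use. Authentication and the HTTP call itself are omitted, and the helper name report_task_request is hypothetical.

```python
from typing import Any


def report_task_request(
    property_id: str,
    start_date: str,
    end_date: str,
    sampling_level: str = "UNSAMPLED",
) -> tuple[str, dict[str, Any]]:
    """Return (url, JSON body) for a report task with an explicit sampling level."""
    # Assumed v1alpha REST endpoint for properties.reportTasks.create.
    url = (
        "https://analyticsdata.googleapis.com/v1alpha/"
        f"properties/{property_id}/reportTasks"
    )
    # Assumed layout: samplingLevel is a field of the task's reportDefinition.
    body = {
        "reportDefinition": {
            "dateRanges": [{"startDate": start_date, "endDate": end_date}],
            "dimensions": [{"name": "country"}],   # example dimension
            "metrics": [{"name": "activeUsers"}],  # example metric
            "samplingLevel": sampling_level,
        }
    }
    return url, body


url, body = report_task_request("123456", "2024-01-01", "2024-01-31")
```

The body would then be sent as an authenticated POST to the returned URL.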
For more information, see About data sampling.
Unique count approximation
The Google Analytics Data API uses the HyperLogLog++ (HLL++) algorithm to estimate unique counts for metrics like Active Users and Sessions. Both the API and the Google Analytics UI use this approach to improve performance and handle large datasets efficiently, so the results are approximations rather than exact counts.
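To illustrate why unique counts are approximate, here is a minimal classic HyperLogLog sketch in Python. It is not HLL++ and not Google's implementation: it omits HLL++ refinements such as empirical bias correction and the sparse representation, and it uses BLAKE2b as an arbitrary 64-bit hash. The class name and the precision parameter p are illustrative choices.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal classic HyperLogLog for approximate distinct counts (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                 # number of index bits
        self.m = 1 << p            # number of registers (must be >= 128 for alpha below)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash of the item.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        x = int.from_bytes(digest, "big")
        idx = x >> (64 - self.p)                # top p bits choose a register
        w = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in w (64 - p + 1 if w == 0).
        rank = (64 - self.p) - w.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=12)
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # repeated IDs do not inflate the estimate
estimate = hll.estimate()
```

With p=12 the sketch uses 4,096 small registers instead of storing every ID, and its typical relative error is about 1.04 / sqrt(4096), roughly 1.6%. That trade of memory for a small error is why approximate counts can differ slightly from exact ones.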
For additional context, refer to the following resources: Unique count approximation in Google Analytics and Differences in user counts.
Data thresholding
Google Analytics may apply data thresholding to reports to prevent the identification of individual users based on demographics, interests, or other signals present in the data.
If a report row contains a small number of users, that row may be excluded from the results. This is more common in reports that include high-cardinality dimensions or custom dimensions.
To understand whether a report is subject to thresholding, inspect the
subjectToThresholding field of the ResponseMetaData object.
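A minimal sketch of checking this flag on a parsed runReport JSON response follows. It assumes the REST JSON shape, in which a false boolean may simply be omitted, so a missing field is treated as false. The function name check_thresholding is hypothetical.

```python
import warnings
from typing import Any


def check_thresholding(response: dict[str, Any]) -> bool:
    """Return True, and emit a warning, if the report is subject to thresholding."""
    # A missing field is treated as False because JSON output may omit false values.
    subject = bool(response.get("metadata", {}).get("subjectToThresholding", False))
    if subject:
        warnings.warn("Report is subject to thresholding; some rows may be missing.")
    return subject
```

A caller might use the return value to label the report as incomplete rather than comparing it row by row with the UI.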
For more information, see Data thresholds.
The (other) row
If a dimension has high cardinality, Google Analytics may group less-common
values into a row labeled as (other). This is more common in reports
that include dimensions with more than 500 unique values per day.
When using filters with the Data API, note that filters are applied after the
data has been aggregated and the (other) row has been generated. As a result,
filters don't look inside the (other) row and can't match values that were
rolled into it.
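The ordering described above can be simulated in a few lines. This is a simplified model, not Google Analytics' actual logic: max_rows is a hypothetical row limit, and values are ranked only by frequency. It shows how a filter applied after aggregation misses values that were already rolled into (other).

```python
from collections import Counter


def aggregate_with_other(values: list[str], max_rows: int) -> dict[str, int]:
    """Keep the max_rows most common values and roll the rest into '(other)'."""
    counts = Counter(values)
    rows = dict(counts.most_common(max_rows))
    other_total = sum(n for value, n in counts.items() if value not in rows)
    if other_total:
        rows["(other)"] = other_total
    return rows


def filter_rows(rows: dict[str, int], contains: str) -> dict[str, int]:
    """Apply a filter after aggregation: it sees row labels only, not what '(other)' hides."""
    return {label: n for label, n in rows.items() if contains in label}


page_views = ["/home"] * 5 + ["/pricing"] * 3 + ["/blog/a", "/blog/b"]
rows = aggregate_with_other(page_views, max_rows=2)
blog_rows = filter_rows(rows, "/blog")
print(rows)       # {'/home': 5, '/pricing': 3, '(other)': 2}
print(blog_rows)  # {}
```

Two blog page views exist in the raw data, but the filter returns nothing because both were aggregated into (other) before the filter ran.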
To understand whether a report contains data rolled into the (other) row,
inspect the dataLossFromOtherRow field of the ResponseMetaData object.
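As a minimal sketch, you can read the dataLossFromOtherRow flag and also count rows labeled (other) in a parsed runReport JSON response. This assumes the REST JSON shape, where each row has dimensionValues entries with a value field and a false flag may be omitted. The function name and the dimension_index parameter are hypothetical.

```python
from typing import Any


def other_row_summary(response: dict[str, Any], dimension_index: int = 0) -> tuple[bool, int]:
    """Return (dataLossFromOtherRow flag, number of rows whose dimension value is '(other)')."""
    # Treat a missing flag as False, since JSON output may omit false values.
    lost = bool(response.get("metadata", {}).get("dataLossFromOtherRow", False))
    other_rows = sum(
        1
        for row in response.get("rows", [])
        if row["dimensionValues"][dimension_index].get("value") == "(other)"
    )
    return lost, other_rows


response = {
    "metadata": {"dataLossFromOtherRow": True},
    "rows": [
        {"dimensionValues": [{"value": "/home"}], "metricValues": [{"value": "5"}]},
        {"dimensionValues": [{"value": "(other)"}], "metricValues": [{"value": "2"}]},
    ],
}
summary = other_row_summary(response)
```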
For more information, see (other) row in Google Analytics 4.
Reporting identity
Reporting identity determines how users are deduplicated in reports. Different identity settings (such as "Blended" or "Device-based") can result in different user counts for the same date range.
The Google Analytics UI and the Data API use the same reporting identity setting for your property, so changing the setting affects reports in both. If the setting changes between the time you run a report in the UI and the time you retrieve the report data through the API, user counts for the same date range may differ between the two reports.
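The following simplified sketch shows why the identity setting changes user counts. It is not Google Analytics' deduplication logic: it ignores modeling and other signals used by "Blended", and the rule names "device" and "user_id_first" are hypothetical. Each event is a (user_id or None, device_id) pair.

```python
from typing import Optional


def count_users(events: list[tuple[Optional[str], str]], identity: str) -> int:
    """Count distinct users under a simplified identity rule."""
    if identity == "device":
        # Device-based: every device counts as a separate user.
        keys = {("device", device) for _, device in events}
    elif identity == "user_id_first":
        # Prefer the user ID when present, otherwise fall back to the device ID.
        keys = {
            ("user", user) if user is not None else ("device", device)
            for user, device in events
        }
    else:
        raise ValueError(f"unknown identity rule: {identity}")
    return len(keys)


events = [("alice", "phone-1"), ("alice", "laptop-1"), (None, "tablet-9")]
print(count_users(events, "device"))         # 3
print(count_users(events, "user_id_first"))  # 2
```

The same three events yield three users when deduplicated by device and two when a signed-in user ID links the phone and laptop.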
Query specificity
To minimize discrepancies, ensure that the following parameters in your API request precisely match the settings in the Google Analytics UI report:
- Date ranges: Verify that the start and end dates are identical.
- Dimensions and metrics: Ensure that your API request uses the same dimensions and metrics as the Google Analytics UI report.
- Filters: Make sure that any dimension or metric filters applied in the API request match those used in the UI.
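One way to keep these parameters explicit is to build the runReport request body from named arguments that you copy from the UI report. This sketch assumes the REST JSON request shape (dateRanges, dimensions, metrics, dimensionFilter); the helper name run_report_body and the example values are illustrative.

```python
from typing import Any, Optional


def run_report_body(
    start_date: str,
    end_date: str,
    dimensions: list[str],
    metrics: list[str],
    dimension_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a runReport JSON body whose settings mirror a UI report."""
    body: dict[str, Any] = {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"name": m} for m in metrics],
    }
    if dimension_filter is not None:
        body["dimensionFilter"] = dimension_filter
    return body


body = run_report_body(
    "2024-01-01",
    "2024-01-31",
    dimensions=["country"],
    metrics=["activeUsers"],
    dimension_filter={"filter": {"fieldName": "country", "stringFilter": {"value": "France"}}},
)
```

Keeping the date range, field names, and filter in one place makes it easier to compare the request against the UI report side by side.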
Adding dimensions to a report can reduce the number of events used in calculations. Only events that contain data for all of the requested dimensions are included in a report, so adding dimensions to a query can change the aggregated metric values in the report.
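A simplified model of this rule: an event contributes to a report only if it has a value for every requested dimension. The event dictionaries and the function name event_count are hypothetical, and empty strings are treated as missing data.

```python
def event_count(events: list[dict[str, str]], dimensions: list[str]) -> int:
    """Count events that have a non-empty value for every requested dimension."""
    # all([]) is True, so with no dimensions every event is counted.
    return sum(1 for event in events if all(event.get(d) for d in dimensions))


events = [
    {"eventName": "page_view", "country": "France"},
    {"eventName": "page_view"},  # no country value
    {"eventName": "purchase", "country": "Japan"},
]
print(event_count(events, []))           # 3
print(event_count(events, ["country"]))  # 2
```

Adding the country dimension drops the event that lacks a country value, so the total changes even though the underlying data is the same.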
Data freshness
Google Analytics takes time to process and aggregate event data. When working with very recent data, you might see minor differences between reports if there is a time delay between data retrievals. For example, if you view a report in the UI and then query the API for the same report minutes later, the data might have changed due to ongoing processing and aggregation.
For more information, see Data Freshness.
Unsampled data alternatives
If your use case requires full, unsampled, event-level data, consider using the following alternatives:
- BigQuery Export: BigQuery Export for Google Analytics is the recommended method for advanced analysis of raw event data.
- Analytics 360: Properties with an Analytics 360 license have higher sampling limits and access to more detailed reporting features.