[UA] About data sampling

This article is about data sampling in Universal Analytics. For information about data sampling in Google Analytics 4, go to [GA4] About data sampling.

In data analysis, sampling is the practice of analyzing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200 to get an accurate representation of the entire 100 acres.
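
The tree-counting example above can be written as a short calculation. This is only an illustrative sketch: the function name `estimate_total` and the count of 120 trees per acre are made up for the example and do not come from Analytics.

```python
def estimate_total(sample_count: float, sample_area: float, total_area: float) -> float:
    """Scale a count observed in a sampled area up to the whole area.

    Assumes the items are spread fairly uniformly, as in the tree example.
    """
    if sample_area <= 0 or sample_area > total_area:
        raise ValueError("sample_area must be positive and no larger than total_area")
    scale_factor = total_area / sample_area  # 100 for 1 acre, 200 for half an acre
    return sample_count * scale_factor


# Hypothetical counts: 120 trees in 1 acre, or 60 trees in half an acre.
print(estimate_total(120, 1, 100))
print(estimate_total(60, 0.5, 100))
```

Both calls scale to the same estimate because the hypothetical counts assume the same tree density; with real data, a smaller sample usually gives a noisier estimate.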

This article explains the circumstances under which Analytics applies session sampling to your data in order to give you accurate reports in a timely fashion.

 

Sampling thresholds

Default reports are not subject to sampling.

Ad-hoc queries of your data are subject to the following general thresholds for sampling:

  • Analytics Standard: 500k sessions at the property level for the date range you are using
  • Analytics 360: 100M sessions at the view level for the date range you are using
    • Queries may include events, custom variables, and custom dimensions and metrics. All other queries have a threshold of 1M.
    • Historical data is limited to up to 14 months (on a rolling basis).

In some circumstances, you may see fewer sessions sampled. This can result from the complexity of your Analytics implementation, the use of view filters, query complexity for segmentation, or some combination of those factors. Although we make a best effort to sample up to the thresholds described above, it's normal to sometimes see slightly fewer sessions returned for an ad-hoc query.

When sampling is applied

The following sections explain where you can expect session sampling in Analytics reports.

Default reports

Analytics has a set of preconfigured, default reports listed in the left pane under Audience, Acquisition, Behavior, and Conversions.

Analytics stores one complete, unfiltered set of data for each property in each account. For each reporting view in a property, Analytics also creates tables of aggregated dimensions and metrics from the complete, unfiltered data. When you run a default report, Analytics queries the tables of aggregated data to quickly deliver unsampled results.

Analytics periodically adds new reports, and sometimes makes changes to the way metrics are calculated. If the date range of a report includes a time before the report was added or before a metric calculation changed, then Analytics can issue an ad-hoc query, and the data might be sampled.

Reports that include the Users and Active Users metrics are sampled when their date range includes data from before September 2016.

Default reports are unsampled in both Analytics Standard and Analytics 360. However, if you use the auto-tagging override feature, you may experience sampling in some of your Google Ads reports.

Ad-hoc reports

If you modify a default report in some way—for example, by applying a segment, filter, or secondary dimension—or if you create a custom report with a combination of dimensions and metrics that don’t exist in a default report, you are generating an ad-hoc query of Analytics data.

Analytics first goes to the aggregated data tables to see if all of the requested information from your ad-hoc query is available there. If the information is not available there, Analytics queries the complete, unfiltered set of data to satisfy the query request.
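
The lookup order described above can be pictured with a small sketch. It is a conceptual model only: the `AggregateTable` class, the `choose_source` function, and the dimension and metric names are hypothetical, and Analytics does not expose its internal tables this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateTable:
    """Hypothetical stand-in for one pre-aggregated table of a reporting view."""
    dimensions: frozenset
    metrics: frozenset


def choose_source(query_dimensions, query_metrics, tables):
    """Return "aggregated" if one table covers the whole query, else "raw"."""
    for table in tables:
        covers_dimensions = set(query_dimensions) <= table.dimensions
        covers_metrics = set(query_metrics) <= table.metrics
        if covers_dimensions and covers_metrics:
            return "aggregated"
    # Nothing pre-aggregated matches, so fall back to the complete data set.
    return "raw"


tables = [AggregateTable(frozenset({"source", "medium"}), frozenset({"sessions", "bounceRate"}))]
print(choose_source(["source"], ["sessions"], tables))          # served from aggregates
print(choose_source(["source", "city"], ["sessions"], tables))  # needs the raw data
```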

Ad-hoc queries are subject to sampling if the number of sessions for the date range you are using exceeds the threshold for your property type.
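
As a rough illustration of this rule, the sketch below checks a session count against the general thresholds listed earlier. The constant and function names are hypothetical, and the rate estimate (threshold divided by sessions) is an assumption made for illustration; the rate Analytics actually reports can be lower.

```python
# General thresholds from this article (sessions in the selected date range).
STANDARD_THRESHOLD = 500_000          # Analytics Standard, property level
ANALYTICS_360_THRESHOLD = 100_000_000  # Analytics 360, view level


def is_subject_to_sampling(sessions_in_range: int, threshold: int) -> bool:
    """An ad-hoc query may be sampled once sessions exceed the threshold."""
    return sessions_in_range > threshold


def approximate_sampling_rate(sessions_in_range: int, threshold: int) -> float:
    """Assumed estimate: the share of sessions that fits within the threshold."""
    if not is_subject_to_sampling(sessions_in_range, threshold):
        return 1.0  # everything fits, so no sampling is expected
    return threshold / sessions_in_range


print(is_subject_to_sampling(2_000_000, STANDARD_THRESHOLD))
print(approximate_sampling_rate(2_000_000, STANDARD_THRESHOLD))
```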

The sampling algorithm uses a sample of the complete data that is proportional to the daily distribution of sessions for the property for the date range you’re using. For example, if over a 5-day period, sessions were sampled at 25%, then the sample would include 25% of each day’s sessions:

                 Monday    Tuesday   Wednesday  Thursday  Friday
Total sessions   200,000   100,000   200,000    300,000   200,000
25% sample       50,000    25,000    50,000     75,000    50,000
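
The same arithmetic can be reproduced in a few lines. This sketch only computes how many sessions each day contributes at a given rate; the function name `proportional_sample_sizes` is hypothetical, and the way Analytics picks individual sessions is not described in this article.

```python
def proportional_sample_sizes(daily_sessions: dict, rate: float) -> dict:
    """Apply the same sampling rate to each day's session count."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in the range (0, 1]")
    return {day: round(count * rate) for day, count in daily_sessions.items()}


daily_sessions = {
    "Monday": 200_000,
    "Tuesday": 100_000,
    "Wednesday": 200_000,
    "Thursday": 300_000,
    "Friday": 200_000,
}

sample_sizes = proportional_sample_sizes(daily_sessions, 0.25)
for day, size in sample_sizes.items():
    print(f"{day}: {size:,} of {daily_sessions[day]:,} sessions")
```

Because every day is sampled at the same rate, each day keeps the same share of the sample that it has of the full data.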

 

The sampling rate varies from query to query depending on the number of sessions during a date range for a given view.

When sampling is in effect, you see a message at the top of the report that says "This report is based on N% of sessions."

To the right of that message, you can select one of two options to change the sampling size:

  • Greater precision: Uses the maximum sample size possible to give you results that are the most precise representation of your full data set
  • Faster response: Uses a smaller sampling size to give you faster results

Other reports

Sampling works differently for the following reports than for default reports or ad-hoc queries.

Multi-Channel Funnels reports

As with default reports, no sampling is applied unless you modify the report, for example by changing the lookback window, by changing which conversions are included, or by adding a segment or secondary dimension. If you modify the report in any way, a maximum sample of 1M conversions is returned.

Flow-visualization reports

Flow-visualization reports (Users Flow, Behavior Flow, Events Flow, Goal Flow) are generated from a maximum of 100K sessions for the selected date range.

The figures in flow-visualization reports, including entrance, exit, and conversion rates, may differ from the results in the default Behavior and Conversions reports, which are based on a different sample set.

Filters and segments

Analytics Standard and Analytics 360 sample session data at the view level, after view filters have been applied. For example, if view filters include or exclude sessions, then the sample is taken from only those sessions.

Analytics Standard and Analytics 360 both apply segments after applying report filters and after sampling, which means that a segment may include fewer sessions than are included in the overall sample.
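
The order of operations described above (view filters, then sampling, then segments) can be sketched as follows. Everything here is hypothetical illustration: the `run_query` function, the session fields, and the use of a seeded random sample are assumptions, not the Analytics implementation.

```python
import random


def run_query(sessions, view_filter, segment, sample_size, seed=0):
    """Apply view filters, then sample, then apply the segment."""
    # 1. View filters decide which sessions belong to the view at all.
    in_view = [s for s in sessions if view_filter(s)]

    # 2. The sample is drawn only from sessions that passed the view filters.
    if len(in_view) <= sample_size:
        sample = in_view
    else:
        sample = random.Random(seed).sample(in_view, sample_size)

    # 3. The segment is applied to the sample, so it can only shrink it.
    segmented = [s for s in sample if segment(s)]
    return sample, segmented


# Hypothetical sessions: every tenth one is internal traffic, half are mobile.
sessions = [
    {"id": i, "internal": i % 10 == 0, "device": "mobile" if i % 2 else "desktop"}
    for i in range(1000)
]

sample, segmented = run_query(
    sessions,
    view_filter=lambda s: not s["internal"],
    segment=lambda s: s["device"] == "mobile",
    sample_size=300,
)
print(len(sample), len(segmented))
```

The segment result contains fewer sessions than the sample it was taken from, which is the effect the paragraph above describes.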

Working with sample size

Use the sampling controls (Greater precision or Faster response) to switch between the maximum sample size for a more precise report and the smaller sample size for a faster response to your query.

One option to avoid sampling is to shorten the date range of your report until the number of sessions is under the sampling threshold, if your volume of data allows for that.
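
One way to apply this is to split a long date range into consecutive shorter ranges that each stay within the threshold, then run the report once per range. The sketch below assumes you already know the session count for each day; the function name `split_into_unsampled_ranges` and the sample figures are hypothetical.

```python
def split_into_unsampled_ranges(daily_sessions, threshold):
    """Greedily group consecutive days so each group stays within the threshold.

    daily_sessions is a list of (day, session_count) pairs in date order.
    """
    ranges = []
    current_days = []
    current_total = 0
    for day, count in daily_sessions:
        if count > threshold:
            # A single day over the threshold cannot be made unsampled this way.
            raise ValueError(f"{day} alone exceeds the threshold")
        if current_days and current_total + count > threshold:
            ranges.append(current_days)
            current_days, current_total = [], 0
        current_days.append(day)
        current_total += count
    if current_days:
        ranges.append(current_days)
    return ranges


daily_sessions = [
    ("Monday", 200_000),
    ("Tuesday", 100_000),
    ("Wednesday", 200_000),
    ("Thursday", 300_000),
    ("Friday", 200_000),
]
print(split_into_unsampled_ranges(daily_sessions, 500_000))
```

Keep in mind that results from separate ranges can be added together only for metrics that sum cleanly, such as sessions; metrics that count unique users across the whole period cannot be combined this way.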

If you are a Google Analytics 360 user, you have 2 additional options to get unsampled reports:
