Difference checks

Difference checks help ensure that users can't be identified through the comparison of multiple sufficiently aggregated results in the following ways:

  • They compare results from the job that you're running to your previous results.
  • They compare rows within the same result set.

Difference check violations can be triggered by changes to your underlying data between two jobs. When comparing a job's results to previous results, Ads Data Hub looks for vulnerabilities on the level of individual users. Because of this, even results from different campaigns, or results that report the same number of users, can be filtered if they have a large number of overlapping users.

On the other hand, two aggregated result sets may have the same number of users—appearing identical—but not share individual users, and therefore be privacy-safe, in which case they wouldn't be filtered.

Ads Data Hub uses data from your historical results when considering the vulnerability of a new result. This means that running the same query over and over again creates more data for difference checks to use when considering a new result's vulnerability. Additionally, the underlying data can change, leading to privacy check violations on queries thought to be stable.

When your job-level results differ adequately, but an individual row is similar to a row in a previous job, Ads Data Hub will filter the similar row. In this example, the row containing campaign 123 in the second job's results will be filtered, because it differs from the previous result by a single user.

Job 1
Campaign ID Users
123 400
124 569
Job 2
Campaign ID Users
123 401
224 1325

If the sum of the users in all rows in a result set is similar to that from a previous job, Ads Data Hub will filter the entire result set. In this example, all results from the second job will be filtered.

Job 1
Campaign ID Users
123 400
124 1367
Job 2
Campaign ID Users
123 402
124 1367

Filtered row summary

Rows that aren't adequately aggregated to protect end-user privacy, or don't meet Ads Data Hub's other privacy checks, will be dropped from the results of a given query. You can configure a filtered row summary in analysis queries to return a single row containing constants and sums from dropped rows. This can help prevent discrepancies in your data totals, such as a total impression or total click count. A filtered row summary is enabled by default for new analysis queries created using the UI. Filtered row summaries don't apply to audience queries. Queries created using the API don't have a filtered row summary enabled by default.

Example

In the following example, the rows containing campaigns 124 and 125 would be filtered, as they don't meet the 50-user aggregation requirement.

Campaign ID Users
123 400
124 45
125 6

However, using a filtered row summary, the filtered data can be combined to meet the 50-user aggregation requirement. This example uses a sum to tally filtered data, and a constant to label the row:

  • A sum that aggregates filtered data from the Users column.
  • A constant (used to label the filtered row summary), set to "Filtered row summary" in the Campaign ID column.
Campaign ID Users
Filtered row summary 51
123 400

Automatic mode

We recommend using automatic mode. By default, Ads Data Hub automatically enables and configures a filtered row summary for new queries created using the UI. As you edit the query, Ads Data Hub will automatically adjust the configuration, based on the selected columns.

  • To view the automatically configured settings when viewing a report, go to Properties > Filtered row summary. The configuration is read-only.
  • To disable automatic mode, click the Automatic toggle to the off position. The current configuration is retained and automatic mode is disabled. You must save your query to retain the change.
  • To enable automatic mode, click the Automatic toggle to the on position. You must save your query to retain the change.
  • To toggle automatic mode with the API, use generateFilteredRowSummaryAutomatically.

Manual mode

We recommend using automatic mode. To manually add a filtered row summary to a saved query, or after disabling automatic mode:

  1. Identify the column that you would like to aggregate.
  2. Click Add rule under Properties > Filtered row summary in the right-hand-side of the query editor.
  3. Enter the name of the column you would like to aggregate in the Column name field.
  4. Select Sum from the Type drop-down menu. If the sum falls below aggregation thresholds, the filtered row summary will be dropped.
  5. Optional: Add more rules as you did in the previous steps.
  6. Optional: Add an identifier for the filtered row summary using a constant. Your constant must be a valid data type for your chosen column.
    1. Enter the name of the column in the**Column name field.
    2. Select Constant from the Type drop-down menu.
    3. Enter an identifier for the row in the Specify constant field.
  7. To remove a rule, click delete
  8. To remove all rules, click Remove all.

Manual mode with multiple tables

We recommend using automatic mode. If you are exporting multiple tables using the in-sql syntax introduced in Exporting multiple tables, then you can add a filtered row summary in the column options:

CREATE TABLE `myproject.mydataset.mytable` (
    campaign_id INT64 OPTIONS(merge_constant=0),
    ct INT64 OPTIONS(merge_type="SUM")
  )
OPTIONS(privacy_checked_export=true) AS
SELECT campaign_id, COUNT(*) AS ct GROUP BY campaign_id;
    

This will produce a result table with a filtered row summary where the campaign_id is 0 and ct will be the sum of rows that were filtered from the result due to privacy checks.

The available column options for filtered-row summary are:

  • merge_type="SUM": Outputs the sum of filtered values in the summary row.
  • Valid only for numeric types.
  • merge_constant=<value>: Sets a constant value in the summary row. Must have same type as the column.
  • columns with no specified option will have NULL in the summary row.

Only one of merge_type or merge_constant may be specified. Any rules for filtered row summaries apply here, too (i.e. if the sum falls below the aggregation thresholds, the filtered row summary will be dropped).

Best practices

In addition to general best practices, the following will help reduce privacy check issues for difference checks specifically.

Carefully consider historical results

Best practice: Decrease the likelihood of result set overlap between recently run queries.

Bear in mind that the rate of change between query results will have an effect on how likely it is that results are omitted later on due to privacy checks. A second result set that closely resembles a recently returned result set is likely to be dropped.

Instead, modify key parameters in your query, such as date ranges or campaign IDs, to decrease the likelihood of significant overlap.

Don't query today's data

Best practice: Don't run multiple queries where the end date is today.

Running multiple queries with end dates equal to today will often lead to rows being filtered. This guidance also applies to running queries shortly after midnight on yesterday's data.

Join some final results in BigQuery

Best practice: Avoid Ads Data Hub queries with diff checks that join aggregated results. Instead, write 2 separate queries and join the results in BigQuery.

Rows that don't meet aggregation requirements are filtered from your results. Therefore, if your query joins an insufficiently aggregated row with a sufficiently aggregated row, the resultant row will be filtered. Additionally, queries with multiple aggregations are less performant in Ads Data Hub.

You can join results (in BigQuery) from multiple aggregation queries (from Ads Data Hub). Results computed using common queries will share final schemas.

The following query takes individual Ads Data Hub results (campaign_data_123 and campaign_data_456) and joins them in BigQuery:

SELECT t1.campaign_id, t1.city, t1.X, t2.Y
FROM `campaign_data_123` AS t1
FULL JOIN `campaign_data_456` AS t2
USING (campaign_id, city)

Use filtered row summaries

Best practice: Add filtered row summaries to your queries.

Difference checks with complex query history can cause unexpected data filtering, and a filtered row summary can provide a more consistent total to check result correctness.

For more consistent privacy filtering behavior, try noise injection.