Relevance APIs testing guidance

Before you begin

  • This publication should be reviewed alongside the CMA testing guidance: Experiments note (November 2022), Testing guidance (June 2023) and Additional testing guidance (October 2023).
  • The goal of this document is to provide market participants with guidance on Relevance API use cases, configuration, experiment structure, goals, and evaluation best practices.

Experiment design principles

Make sure that test and control arms are well defined and measure the same variables.

User randomization & assignment

  • If not solely using Chrome-facilitated experiment groups (for example, running the experiment on other traffic), ensure that the test and control split of users is randomized and unbiased. Regardless of experiment group setup, evaluate characteristics of test and control arms to ensure test/control groups are comparable. See: Section 15.
  • Ensure that user characteristics of test and control groups are the same (for example, use similar geos in both test and control groups). See: Section 28.
  • Using the CMA's suggested experiment design, include 3 experiment groups: treatment (PS APIs + other signals), control 1 (3PCs + other signals), and control 2 (no 3PCs + no PS APIs + other signals). See: Sections 10-14.
  • Ensure your setup makes as full use of the Chrome-facilitated testing labels as possible.
  • Bear in mind that Chrome plans to exclude some Chrome instances from these experiment groups for legal, UX, and/or technical reasons. This includes Chrome Enterprise and others. These exclusions will be identical for Mode A and Mode B, and thus across Control 1, Control 2, and Treatment. As a consequence, you should not compare metrics obtained on experiment groups with metrics obtained outside these groups.

Configuration alignment across test & control groups

  • Ensure that test and control groups are using comparable campaign configuration, including similar inventory, ad formats, campaign types, and campaign settings. See: Section 28.
  • Specific examples include: ensuring that similar conversion types are measured using the same attribution window and attribution logic, that campaigns target similar audiences, interest groups, and geos, and that they use similar ad copy and ad formats. See: Section 28.
  • Ideally, the DSP can manage reporting and not have to rely on a 3P company that might add an additional attribution layer. If the DSP's customer is using a 3P company to perform attribution, that 3P should be included in the integration/test plans and in both test and control arms.
  • Ensure that each campaign administered by a participating DSP has an equal chance of participating in auctions within the treatment and control groups, and that their bidding behavior in each group is not affected by the presence of the other groups. See: Section 25.

Metric collection & evaluation

  • Before assessing results, ensure that the difference in results between the test and control groups is statistically significant; a minimal sketch of one such check follows the measurement summary below. See: Section 25.
  • For all metrics, ensure that outliers are assessed. In particular, for sales or Return on Advertiser Spend (ROAS) metrics, single-touch attribution logic is very prone to outliers, which can drastically skew results in either direction. See: Appendix, Table 2.
  • If using non-3PC methods to measure conversions (for example, link decoration, first-party data, other contextual data), ensure they are used both in test and control groups. See: Section 13 & 14.
  • Consider using two different experiment groups, one to measure the Relevance APIs and another to measure the Attribution Reporting API. This avoids multivariate testing and makes it easier to understand the cause of your observations. For more details, review the Measurement testing guide and see the summary below for recommended measurement methods by experiment arm.
    • Treatment vs Control 1: Compares the proposed end state with the current state. To avoid multivariate testing, use ARA and non-3PC data to measure conversion-based metrics for both arms.
    • Treatment vs Control 2: Compares the proposed end state with no PS APIs at all. To avoid multivariate testing, use only non-3PC data to measure conversion-based metrics for both arms.
    • Control 2 vs Control 1: Compares conversion measurement with and without 3PCs, without any PS APIs. To avoid multivariate testing, use only non-3PC data to measure conversion-based metrics for both arms.
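As a concrete illustration of the significance check mentioned above, the following is a minimal sketch of a two-proportion z-test comparing click-through rate between two arms. The function name, the aggregate counts, and the 5% two-sided critical value are illustrative assumptions; substitute your own metrics, significance level, and any multiple-comparison corrections.

```javascript
// Minimal sketch: two-proportion z-test on click-through rate between two arms.
// Input counts and the critical value (1.96) are illustrative.
function ctrDifferenceIsSignificant(armA, armB, zCritical = 1.96) {
  const pA = armA.clicks / armA.impressions;
  const pB = armB.clicks / armB.impressions;
  // Pooled proportion under the null hypothesis that both arms share one CTR.
  const pooled = (armA.clicks + armB.clicks) / (armA.impressions + armB.impressions);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / armA.impressions + 1 / armB.impressions)
  );
  const z = (pA - pB) / standardError;
  return { z, significant: Math.abs(z) > zCritical };
}

// Hypothetical aggregate counts for a treatment arm and a control arm.
console.log(ctrDifferenceIsSignificant(
  { clicks: 4100, impressions: 1000000 },
  { clicks: 4350, impressions: 1000000 }
));
```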

Implementation guidance

This section provides guidance on common use cases for relevance APIs, as well as the minimum or optimal configurations for setting up the APIs. It is important to understand which use cases are important for your business and ensure that your configuration aligns with the minimum requirements before moving forward with the experiment setup section.

Use cases

We have listed some common use cases within the ad relevance space. Note that use cases may involve the use of multiple relevance APIs, depending on the needs of individual market participants.

Brand awareness or prospecting

  • Minimal
    • Use Topics signals semantically as audience segments.
    • Use Protected Audience API to create audience segments by adding visitors of a web page to an interest group representing the contextual category of that site (see the sketch after this list).
  • Optimal
    • Use Topics alongside other private, durable signals, such as first-party and contextual data, as features in machine learning models, in order to infer audience segments for users.
    • Use Protected Audience API to create audience segments by adding visitors of a web page to an interest group based on first-party data, specific user activity, topics or other contextual signals.
    • Use Protected Audience API to create first-party audience segments that can be offered for audience extension to increase advertiser campaign reach.
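As a hedged illustration of the Protected Audience step in the minimal setup above, the sketch below adds a page visitor to an interest group representing the site's contextual category. The owner origin, group name, URLs, lifetime, and category value are hypothetical placeholders; consult the current Protected Audience specification for the exact field names supported by your target Chrome versions.

```javascript
// Minimal sketch (DSP-provided script on a partner page): add the visitor to an
// interest group representing the page's contextual category. All origins, URLs,
// and the category value below are hypothetical.
if (navigator.joinAdInterestGroup) {
  navigator.joinAdInterestGroup({
    owner: 'https://dsp.example',                              // interest group owner (DSP origin)
    name: 'contextual-category-autos',                         // segment derived from the page's category
    lifetimeMs: 30 * 24 * 60 * 60 * 1000,                      // 30 days
    biddingLogicURL: 'https://dsp.example/bidding-logic.js',
    ads: [{ renderURL: 'https://dsp.example/ads/autos-brand-awareness.html' }],
  }).catch(console.error);
}
```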

Remarketing

  • Use Protected Audience API to create customized remarketing segments for a site by creating interest groups based on specific user activity, as sketched below.
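A similarly hedged sketch of the remarketing case: the DSP's tag records specific on-site activity in an interest group so it can be bid on later. The names, signals, and lifetime below are hypothetical.

```javascript
// Minimal sketch (DSP tag on an advertiser product page): create a remarketing
// segment keyed to specific user activity. All values are hypothetical.
if (navigator.joinAdInterestGroup) {
  navigator.joinAdInterestGroup({
    owner: 'https://dsp.example',
    name: 'cart-abandoners-winter-sale',                       // segment derived from on-site activity
    lifetimeMs: 7 * 24 * 60 * 60 * 1000,                       // shorter lifetime for remarketing
    biddingLogicURL: 'https://dsp.example/bidding-logic.js',
    userBiddingSignals: { lastActivity: 'add-to-cart', sku: 'sku-123' },
    ads: [{ renderURL: 'https://dsp.example/ads/winter-sale-reminder.html' }],
  }).catch(console.error);
}
```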

Configuration

Topics API

  • Minimum
    • (DSP) Ad tech uses topics as signals in ad selection, aligned with the published Maximize ad relevance without third-party cookies guidance. Note that these Topics signals could result from the DSP calling the Topics API themselves, from processing Topics signals provided by a partner SSP, or both.
    • (SSP) Ad tech works with a percentage of its publishers to include topics returned by the Topics API in the bidstream as per ORTB 2.x spec, which requires the ad tech to work with sites to upgrade and deploy any required libraries (such as header bidding dependencies) to be able to call the Topics API on those sites.
    • (SSP) Ad tech calls the Topics API on all traffic where the API is available for use by the ad tech. Because of the per-caller filtering requirement, the API needs to be called for at least 3 weeks to ensure maximum availability. To help with this, it is appropriate to begin calling the Topics API even before DSP partners are ready to begin using the signal (see the sketch after this list).
  • Optimal
    • (DSP) Call the Topics API at meaningful consumer journey milestones where the API is available for use by the ad tech. Use this data for ML training, for example, by associating topics with relevant first-party and attribution data.
    • (DSP) Explore and introduce topics-based features into ML targeting models to enhance audience segmentation. Run model inference at targeting time to expand possible user topics. Match inferred topics with advertiser campaigns targeting those audience segments.
    • (DSP) Explore and introduce topics-based features into ML bidding models to enhance predicted Click-Through Rate (CTR) and Conversion Rate (CVR) models.
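The sketch referenced above shows one way an ad tech might call the Topics API on a page where it has script access and forward the result in the bidstream. The payload shape and the usage comment are assumptions; topics can alternatively be conveyed via the Sec-Browsing-Topics request header by passing { browsingTopics: true } to fetch().

```javascript
// Minimal sketch: read the current topics for this caller and prepare them for an
// OpenRTB-style bid request. The payload shape is hypothetical.
async function getTopicsForBidRequest() {
  if (!('browsingTopics' in document)) return [];              // API unsupported or unavailable
  try {
    // Calling the API also registers this caller, so topics become observable in later epochs.
    const topics = await document.browsingTopics();
    return topics.map((t) => ({ id: t.topic, taxonomyVersion: t.taxonomyVersion }));
  } catch (e) {
    return [];                                                 // e.g. permissions policy disallows the call
  }
}

getTopicsForBidRequest().then((topics) => {
  // Include `topics` in the bid request, for example as user.data segments per the ORTB 2.x spec.
});
```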

Protected Audience

  • (DSP) Ad tech implements all client-side dependencies to participate in Protected Audience API based auctions.
    • This includes dependencies that are core to the Protected Audience API (such as the bidding logic JavaScript; a minimal sketch follows this list) as well as any integration modules and reporting endpoints.
  • (DSP) Ad tech is ready to participate in billable Protected Audience API based auctions in Q1-Q2 2024.
    • This will require the ad tech to identify and coordinate with SSP testing partners and sites to align on goals, integration handshakes, and timelines to run end-to-end testing.
  • (DSP) Ad tech is using a key-value server to retrieve real-time signals for generating bids.
    • This will require the ad tech to identify key use-cases that require real-time signals, such as halting campaigns that have spent their whole budget, and incorporate these real-time signals in their bidding logic JavaScript.
  • (DSP) Ad tech has a high-level approach for performance in Protected Audience auctions.
    • This high-level approach broadly consists of three aspects: measuring performance, analyzing data, and iteratively refining the implementation to improve performance.
  • (SSP) Ad tech is ready to participate in billable Protected Audience auctions in Q1-Q2 2024.
    • This includes implementing and deploying dependencies that are core to the Protected Audience API (such as the decision logic JavaScript) as well as any integration or orchestration modules, adapters (such as header bidding dependencies) and reporting endpoints.
  • (SSP) Ad tech is using a key-value server to retrieve real-time signals for scoring ads.
    • This will require the ad tech to identify key use-cases that require real-time signals, such as meeting ad quality requirements, and incorporate these real-time signals in their decision logic JavaScript.
  • (SSP) Ad tech has a high-level approach for managing scale and traffic volume requirements for production deployments.
    • This will require the ad tech to coordinate with its DSP testing partners to source their intent to participate in specific Protected Audience API based auctions and to source signals.
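To make the client-side dependencies above more concrete, here are minimal sketches of a buyer's generateBid() and a seller's scoreAd(), following the function shapes in the Protected Audience explainer. The pricing and scoring heuristics are placeholders rather than recommendations, and the file paths and signal fields are hypothetical.

```javascript
// dsp.example/bidding-logic.js (buyer). Placeholder pricing only.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  const ad = interestGroup.ads[0];
  // A real implementation would combine real-time key-value signals, campaign
  // budget and pacing state, and model predictions here.
  const bid = (perBuyerSignals && perBuyerSignals.baseCpm) || 0.01;
  return { ad: ad.metadata, bid, render: ad.renderURL };
}

// ssp.example/decision-logic.js (seller). Placeholder scoring only.
function scoreAd(adMetadata, bid, auctionConfig,
                 trustedScoringSignals, browserSignals) {
  // A real implementation would apply ad quality and policy checks using
  // real-time signals from the seller's key-value server before ranking by bid.
  return bid;
}
```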

Evaluation goals and proposed experiment setup

For this evaluation, we recommend using the Chrome-facilitated testing Mode A and B labels.

Proposed experiment setup

  • Use the suggested experiment design by the CMA, including 3 groups: (See: Sections 10-14)
    • Control 1: Mode A traffic, control_1.* groups (3PCs + Privacy Sandbox APIs + Other Signals).
    • Control 2: Mode B traffic, control_2 group (no 3PCs + no Privacy Sandbox APIs + Other Signals).
    • Treatment: Mode B traffic, treatment_1.* groups (no 3PCs + Privacy Sandbox APIs + Other Signals).
  • We understand that market participants use a variety of other signals aside from 3PCs to assign ads to ad requests, for example, first-party publisher data and contextual information. To the extent that these signals are not impacted by the proposed changes (the deprecation of 3PCs and the introduction of the Privacy Sandbox APIs), these should be retained in both the control and treatment groups.
  • Note that there could be some 3PCs still available to some sites. Testers should not use those 3PCs for relevance use cases in the Control 2 or Treatment arms. Those 3PCs are designed to address non-ads breakage.
  • Note that the Privacy Sandbox APIs will be available in Control 1 but, as outlined in the CMA's guidance on industry testing, testing participants should not use the Topics API or run Protected Audience auctions for this traffic.
  • See Chrome-facilitated testing for the labels, sizes, and characteristics of each group. A sketch of reading the label on the client follows this list.
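The following is a minimal sketch of reading the Chrome-facilitated testing label in client-side JavaScript and attaching it to your own logging so traffic can be bucketed into Control 1, Control 2, and Treatment. The logging hook is a hypothetical placeholder; the label is also available to servers via the Sec-Cookie-Deprecation request header once the opt-in cookie is set.

```javascript
// Minimal sketch: read the Chrome-facilitated testing label and record it with
// your own impression/auction logs. recordExperimentArm() is a hypothetical hook.
if ('cookieDeprecationLabel' in navigator) {
  navigator.cookieDeprecationLabel.getValue().then((label) => {
    // Example values: 'control_1.1', 'control_2', 'treatment_1.1', or '' when unlabeled.
    recordExperimentArm(label || 'unlabeled');
  });
}
```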

Proposed metric methodology

  • We do not propose any specific methodology for collecting or calculating metrics. However, we strongly encourage testers to provide transparency into that methodology. This will enable those analyzing test results provided by different companies (that is, regulators) to be informed of variance and to make decisions about how to compare results.
  • We recognize that the collection of some metrics may require the use of Private Aggregation, which some testers may not yet be integrated with at the time of testing.
  • To the extent possible, participants could report metrics related to the bidding activity and amount of interest drawn by auctions, such as the number of bids that made it to auctions or the average bid value.
The experiment arms can be summarized as follows:
  • Treatment: Mode B traffic (3PCD); relevance signals applied: Protected Audience + Topics + contextual signals.
  • Control 2: Mode B' traffic (3PCD, with Protected Audience and Topics suppression); relevance signals applied: contextual signals only.
  • Control 1: status quo traffic (no 3PCD); relevance signals applied: cookie-based signals + contextual signals.

Goal 1 - (DSP) Measuring impact of 3PCD on interest-based advertising

Metrics

Include key metrics requested by the CMA, and, where feasible, consider including other metrics that may be useful, such as:

  • Average number of topics received: The average number of topics received in bid requests. It's a measure of Topics API coverage.
  • Average time spent: The average time that at least 50% of the ad's pixels were present in the user's viewport, after the ad has begun to render. It's a measure of ad engagement.
  • Clicks per impression (also known as click-through rate): The average number of clicks per impression. A measure of ad relevance (see the sketch after this list).
  • Clicks per dollar: The average number of clicks per dollar. It's a measure of ad quality received by advertisers.
  • Conversions per dollar: The average number of conversions per dollar. It's a measure of ad quality received by advertisers.
  • Conversion rate: The average number of conversions per click, shown as a percentage. It's a measure of traffic quality received by advertisers.
  • Total unique bid responses: The total number of bid responses sent by the DSP. This is a proxy for demand for individual ad techs' services.
  • Unique viewers: The number of unique users reached by the advertiser's ad. It's a measure of reach.
  • Video completion rates: The average time that at least 50% of the ad's pixels were present in the user's viewport for the entire length of the video ad, after the video has rendered and begun to play. It's a measure of ad engagement.
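As referenced in the click-through-rate item above, the sketch below derives several of these metrics from aggregate per-arm totals. The input record, field names, and example numbers are hypothetical; the point is only that each arm's metrics should be computed with identical logic.

```javascript
// Minimal sketch: derive click- and conversion-based metrics from per-arm totals.
// Field names and the example numbers are hypothetical.
function relevanceMetrics({ impressions, clicks, conversions, spendUsd }) {
  return {
    clickThroughRate: clicks / impressions,      // clicks per impression
    conversionRate: conversions / clicks,        // conversions per click
    clicksPerDollar: clicks / spendUsd,
    conversionsPerDollar: conversions / spendUsd,
  };
}

console.log(relevanceMetrics({ impressions: 1000000, clicks: 4200, conversions: 310, spendUsd: 5000 }));
```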

Suggested points of analysis

  • Are advertisers able to reach their target audience, at the preferred scale?
  • How is ad engagement and interaction affected by the changes?
  • How is the ad quality received by advertisers affected by the change?
  • Is the advertiser able to advertise in a cost-effective manner? In other words, are they able to acquire new customers at a rate that is lower than the value generated by those customers over some reasonable period of time?
  • Are there any notable caveats that one should consider when evaluating the results of a market participant's experiment?
  • How do the results compare to previous tests conducted by the market participant, if any?
  • What are the potential headwinds impacting performance? Conversely, what are the factors driving performance?

Goal 2 - (DSP) Measuring impact of 3PCD on remarketing

Metrics

Include key metrics requested by the CMA, and, where feasible, consider including other metrics that may be useful, such as:

  • Average time spent: The average time that at least 50% of the ad's pixels were present in the user's viewport, after the ad has begun to render. It's a measure of ad engagement.
  • Clicks per impression (also known as click-through rate): The average number of clicks per impression. A measure of ad relevance.
  • Clicks per dollar: The average number of clicks per dollar. It's a measure of ad quality received by advertisers.
  • Conversions per dollar: The average number of conversions per dollar. It's a measure of ad quality received by advertisers.
  • Conversion rate: The average number of conversions per click, shown as a percentage. It's a measure of traffic quality received by advertisers.
  • Total unique bid responses: The total number of bid responses sent by the DSP. This is a proxy for demand for individual ad techs' services.
  • Unique viewers: The number of unique users reached by the advertiser's ad. It's a measure of reach.
  • Video completion rates: The average time that at least 50% of the ad's pixels were present in the user's viewport for the entire length of the video ad, after the video has rendered and begun to play. It's a measure of ad engagement.

Suggested points of analysis

  • Are advertisers able to reach their target audience, at the preferred scale?
  • How is ad engagement and interaction affected by the changes?
  • How is the ad quality received by advertisers affected by the change?
  • Is the advertiser able to advertise in a cost-effective manner? In other words, are they able to acquire new customers at a rate that is lower than the value generated by those customers over some reasonable period of time?
  • Are there any notable caveats that one should consider when evaluating the results of a market participant's experiment?
  • How do the results compare to previous tests conducted by the market participant, if any?
  • What are the potential headwinds impacting performance? Conversely, what are the factors driving performance?

Goal 3 - (SSP) Measuring impact of 3PCD on auctions not facilitated by the Protected Audience API

Metrics

Include key metrics requested by the CMA, and other metrics that may be useful, such as:

  • % change in planned campaign spend: Advertisers' spend on campaigns. A measure of ad tech and publishers' share of revenue.
  • Average number of topics sent: The average number of topics sent in bid requests. It's a measure of Topics API coverage.
  • Auction latency: The average time for an auction to run. For SSPs, measured from first execution of SSP client-side code to when the SSP selects a bid and sends it to the ad server (see the sketch after this list). For ad servers, measured from ad tag execution to rendered ad. It's a measure of how quickly the auction executes.
  • Revenue per impression: The average revenue generated per impression. A measure of publisher revenue.
  • Total unique bid requests: The total number of unique bid requests sent by the SSP. This is a proxy for demand for individual ad techs' services.
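For the auction latency definition above, the following sketch uses the Performance API to measure from the first execution of the SSP's client-side code to the point where a bid is selected and handed to the ad server. The mark names, the onBidSelected callback, and the metrics endpoint are hypothetical.

```javascript
// Minimal sketch: measure SSP auction latency with performance marks.
// Mark names and the reporting endpoint are hypothetical.
performance.mark('ssp-auction-start');          // at the top of the SSP's client-side code

function onBidSelected() {                      // invoke when the winning bid is sent to the ad server
  performance.mark('ssp-auction-end');
  performance.measure('ssp-auction', 'ssp-auction-start', 'ssp-auction-end');
  const { duration } = performance.getEntriesByName('ssp-auction', 'measure').pop();
  navigator.sendBeacon(
    'https://ssp.example/metrics',
    JSON.stringify({ auctionLatencyMs: duration })
  );
}
```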

Suggested points of analysis

  • How do changes in latency affect an SSP's ability to execute an auction? How do they affect the publisher's page latency?
  • How is publisher revenue affected by latency? By the number of topics sent in bid requests?
  • How is demand for the SSP's ad tech services affected?
  • Are SSPs able to report on the auction-related KPIs that are important to publishers' businesses? For their own businesses?
  • Are there any notable caveats that one should consider when evaluating the results of a market participant's experiment?
  • How do the results compare to previous tests conducted by the market participant, if any?
  • What are the potential headwinds impacting performance? Conversely, what are the factors driving performance?

Goal 4 - (SSP) Determining efficacy of Protected Audience API auctions

Metrics

Include key metrics requested by the CMA, and other metrics that may be useful, such as:

  • Auction latency: The average time for an auction to run. For SSPs, measured from first execution of SSP client-side code to reporting the result. For ad servers, measured from ad tag execution to rendered ad. It's a measure of how quickly the auction executes.
  • Revenue per impression: The average revenue generated per impression. A measure of publisher revenue.
  • Fill rate: The percentage of bid requests that an SSP successfully fills with ads. To calculate, divide the total ad impressions by the total number of bid requests from an SSP (see the sketch after this list).
  • Timeout rate: The percentage of auctions that don't complete because they reach the SSP-configurable timeouts.
  • Total unique bid requests: The total number of unique bid requests sent by the SSP. This is a proxy for demand for individual ad techs' services.
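A small sketch of the fill rate and timeout rate calculations described above, using hypothetical aggregate totals per arm.

```javascript
// Minimal sketch: fill rate and timeout rate from per-arm totals (hypothetical numbers).
function auctionRates({ bidRequests, filledImpressions, auctions, timedOutAuctions }) {
  return {
    fillRate: filledImpressions / bidRequests,      // share of bid requests filled with an ad
    timeoutRate: timedOutAuctions / auctions,       // share of auctions hitting the SSP-configured timeout
  };
}

console.log(auctionRates({
  bidRequests: 2000000,
  filledImpressions: 1400000,
  auctions: 1750000,
  timedOutAuctions: 35000,
}));
```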

Suggested points of analysis

  • How do changes in latency affect an SSP's ability to execute an auction? How do they affect the publisher's page latency?
  • How did timeouts affect DSPs' ability to bid?
  • How is publisher revenue affected by latency? By the number of topics sent in bid requests?
  • How is demand for the SSP's ad tech services affected?
  • Are SSPs able to report on the auction-related KPIs that are important to publishers' businesses? For their own businesses?
  • Are there any notable caveats that one should consider when evaluating the results of a market participant's experiment?
  • How do the results compare to previous tests conducted by the market participant, if any?
  • What are the potential headwinds impacting performance? Conversely, what are the factors driving performance?