Input data

Define the following index variables:

  • \(g=1,\ldots,G\) indexes the geographical units.
  • \(t=1,\ldots,T\) indexes the time units.

    For paid and organic media variables, data for time periods \(t<1\) can be included in the model input data to accurately model lagged effects in the earlier time periods. If data for \(t<1\) is not provided, it is assumed that there is no media execution prior to \(t=1\).

  • \(i=1,\ldots,N_C\) indexes the control variables.

  • \(i=1,\ldots,N_N\) indexes the non-media treatments.

  • \(i=1,\ldots,N_M\) indexes the paid media channels without reach and frequency data.

  • \(i=1,\ldots, N_{OM}\) indexes the organic media channels without reach and frequency data.

  • \(i=1,\ldots,N_{RF}\) indexes the paid media channels with reach and frequency data.

  • \(i=1,\ldots, N_{ORF}\) indexes the organic media channels with reach and frequency data.
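One way to express the assumption that there was no media execution before \(t=1\) is to prepend zero-valued periods to each geo's series. The following is a minimal sketch, not Meridian's implementation. It assumes a single channel stored as a nested list `media[g][t]`. The helper `pad_media_history` and its `max_lag` parameter are hypothetical names chosen for illustration.

```python
from typing import List


def pad_media_history(media: List[List[float]], max_lag: int) -> List[List[float]]:
    """Prepend `max_lag` zero periods to each geo's series (hypothetical helper).

    media[g][t] holds raw media units for one channel in geo g at periods
    t = 1..T. The prepended zeros stand in for periods t < 1 and encode the
    assumption that no media ran before the first modeled period.
    """
    if max_lag < 0:
        raise ValueError("max_lag must be non-negative")
    return [[0.0] * max_lag + list(series) for series in media]


# Two geos, three modeled periods, and an assumed two-period lag window.
media = [[5.0, 0.0, 2.0], [1.0, 3.0, 4.0]]
padded = pad_media_history(media, max_lag=2)
print(padded)  # [[0.0, 0.0, 5.0, 0.0, 2.0], [0.0, 0.0, 1.0, 3.0, 4.0]]
```

When real history for \(t<1\) is available, it would replace these zeros, which is why supplying it gives more accurate lagged effects in the earliest periods.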

Meridian requires two main data arrays as model inputs: KPI and media. Organic media and non-media treatments can also be provided as optional inputs if they are available. For paid and organic media channels with reach and frequency data available by geo and time period, the reach and frequency data can optionally be used instead of a single media metric. You can also include controls that are confounding variables or strong predictors of the KPI. A unit value is required for KPI and media data so that raw units can be converted to a currency value for ROI calculations.

| Data | Dimensions | Model Input: Raw Units | Model Input: Unit Value | Transformed Units (used in model equation) | Value/Cost |
|---|---|---|---|---|---|
| KPI | \(G \times T\) | \(\overset{\cdot \cdot}{y}_{g,t}\) | \(u^{[Y]}_{g,t}\) | \(y_{g,t} = L^{[Y]}_{g,t}(\overset{\cdot \cdot}{y}_{g,t})\) | \(\overset{\sim}{y}_{g,t} = u^{[Y]}_{g,t} \cdot \overset{\cdot \cdot}{y}_{g,t}\) |
| Controls | \(G \times T \times N_C\) | \(\overset{\cdot \cdot}{z}_{g,t,i}\) | N/A | \(z_{g,t,i} = L^{[C]}_{g,i}(\overset{\cdot \cdot}{z}_{g,t,i})\) | N/A |
| Media | \(G \times T \times N_M\) | \(\overset{\cdot \cdot}{x}^{[M]}_{g,t,i}\) | \(u^{[M]}_{g,t,i}\) | \(x^{[M]}_{g,t,i} = L^{[M]}_{g,i}(\overset{\cdot \cdot}{x}^{[M]}_{g,t,i})\) | \(\overset{\sim}{x}^{[M]}_{g,t,i} = u^{[M]}_{g,t,i} \cdot \overset{\cdot \cdot}{x}^{[M]}_{g,t,i}\) |
| Reach | \(G \times T \times N_{RF}\) | \(\overset{\cdot \cdot}{r}^{[RF]}_{g,t,i}\) | \(u^{[RF]}_{g,t,i}\) | \(r^{[RF]}_{g,t,i} = L^{[RF]}_{g,i}(\overset{\cdot \cdot}{r}^{[RF]}_{g,t,i})\) | \(\overset{\sim}{r}^{[RF]}_{g,t,i} = u^{[RF]}_{g,t,i} \cdot \overset{\cdot \cdot}{r}^{[RF]}_{g,t,i} \cdot f^{[RF]}_{g,t,i}\) |
| Frequency | \(G \times T \times N_{RF}\) | \(f^{[RF]}_{g,t,i}\) | See Reach row | N/A | See Reach row |
| Organic Media | \(G \times T \times N_{OM}\) | \(\overset{\cdot \cdot}{x}^{[OM]}_{g,t,i}\) | \(u^{[OM]}_{g,t,i}\) | \(x^{[OM]}_{g,t,i} = L^{[OM]}_{g,i}(\overset{\cdot \cdot}{x}^{[OM]}_{g,t,i})\) | \(\overset{\sim}{x}^{[OM]}_{g,t,i} = u^{[OM]}_{g,t,i} \cdot \overset{\cdot \cdot}{x}^{[OM]}_{g,t,i}\) |
| Organic Reach | \(G \times T \times N_{ORF}\) | \(\overset{\cdot \cdot}{r}^{[ORF]}_{g,t,i}\) | \(u^{[ORF]}_{g,t,i}\) | \(r^{[ORF]}_{g,t,i} = L^{[ORF]}_{g,i}(\overset{\cdot \cdot}{r}^{[ORF]}_{g,t,i})\) | \(\overset{\sim}{r}^{[ORF]}_{g,t,i} = u^{[ORF]}_{g,t,i} \cdot \overset{\cdot \cdot}{r}^{[ORF]}_{g,t,i} \cdot f^{[ORF]}_{g,t,i}\) |
| Organic Frequency | \(G \times T \times N_{ORF}\) | \(f^{[ORF]}_{g,t,i}\) | See Organic Reach row | N/A | See Organic Reach row |
| Non-media Treatments | \(G \times T \times N_N\) | \(\overset{\cdot \cdot}{x}^{[N]}_{g,t,i}\) | N/A | \(x^{[N]}_{g,t,i} = L^{[N]}_{g,i}(\overset{\cdot \cdot}{x}^{[N]}_{g,t,i})\) | N/A |
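The Value/Cost column is a per-cell product of raw units and the unit value. The sketch below spells this out for a single \((g, t, i)\) cell, assuming scalar inputs. The function names are hypothetical and are not part of Meridian's API. For the reach and frequency rows, reach times frequency is the number of impressions, so the unit value acts as a value per impression.

```python
def kpi_value(kpi_units: float, unit_value: float) -> float:
    """KPI row: u^{[Y]}_{g,t} * raw KPI units (hypothetical helper)."""
    return unit_value * kpi_units


def media_cost(media_units: float, unit_value: float) -> float:
    """Media and organic media rows: u_{g,t,i} * raw media units."""
    return unit_value * media_units


def reach_frequency_cost(reach: float, frequency: float, unit_value: float) -> float:
    """Reach and organic reach rows: u_{g,t,i} * reach * frequency.

    reach * frequency is the impression count, so unit_value is applied
    per impression.
    """
    return unit_value * reach * frequency


print(kpi_value(40.0, 2.5))                       # 100.0
print(media_cost(200.0, 0.5))                     # 100.0
print(reach_frequency_cost(1000.0, 2.5, 0.5))     # 1250.0
```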

Unit transformations are handled internally by Meridian. Geo population scaling is necessary for hierarchical modeling because it puts all geos on a comparable scale. The remaining standardization (centering and scaling, or scaling by a median) is done so that standardized prior distributions can be used without having to consider the scale of each variable.

Define \(p_g\) to be the population size of each geo, which is another model input that must be specified by the user. The linear transformations are summarized as follows:

Transformation: KPI units

Notation: \(L^{[Y]}_{g,t} (\cdot)\)

Description:

  1. Divide by geo population.
  2. Center and scale the geo-scaled values to have mean zero and standard deviation one.

Definition:

\(L^{[Y]}_{g,t} (q) = \dfrac{\dfrac{q}{p_g} - m^{[Y]}}{s^{[Y]}}\)

Where:

  • \(y^\dagger_{g,t} = \dfrac{\overset {\cdot \cdot} y_{g,t}}{p_g}\)
  • \(m^{[Y]} = \frac{1}{GT}\sum\limits_{g,t} y^\dagger_{g,t}\)
  • \(s^{[Y]} = \sqrt{\frac{1}{GT-1} \sum\limits_{g,t} \left( y^\dagger_{g,t}-m^{[Y]} \right)^2}\)
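A minimal sketch of \(L^{[Y]}_{g,t}\), assuming the raw KPI is a nested list `kpi[g][t]` and populations are a list `population[g]`. The helper `fit_kpi_transform` is hypothetical. Python's `statistics.stdev` uses the \(GT-1\) divisor, matching \(s^{[Y]}\). The inverse is included only to show that the transformation is an invertible linear map for each geo.

```python
import statistics
from typing import Callable, List, Tuple


def fit_kpi_transform(
    kpi: List[List[float]], population: List[float]
) -> Tuple[Callable[[float, int], float], Callable[[float, int], float]]:
    """Fit L^{[Y]} on raw KPI values kpi[g][t] (hypothetical helper).

    Returns (forward, inverse); both take a value and a geo index g.
    """
    # Step 1: divide by geo population to get y^dagger_{g,t}.
    per_capita = [v / population[g] for g, row in enumerate(kpi) for v in row]
    # Step 2: pooled mean m^{[Y]} and sample standard deviation s^{[Y]}.
    m = statistics.fmean(per_capita)
    s = statistics.stdev(per_capita)

    def forward(q: float, g: int) -> float:
        return (q / population[g] - m) / s

    def inverse(y: float, g: int) -> float:
        return (y * s + m) * population[g]

    return forward, inverse


# Two geos, two time periods.
kpi = [[100.0, 200.0], [300.0, 500.0]]
population = [10.0, 20.0]
forward, inverse = fit_kpi_transform(kpi, population)
transformed = [[forward(q, g) for q in row] for g, row in enumerate(kpi)]
```

Across all geos and time periods, the entries of `transformed` have mean zero and sample standard deviation one, as the two description steps require.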

Transformation: Control variables

Notation: \(L^{[C]}_{g,i} (\cdot)\)

Description:

  1. Optionally divide by geo population. Population scaling can make sense for certain controls and is set per control with the control_population_scaling_id argument. By default, no controls are population scaled.

  2. Center and scale each control variable to have mean zero and standard deviation one.

Definition:

\(L^{[C]}_{g,i}(q) = \dfrac{\dfrac{q}{p^{I^{[C]}_i}_g} - m^{[C]}_i}{s^{[C]}_i}\)

Where:

  • \(I_i^{[C]} = 1\) if control_population_scaling_id is True for control \(i\), and \(0\) otherwise.
  • \(z^{\dagger}_{g,t,i} = \dfrac{\overset {\cdot \cdot} z_{g,t,i}}{p_g^{I_i^{[C]}}}\)
  • \(m^{[C]}_i = \frac{1}{GT}\sum\limits_{g,t} z^{\dagger}_{g,t,i}\)
  • \(s^{[C]}_i = \sqrt{\frac{1}{GT-1} \sum\limits_{g,t} \left( z^{\dagger}_{g,t,i}-m^{[C]}_i \right)^2}\)
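Applied per control with the optional population scaling, the standardization might look like the sketch below. It assumes a nested list `controls[g][t][i]` and a list of booleans standing in for \(I_i^{[C]}\). The name `transform_controls` is hypothetical, and raising an error for a constant control is a choice made here, since a zero standard deviation leaves the transformation undefined.

```python
import statistics
from typing import List


def transform_controls(
    controls: List[List[List[float]]],
    population: List[float],
    scale_by_population: List[bool],
) -> List[List[List[float]]]:
    """Standardize controls[g][t][i] per control i (hypothetical helper).

    scale_by_population[i] plays the role of I_i^{[C]}: when True, control i
    is divided by the geo population before centering and scaling.
    """
    n_geos, n_times, n_controls = len(controls), len(controls[0]), len(controls[0][0])
    out = [[[0.0] * n_controls for _ in range(n_times)] for _ in range(n_geos)]
    for i in range(n_controls):
        # p_g ** I_i: the geo population when scaling is on, otherwise 1.
        divisor = [population[g] if scale_by_population[i] else 1.0 for g in range(n_geos)]
        dagger = [controls[g][t][i] / divisor[g] for g in range(n_geos) for t in range(n_times)]
        m_i = statistics.fmean(dagger)
        s_i = statistics.stdev(dagger)  # sample std dev with divisor G*T - 1
        if s_i == 0.0:
            raise ValueError(f"control {i} is constant and cannot be standardized")
        for g in range(n_geos):
            for t in range(n_times):
                out[g][t][i] = (controls[g][t][i] / divisor[g] - m_i) / s_i
    return out


# Two geos, two periods; control 0 is not population scaled, control 1 is.
controls = [[[1.0, 10.0], [2.0, 20.0]], [[3.0, 40.0], [4.0, 60.0]]]
z = transform_controls(controls, [10.0, 20.0], [False, True])
```

The non-media treatment transformation has the same form, with \(I_i^{[N]}\) in place of \(I_i^{[C]}\).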

Transformation: Media units

Notation: \(L^{[M]}_{g,i} (\cdot)\)

Description:

  1. Divide by geo population.
  2. For each media channel, divide the geo-scaled values by the channel's median non-zero value.

Definition:

\(L^{[M]}_{g,i} (q) = \dfrac{q}{p_g d^{[M]}_i}\)

Where:

  • \(x^{\dagger [M]}_{g,t,i} = \dfrac{\overset {\cdot \cdot} x_{g,t,i}^{[M]}}{p_g}\)
  • \(d^{[M]}_i = \text{Median}\left( \left\{ x^{\dagger [M]}_{g,t,i}:x^{\dagger [M]}_{g,t,i} > 0 \right\}_{g,t} \right)\)
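A sketch of the median scaling for one paid media channel, assuming `media[g][t]` holds raw units and `population[g]` the geo populations. The helper `fit_media_transform` is hypothetical. Only strictly positive population-scaled values enter the median, as in the definition above.

```python
import statistics
from typing import Callable, List, Tuple


def fit_media_transform(
    media: List[List[float]], population: List[float]
) -> Tuple[Callable[[float, int], float], float]:
    """Fit the media transformation for one channel (hypothetical helper).

    Returns the transform (value, geo index) -> scaled value, and the
    median scale that it divides by.
    """
    # Step 1: divide by geo population.
    per_capita = [v / population[g] for g, row in enumerate(media) for v in row]
    # Step 2: median over the strictly positive population-scaled values only.
    positive = [v for v in per_capita if v > 0]
    if not positive:
        raise ValueError("channel has no non-zero values to take a median of")
    d = statistics.median(positive)

    def transform(q: float, g: int) -> float:
        return q / (population[g] * d)

    return transform, d


media = [[0.0, 100.0, 300.0], [200.0, 0.0, 800.0]]
transform, d = fit_media_transform(media, [10.0, 20.0])
print(d)                    # 20.0
print(transform(300.0, 0))  # 1.5
```

Because this transformation only divides and never subtracts a mean, zero media stays zero and positive media stays positive, unlike the centered KPI and control transformations. Excluding zeros from the median keeps channels with many inactive periods from getting a tiny scale. The reach, organic media, and organic reach transformations use the same form.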

Transformation: Reach

Notation: \(L^{[RF]}_{g,i} (\cdot)\)

Description:

The transformation function is the same as for media units.

Transformation: Organic media units

Notation: \(L^{[OM]}_{g,i} (\cdot)\)

Description:

  1. Divide by geo population.
  2. For each organic media channel, divide the geo-scaled values by the channel's median non-zero value.

Definition:

\(L^{[OM]}_{g,i} (q) = \dfrac{q}{p_g d^{[OM]}_i}\)

Where:

  • \(x^{\dagger [OM]}_{g,t,i} = \dfrac{\overset {\cdot \cdot} x_{g,t,i}^{[OM]}}{p_g}\)
  • \(d^{[OM]}_i = \text{Median}\left( \left\{ x^{\dagger [OM]}_{g,t,i}:x^{\dagger [OM]}_{g,t,i} > 0 \right\}_{g,t} \right)\)

Transformation: Organic reach

Notation: \(L^{[ORF]}_{g,i} (\cdot)\)

Description:

The transformation function is the same as for organic media units.

Transformation: Non-media treatments

Notation: \(L^{[N]}_{g,i} (\cdot)\)

Description:

  1. Optionally divide by geo population. Population scaling can make sense for certain non-media treatments and is set per treatment with the non_media_population_scaling_id argument. By default, non-media treatments are not population scaled.

  2. Center and scale each non-media treatment variable to have mean zero and standard deviation one.

Definition:

\(L^{[N]}_{g,i}(q) = \dfrac{\dfrac{q}{p^{I^{[N]}_i}_g} - m^{[N]}_i}{s^{[N]}_i}\)

Where:

  • \(I_i^{[N]} = 1\) if non_media_population_scaling_id is True for non-media treatment \(i\), and \(0\) otherwise.
  • \(x^{\dagger [N]}_{g,t,i} = \dfrac{\overset {\cdot \cdot} x^{[N]}_{g,t,i}}{p_g^{I_i^{[N]}}}\)
  • \(m^{[N]}_i = \frac{1}{GT}\sum\limits_{g,t} x^{\dagger [N]}_{g,t,i}\)
  • \(s^{[N]}_i = \sqrt{\frac{1}{GT-1} \sum\limits_{g,t} \left( x^{\dagger [N]}_{g,t,i}-m^{[N]}_i \right)^2}\)