Meridian uses a Bayesian regression model, which combines prior knowledge with signals learned from data to estimate media effects and quantify uncertainty. Prior knowledge is incorporated into the model using prior distributions, which can be informed by experiment data, industry experience, or previous media mix models.
Bayesian Markov Chain Monte Carlo (MCMC) sampling methods are used to jointly estimate all model coefficients and parameters. This includes parameters of the nonlinear media transformation functions, such as Adstock and diminishing returns curves. All parameters and corresponding uncertainty are taken into consideration when calculating point estimates and credible intervals for ROI and other key insights.
Bayes' theorem
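Bayes' theorem describes how you can use observable data to make inferences about unobservable parameters. It can be expressed as the following equation:
\[
P(\theta|data) = \frac{P(data|\theta)\,P(\theta)}{\int P(data|\theta)\,P(\theta)\,\mathrm{d}\theta}
\]
Where: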
- \(\theta\) is the unobservable parameter of interest
- \(P(\theta|data)\) is the posterior, and is the output of the Bayesian equation
- \(P(data|\theta)\) is the likelihood
- \(P(\theta)\) is the prior
The likelihood and prior must be specified to perform inference on the posterior.
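As an illustration, the following minimal sketch applies Bayes' theorem on a grid of candidate values: the posterior is the likelihood times the prior, divided by a normalizing constant so that it integrates to 1. The scenario (7 successes in 20 trials with a flat prior) and all variable names are assumptions chosen for illustration. This is not how Meridian computes its posterior.

```python
import numpy as np

# Hypothetical example: theta is a success rate, and the observed data
# is 7 successes in 20 trials.
theta = np.linspace(0.001, 0.999, 999)     # grid of candidate theta values
step = theta[1] - theta[0]

prior = np.ones_like(theta)                # flat prior P(theta)
likelihood = theta**7 * (1 - theta)**13    # P(data | theta), up to a constant

unnormalized = likelihood * prior          # numerator of Bayes' theorem
evidence = np.sum(unnormalized) * step     # approximates the integral in the denominator
posterior = unnormalized / evidence        # P(theta | data) as a density on the grid

posterior_mean = np.sum(theta * posterior) * step
print(round(posterior_mean, 3))
```

A grid works only for a handful of parameters. Models with many parameters need sampling methods such as MCMC instead.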
Likelihood, priors, and posteriors
The likelihood is the model specification. It is a distribution that specifies the probability of the data values given the model's parameter values \(\theta\). After the Bayesian analysis is performed, inferences and estimates are made on the parameters \(\theta\). Likelihoods can have a wide range of complexity. Meridian's likelihood is based on a hierarchical regression model. For more information about Meridian likelihood, see Model specification.
A prior represents the belief about the probability distribution of a parameter before data has been taken into account. Incorporating prior knowledge is required for the Bayesian approach to quantifying uncertainty. In Meridian, the prior distribution represents the beliefs about the effects of marketing channels before the data is seen. An informative prior expresses high certainty about \(\theta\), so a large amount of evidence in the data is required to overcome that belief. An uninformative prior expresses very little knowledge about the value of \(\theta\), so it has little influence on the posterior. The Meridian model provides well-reasoned priors with default values. You can customize the priors, such as for ROI calibration.
The posterior is a distribution that represents the strength of the belief in the different possible values of \(\theta\) after the data has been taken into account. The posterior is based on the prior, the data, and the likelihood according to Bayes' theorem. If there is little information in the data, the posterior is weighted more towards the prior. If there is extensive information in the data, the posterior is weighted more towards the data.
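The weighting between prior and data can be seen in a simple case with a closed-form posterior. The following sketch assumes a normal prior on a single mean parameter and normally distributed observations with a known noise standard deviation. The function name `normal_posterior` and all numbers are hypothetical, and Meridian's hierarchical model is considerably more complex than this.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, noise_sd, n):
    """Posterior mean and sd of a normal mean, given n observations
    with known noise sd and a normal prior (conjugate update)."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / noise_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var**0.5

# Hypothetical numbers: the prior is centered at 1.0, the data averages 3.0.
few = normal_posterior(1.0, 0.5, 3.0, 2.0, n=2)     # posterior mean about 1.22
many = normal_posterior(1.0, 0.5, 3.0, 2.0, n=200)  # posterior mean about 2.85
print(few, many)
```

With 2 observations the posterior mean stays near the prior mean. With 200 observations it moves most of the way to the data average, and the posterior standard deviation shrinks.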
The Meridian model generates the joint posterior distribution for all model parameters, as well as every metric that is estimated, such as ROI, mROI, and response curves. The posterior distribution represents the updated beliefs about the effects of marketing channels, given the observed data.
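Because every metric has its own posterior distribution, a point estimate and a credible interval can be read directly from the posterior draws. The sketch below uses synthetic draws as a stand-in for model output. The array shape, the spend value, the choice of the median as point estimate, and the 90% interval are all assumptions for illustration, not Meridian's API or defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for posterior draws of one channel's incremental
# outcome, shaped (n_chains, n_draws). Real draws would come from the model.
incremental_outcome = rng.lognormal(mean=np.log(1500.0), sigma=0.2, size=(4, 1000))
spend = 1000.0  # hypothetical channel spend

roi_draws = incremental_outcome / spend   # one ROI value per posterior draw
roi_flat = roi_draws.reshape(-1)          # merge chains

point_estimate = np.median(roi_flat)
lower, upper = np.percentile(roi_flat, [5, 95])  # 90% credible interval
print(point_estimate, lower, upper)
```

Computing the metric per draw, and summarizing afterwards, carries the uncertainty in all parameters through to the final interval.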
MCMC convergence
Using Markov Chain Monte Carlo (MCMC), the posterior sampling converges to a target distribution. Model convergence can be assessed by running multiple MCMC chains and checking that all chains reach the same target distribution.
Meridian uses the MCMC No-U-Turn Sampler (NUTS) sampling method. Parameter values are drawn from a probability distribution where the distribution of the current value depends on the values of the previous iteration. The values form a chain, where each iteration is a complete set of model parameter values. Multiple chains are run independently to assess convergence. When convergence is reached, each chain represents a sample from the target posterior distribution. The chains can then be merged for posterior inference.
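To make the idea of chains concrete, the following sketch runs four independent chains with a random-walk Metropolis sampler. This is a much simpler algorithm than NUTS and is used here only because it fits in a few lines. The target (a standard normal), the starting points, the step size, and the number of discarded early iterations are all assumptions for illustration.

```python
import numpy as np

def log_target(x):
    # Log density of a standard normal, up to a constant (stand-in posterior).
    return -0.5 * x**2

def run_chain(seed, n_steps=5000, start=0.0, step_size=1.0):
    """Random-walk Metropolis: each value depends only on the previous one."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    current = start
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        # Accept with probability min(1, target(proposal) / target(current)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(current):
            current = proposal
        chain[i] = current
    return chain

# Four independent chains started from dispersed points.
starts = [-10.0, -3.0, 3.0, 10.0]
chains = np.stack([run_chain(seed, start=s) for seed, s in enumerate(starts)])

# Discard early iterations taken before the chains reached the target,
# then merge the chains for inference.
kept = chains[:, 1000:]
merged = kept.reshape(-1)
print(merged.mean(), merged.std())
```

Starting the chains far apart makes it easier to see whether they all end up sampling the same distribution.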
It is critical that you examine R-hat values to assess MCMC convergence. These values are provided as part of the model output. We recommend obtaining an R-hat less than 1.1 for all parameters, although this is not a strict threshold. If R-hat values are slightly larger than 1.1, convergence is usually achievable by running longer chains. If R-hat values are much larger (such as 2.0 or greater), it might be possible to obtain convergence by running longer chains. However, computational time and memory constraints can be prohibitive, so it might be necessary to adjust the model to obtain convergence.
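For intuition about what R-hat measures, the following sketch computes the classic Gelman-Rubin statistic, which compares the variance between chains to the variance within chains. The function `r_hat` is a hypothetical helper written for this example, and the formula is the basic (non-split) variant. The R-hat values in the model output might be computed with a different variant.

```python
import numpy as np

def r_hat(chains):
    """Gelman-Rubin potential scale reduction for one parameter.

    `chains` holds draws shaped (n_chains, n_draws).
    """
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()            # average within-chain variance
    between = n_draws * chains.mean(axis=1).var(ddof=1)   # variance of the chain means
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                        # all chains share one distribution
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])    # two chains sit elsewhere

rhat_mixed = r_hat(mixed)   # close to 1
rhat_stuck = r_hat(stuck)   # well above 1.1
print(rhat_mixed, rhat_stuck)
```

When all chains agree, the between-chain variance is small and R-hat is close to 1. When chains settle in different regions, the between-chain variance inflates R-hat.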