Getting MCMC convergence
Lack of convergence is typically due to one of the following causes:
- The model is poorly specified for the data. This problem can be in the likelihood (model specification) or in the prior.
- The `n_adapt + n_burnin` or `n_keep` arguments of `Meridian.sample_posterior` are not large enough.
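Convergence is usually judged with the R-hat (potential scale reduction) statistic on the sampled chains. As a minimal sketch (plain NumPy, not Meridian code), R-hat can be computed from raw draws for one parameter, assuming an array of shape `(n_chains, n_draws)`:

```python
import numpy as np

def rhat(chains):
    """Gelman-Rubin R-hat for one parameter; chains shaped (n_chains, n_draws)."""
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    b = n_draws * chain_means.var(ddof=1)        # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    var_hat = (n_draws - 1) / n_draws * w + b / n_draws
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                      # four chains, same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain off target
```

Values near 1.0 indicate the chains agree; values well above 1.0 suggest increasing `n_adapt`, `n_burnin`, or `n_keep`, or revisiting the model as described below.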
To get your chains to converge, try the following recommendations in this order:
- Check for identifiability or weak identifiability using these questions:
  - Do you have highly multicollinear `media` or `controls` variables?
  - Is the variation in your `media` or `controls` variable so small that it is difficult to estimate its effect?
  - Is one of the `media` or `controls` variables highly correlated with time, or even perfectly collinear with time? For more information, see When you must use `knots < n_times`.
  - Is one of the `media` variables quite sparse? Sparsity can mean very little execution in a channel, too many geos with no execution whatsoever, or too many time periods with no media execution whatsoever (especially if the number of `knots` is close to `n_times`).
- Reassess the priors. Highly uninformative priors often make convergence difficult, but highly informative priors can also make convergence difficult in certain situations.
- If your KPI is revenue or if you have revenue per KPI data, consider the advice in ROI priors and calibration for paid media channels.
- If you don't have revenue data, consider the advice in Set custom priors when outcome is not revenue for paid media channels. Reducing the total media contribution prior mean and/or standard deviation may help to achieve a sufficient degree of regularization.
- Adjust the modeling options. In particular, try decreasing the `knots` argument of `ModelSpec`. Other modeling options to adjust include `unique_sigma_for_each_geo` or `media_effects_dist` of `ModelSpec`.
- Check for a data error, for example, whether the `population` order doesn't match the `media` order for geos.
- Meridian's model assumes a geo hierarchy in media and control effects. If this assumption does not match your data, regularize these parameters further by setting the priors on the parameters that measure hierarchical variance (`eta_m` and `xi_c`), for example, `HalfNormal(0.1)`. You can also turn off the geo hierarchy assumption with a `Deterministic(0)` prior.
- Consider whether you have enough data. For more information, see Amount of data needed.
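Several of the identifiability questions above can be checked programmatically before fitting. The sketch below (a hypothetical helper in plain NumPy, using a media array aggregated over geos) flags near-collinear channel pairs and sparse channels:

```python
import numpy as np

def identifiability_report(media, corr_threshold=0.9, zero_threshold=0.5):
    """media: array of shape (n_times, n_channels), aggregated over geos."""
    corr = np.corrcoef(media, rowvar=False)
    n = media.shape[1]
    # Channel pairs whose execution is highly correlated.
    pairs = [(i, j, corr[i, j]) for i in range(n) for j in range(i + 1, n)
             if abs(corr[i, j]) > corr_threshold]
    # Fraction of time periods with zero execution, per channel.
    zero_frac = (media == 0).mean(axis=0)
    sparse = np.flatnonzero(zero_frac > zero_threshold)
    return pairs, sparse

media = np.column_stack([
    [10, 12, 11, 13, 14, 15],   # channel 0
    [20, 24, 22, 26, 28, 30],   # channel 1: exactly 2x channel 0, collinear
    [0, 0, 0, 5, 0, 0],         # channel 2: sparse execution
])
pairs, sparse = identifiability_report(media)
```

Channels flagged here are candidates for merging, dropping, or more informative priors.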
When the posterior is the same as the prior
When the model is trying to estimate many variables, more data is needed to learn about any particular one. MMM typically makes inference on many variables without many data points, particularly in the case of a national model. This means that there will be instances where there is little information in the data for a particular media channel. This situation can be exacerbated when a particular channel has low spend, very low variance in the scaled media execution, or high correlation of scaled media execution with other channels. For more information about data amounts, see Amount of data needed. For more information about channels with low spend, see Channels with low spend.
You can make the prior and the posterior differ from each other by using increasingly uninformative priors. Recall that the prior represents an estimate of a parameter before the data has been taken into account, and the posterior is an estimate of that parameter after the data has been taken into account. When there is little information in the data, the before and after estimates will be similar. This is particularly true when the prior is relatively informative, where relative refers to the information in the prior compared with the information in the data. The data can always dominate the prior if you set an uninformative enough prior. However, if the prior is uninformative relative to data that also carries little information, then the posterior will be quite wide, representing a lot of uncertainty.
One way to simplify things is to think about the prior you are setting for parameters such as ROI. You don't have to worry too much about the relative informativeness of the prior if you just set reasonable priors that you believe in. If there is little or no information in the data, then it makes sense from a Bayesian perspective that the prior and the posterior are similar. If there is a lot of information in the data, then your prior will likely move based on this data.
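The prior-versus-data trade-off can be seen in the simplest conjugate setting. This sketch (a normal-normal model with hypothetical numbers, not Meridian's model) shows that with little data the posterior stays close to the prior, while with ample data it concentrates near the data mean:

```python
import numpy as np

def posterior_normal(prior_mean, prior_sd, data, noise_sd):
    """Conjugate normal-normal update for a single mean parameter."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = len(data) / noise_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * np.mean(data))
    return post_mean, np.sqrt(post_var)

rng = np.random.default_rng(1)
little = rng.normal(3.0, 10.0, size=2)     # noisy data, only 2 points
lots = rng.normal(3.0, 10.0, size=5000)    # noisy data, 5000 points

mean_l, sd_l = posterior_normal(1.0, 1.0, little, noise_sd=10.0)
mean_b, sd_b = posterior_normal(1.0, 1.0, lots, noise_sd=10.0)
# sd_l stays near the prior sd of 1.0; sd_b shrinks and mean_b moves toward 3.0
```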
Channels with low spend
Channels with low spend are particularly susceptible to having an ROI posterior similar to the ROI prior. Each channel has a range of ROI values that fit the data reasonably well. If this range is wide and covers most of the prior probability mass, then the posterior tends to look like the prior. The range of reasonable ROI values for a small-spend channel tends to be much wider than that of a high-spend channel because small-spend channels need very large ROI to have much influence on the model fit, so a large range of ROI values is likely to fit the data reasonably well.
Media effects are modeled based on the media metric provided, such as impressions and clicks. Neither the scale of the media metric nor the spend level has any effect on the model fit or the range of incremental outcome that could reasonably be attributed to the channel. ROI is defined as incremental outcome divided by spend, so when the range of reasonable incremental outcome values is translated to an ROI range, a channel with larger spend will have a narrower range of ROI values that fit the data well.
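This translation can be illustrated with simple arithmetic. Suppose (hypothetically) the data supports the same range of incremental outcome for two channels that differ only in spend:

```python
# Hypothetical range of incremental outcome consistent with the data.
inc_low, inc_high = 1_000.0, 50_000.0

def roi_range(spend):
    # ROI = incremental outcome / spend, so the same outcome range
    # maps to a much wider ROI range when spend is small.
    return inc_low / spend, inc_high / spend

small = roi_range(1_000.0)     # wide ROI range for a low-spend channel
large = roi_range(100_000.0)   # narrow ROI range for a high-spend channel
```

The low-spend channel's ROI range spans 1 to 50, while the high-spend channel's spans only 0.01 to 0.5, so the low-spend posterior is far more prior-dominated.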
Note: In the case of ordinary least squares regression, the scale of the covariates has no effect on the fit. The scale can matter in a Bayesian regression setting when priors are applied to the coefficients; however, Meridian applies a scaling transformation to each media metric. Scaling a channel's impressions by a factor of 100, for example, does not affect the Meridian model fit.
When ROI results are widely different depending on the prior used
ROI results can be very different depending on whether ROI default priors are used or beta default priors are used.
The use of ROI default priors and beta default priors can affect ROI results for the following reasons:
- When default ROI priors are used, each media channel's posterior ROI is regularized towards the same distribution. This is a good thing because every channel is treated equitably.
- When default priors on the media coefficients (beta) are used, each media channel's posterior ROI is regularized towards a different distribution. This is because the scaling applied to the media data is not the same across channels, so the same beta value implies different ROIs for different channels. The default priors on media coefficients are also uninformative relative to the default ROI prior, to account for potentially large differences in the scaling of the media data across channels.
- When there is little information in the data, the prior and the posterior will be similar, as discussed in When the posterior is the same as the prior. When there is little information in the data and beta priors are used, posterior ROIs will be different across the media channels. However, this difference is only coming from the inequitable priors on the media channels and not the data. In summary, it is important to not interpret different ROI results across the channels as a result that is picking up signal from the data, when the difference is only driven by inequitable priors.
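The scaling point can be illustrated with a deliberately simplified linear effect (all numbers hypothetical; adstock and saturation are ignored): the same coefficient implies very different ROIs once media scale and spend differ across channels:

```python
import numpy as np

beta = 0.5                              # same coefficient for both channels
media_units = np.array([1e6, 1e3])      # hypothetical media scales
spend = np.array([50_000.0, 40_000.0])  # hypothetical spend levels

incremental = beta * media_units        # simplified linear media effect
roi = incremental / spend               # identical beta, very different ROIs
```

A shared prior on beta therefore pulls the two channels toward different ROI values, which is why equal posterior-ROI differences should not automatically be read as signal from the data.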
ResourceExhaustedError when running Meridian.sample_posterior
`Meridian.sample_posterior` calls `tfp.experimental.mcmc.windowed_adaptive_nuts`, which can be memory intensive on GPUs when sampling a large number of chains in parallel or when training with large datasets.
One way to reduce the peak GPU memory consumption is to sample chains serially. This capability is provided by passing a list of integers to `n_chains`. For example, `n_chains=[5, 5]` samples a total of 10 chains by calling `tfp.experimental.mcmc.windowed_adaptive_nuts` consecutively, each time with the argument `n_chains=5`.
Note that this does come with a runtime cost. Because this method reduces memory consumption through consecutive calls to the MCMC sampling method, the total runtime increases linearly with the length of the list passed to `n_chains`. For example, `n_chains=[5, 5]` can take up to 2 times as long to run as `n_chains=10`, and `n_chains=[4, 3, 3]` can take up to 3 times as long.
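The batching semantics can be sketched with a generic wrapper (a toy illustration, not Meridian's implementation): peak memory scales with the largest batch, while the number of sampler calls, and hence runtime, scales with the number of batches:

```python
def sample_serially(sample_fn, n_chains, **kwargs):
    """Run the sampler once per batch when n_chains is a list, then combine."""
    batches = n_chains if isinstance(n_chains, list) else [n_chains]
    results = []
    for batch in batches:
        results.extend(sample_fn(n_chains=batch, **kwargs))
    return results

calls = []

def toy_sampler(n_chains):
    calls.append(n_chains)                        # record each invocation
    return [[0.0] * 3 for _ in range(n_chains)]   # one draw list per chain

draws = sample_serially(toy_sampler, [5, 5])
# 10 chains total, produced by two sampler calls of 5 chains each
```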
Organic media contribution is too high
If the organic media contribution is higher than expected, the prior used may not be appropriate. Organic media has no defined ROI and, as such, uses the regression coefficient parameterization with the coefficient (`beta_om` or `beta_orf`) prior. If the contribution for organic media is observed to be higher than expected, revisit the priors used for the organic media channels. By default, the priors are relatively uninformative, but they do assume a positive effect, which can result in a high prior mean. When there is little information in the data, this can also lead to a high posterior mean. If this is an issue, consider using an alternative prior with more mass in the lower end of the distribution. Also note that when `media_effects_dist = 'log_normal'`, $\beta_i^{[OM]}$ is the prior mean of the log of the geo-level media effect, $\log(\beta_{g,i}^{[OM]})$. The default prior, `HalfNormal(5.0)`, may in this case place too much prior mass away from zero. This is exacerbated when exponentiating, and you may want to consider a prior with more mass near zero, such as a `HalfNormal(0.1)` prior. Note that although the variance is small, it still provides a wide range of possible values on the exponentiated scale. Alternatively, for more flexibility, you could consider a `Normal` prior that allows setting both the location and scale, for example, `Normal(0.0, 3.0)`. Similarly, when `media_effects_dist = 'normal'`, consider using a prior with a smaller scale than the default, such as `HalfNormal(1.0)`.
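The effect of the prior scale after exponentiation can be checked empirically by sampling (a quick sketch using the absolute value of a normal as a half-normal; not Meridian code):

```python
import numpy as np

def implied_effects(scale, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    half_normal = np.abs(rng.normal(0.0, scale, size=n))  # HalfNormal(scale)
    return np.exp(half_normal)  # implied geo-level effect under 'log_normal'

wide = implied_effects(5.0)    # median implied effect far above 1, huge tail
narrow = implied_effects(0.1)  # mass concentrated near an effect of 1
```

Comparing medians and upper quantiles of `wide` and `narrow` makes concrete how much prior mass `HalfNormal(5.0)` places on implausibly large effects once exponentiated.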
Error about controls that don't vary across groups or geos
This error means that you have a national-level variable that doesn't vary across geos and you have set `knots = n_times`. When `knots = n_times`, each time period is getting its own parameter. A national-level variable varies only across time, and not across geo. Therefore, the national-level variable is perfectly collinear with time and is redundant with a model that has a parameter for each time period. Redundant means that you can either keep the national-level variable or set `knots < n_times`. Which variable you choose depends on your interpretation goals.