Getting MCMC convergence
Lack of convergence is typically due to one of the following causes:
- The model is poorly specified for the data. This problem can be in the likelihood (model specification) or in the prior.
- The n_adapt + n_burnin or n_keep arguments of Meridian.sample_posterior are not large enough.
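If the sampler arguments are the suspected cause, the first thing to try is simply running the sampler for longer. Here is a minimal sketch, assuming a Meridian model object named mmm has already been built from your data; the argument values are placeholders, not recommendations:

```python
# Hypothetical example: rerun sampling with more adaptation, burn-in, and kept
# draws than the previous run. Suitable values depend on your model and data.
mmm.sample_posterior(
    n_chains=7,      # number of MCMC chains
    n_adapt=1000,    # adaptation steps used to tune the sampler
    n_burnin=500,    # burn-in draws discarded from each chain
    n_keep=1000,     # posterior draws kept per chain
)
```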
To get your chains to converge, try the following recommendations in this order:
- Check for identifiability or weak identifiability using these questions:
  - Do you have highly multicollinear media or controls variables?
  - Is the variation in your media or controls variable so small that it is difficult to estimate its effect?
  - Is one of the media or controls variables highly correlated with time, or even perfectly collinear with time? For more information, see When you must use knots < n_times.
  - Is one of the media variables quite sparse? Sparsity can mean very little execution in a channel, too many geos with no execution whatsoever, or too many time periods with no media execution whatsoever (especially if the number of knots is close to n_times).
- Reassess the priors, especially highly informative priors.
- Adjust the modeling options. In particular, try decreasing the knots argument of ModelSpec. Other modeling options to adjust include unique_sigma_for_each_geo or media_effects_dist of ModelSpec.
- Check for a data error, for example, whether the population order doesn't match the media order for geos.
- Meridian's model assumes a geo hierarchy in media and control effects. If this assumption does not match your data, regularize these parameters further by setting the priors on the parameters that measure hierarchical variance (eta_m and xi_c) to, for example, HalfNormal(0.1). You can also turn off the geo hierarchy assumption with a Deterministic(0) prior. See the sketch after this list.
- Consider whether you have enough data. For more information, see Amount of data needed.
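Several of these adjustments can be expressed through ModelSpec and PriorDistribution. The following is a minimal sketch assuming the Meridian Python API; the distributions and the knot count are illustrative placeholders, not recommendations:

```python
import tensorflow_probability as tfp

from meridian.model import prior_distribution
from meridian.model import spec

tfd = tfp.distributions

# Regularize the geo hierarchy by shrinking the hierarchical-variance
# parameters (eta_m for media, xi_c for controls). A Deterministic(0.0) prior
# would turn off the geo hierarchy for that parameter entirely.
prior = prior_distribution.PriorDistribution(
    eta_m=tfd.HalfNormal(0.1),
    xi_c=tfd.HalfNormal(0.1),
)

model_spec = spec.ModelSpec(
    prior=prior,
    knots=20,  # illustrative value; try decreasing knots if chains do not converge
    # Other options worth revisiting: unique_sigma_for_each_geo, media_effects_dist.
)
```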
Baseline is too low or sometimes negative
The baseline is the model's estimate of the response variable if there were no media execution. If the baseline is too low, there are a few possible causes to investigate:
- The model doesn't have enough high-quality controls, meaning controls that have an effect on both media execution and the response. Consider adding more meaningful controls, or population-scaling controls where it makes sense to do so, such as query volume (see the sketch after this list). Controls are not automatically population-scaled in Meridian.
- The model is not sufficiently explaining time effects. Consider increasing knots.
- If your ROI priors are informative, perhaps they are not good priors. A low baseline is equivalent to high ROIs. If you have low information in your data, you might have informative ROI priors and not be aware of it. For more information, see When the posterior is the same as the prior.
- Negative baseline often means that media is getting more incremental credit than is possible. Consider setting the ROI prior in terms of total media contribution, so that the total media contribution has low prior probability of being larger than 100%. For more information, see Set the total media contribution prior.
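As noted in the first item above, controls are not population-scaled for you. A rough illustration of population scaling a control before building the model, using a pandas DataFrame with hypothetical column names:

```python
import pandas as pd

# Toy geo-level data with a hypothetical control column ("query_volume") and a
# population column. The column names are placeholders, not Meridian requirements.
df = pd.DataFrame({
    "geo": ["geo_a", "geo_a", "geo_b", "geo_b"],
    "time": ["2024-01-01", "2024-01-08", "2024-01-01", "2024-01-08"],
    "query_volume": [1200.0, 1500.0, 300.0, 280.0],
    "population": [2_000_000, 2_000_000, 400_000, 400_000],
})

# Population-scale the control so that geos of different sizes are comparable,
# then pass the scaled column to Meridian as the control variable.
df["query_volume_per_capita"] = df["query_volume"] / df["population"]
```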
When the posterior is the same as the prior
When there are lots of variables that the model is trying to understand, you need more data to understand any particular variable. MMM typically tries to make inferences about many variables without many data points, particularly in the case of a national model. This means that there will be instances where there is little information in the data for a particular media channel. This situation can be exacerbated when a particular channel has low spend, very low variance in the scaled media execution, or high correlation of scaled media execution with other channels. For more information about data amounts, see Amount of data needed. For more information about channels with low spend, see Channels with low spend.
You can make the prior and the posterior different from each other by using increasingly uninformative priors. Recall that the prior represents an estimate of a parameter before the data has been taken into account, and the posterior is an estimate of the parameter after the data has been taken into account. When there is little information in the data, the before and after estimates are going to be similar. This is particularly true when the prior is relatively informative, where relative refers to the information in the prior relative to the information in the data. This means that the data can always dominate the prior if you set an uninformative enough prior. However, if the prior is uninformative relative to the data, which also has little information in it, then the posterior will be quite wide, representing a lot of uncertainty.
One way to simplify things is to think about the prior you are setting for parameters such as ROI. You don't have to worry too much about the relative informativeness of the prior if you just set reasonable priors that you believe in. If there is little or no information in the data, then it makes sense from a Bayesian perspective that the prior and the posterior are similar. If there is a lot of information in the data, then your prior will likely move based on this data.
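One practical check is to compare the prior and posterior draws for a parameter directly. A sketch, assuming a Meridian model object named mmm whose inference_data exposes prior and posterior groups after calling sample_prior and sample_posterior; roi_m follows Meridian's parameter naming:

```python
import arviz as az

# Draw from the prior as well as the posterior so both groups are populated.
mmm.sample_prior(500)
mmm.sample_posterior(n_chains=7, n_adapt=500, n_burnin=500, n_keep=1000)

# Summarize prior versus posterior ROI for each media channel. If the two
# summaries are nearly identical for a channel, the data carried little
# information about that channel's ROI.
print(az.summary(mmm.inference_data, var_names=["roi_m"], group="prior"))
print(az.summary(mmm.inference_data, var_names=["roi_m"], group="posterior"))
```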
Channels with low spend
Channels with low spend are particularly susceptible to having an ROI posterior similar to the ROI prior. Each channel has a range of ROI values that fit the data reasonably well. If this range is wide and covers most of the prior probability mass, then the posterior tends to look like the prior. The range of reasonable ROI values for a low-spend channel tends to be much wider than that of a high-spend channel because low-spend channels need a very large ROI to have much influence on the model fit. It is therefore more likely that a large range of ROI values will fit the data reasonably well.
Media effects are modeled based on the media metric provided, such as impressions or clicks. Neither the scale of the media metric nor the spend level has any effect on the model fit or on the range of incremental KPI units that could reasonably be attributed to the channel. ROI is defined as incremental KPI units divided by spend, so when the range of reasonable incremental KPI values is translated to an ROI range, a channel with larger spend will have a narrower range of ROI values that fit the data well.
Note: In the case of ordinary least squares regression, the scale of the covariates has no effect on the fit. The scale can matter in a Bayesian regression setting when priors are applied to the coefficients; however, Meridian applies a scaling transformation to each media metric. Scaling a channel's impressions by a factor of 100, for example, does not affect the Meridian model fit.
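To see why a constant rescaling washes out, here is a toy numpy illustration of per-channel scaling. This is not Meridian's actual transformer, only a stand-in that shows the scale invariance:

```python
import numpy as np

rng = np.random.default_rng(0)
impressions = rng.gamma(shape=2.0, scale=1e5, size=200)  # toy media column

def scale_by_median(x):
    # Divide by the median of the nonzero values, as a stand-in for a
    # per-channel scaling transformation.
    return x / np.median(x[x > 0])

# Multiplying the raw column by 100 leaves the scaled values unchanged.
assert np.allclose(scale_by_median(impressions), scale_by_median(impressions * 100))
```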
When ROI results are widely different depending on the prior used
ROI results can be very different depending on whether the default ROI priors or the default beta priors are used.
The choice between default ROI priors and default beta priors affects ROI results for the following reasons:
- When default ROI priors are used, each media channel's posterior ROI is regularized towards the same distribution. This is a good thing because every channel is treated equitably.
- When default priors on the media coefficients (beta) are used, each media channel's posterior ROI is regularized towards a different distribution. This is because the scaling applied to the media data is not the same across channels, so the same beta value implies different ROIs for different channels. The default priors on the media coefficients are also uninformative relative to the default ROI prior, to account for potentially large differences in the scaling of the media data across channels.
- When there is little information in the data, the prior and the posterior will be similar, as discussed in When the posterior is the same as the prior. When there is little information in the data and beta priors are used, posterior ROIs will differ across the media channels. However, this difference comes only from the inequitable priors on the media channels, not from the data. In summary, it is important not to interpret different ROI results across channels as signal from the data when the difference is driven only by inequitable priors.
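To make the distinction concrete, the following sketch shows how a custom prior could be placed on roi_m (the ROI parameterization) or on beta_m (the media coefficients) through Meridian's PriorDistribution. The distribution choices are illustrative only, and which parameterization the model actually uses is controlled separately in ModelSpec, so treat this as a sketch rather than a recipe:

```python
import tensorflow_probability as tfp

from meridian.model import prior_distribution

tfd = tfp.distributions

# Equitable treatment: the same ROI prior applies to every media channel.
roi_based_prior = prior_distribution.PriorDistribution(
    roi_m=tfd.LogNormal(0.2, 0.9),
)

# Coefficient-based priors act on beta_m, which sits on the scaled media data,
# so the same beta value can imply very different ROIs across channels.
coefficient_based_prior = prior_distribution.PriorDistribution(
    beta_m=tfd.HalfNormal(5.0),
)
```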
ResourceExhaustedError when running Meridian.sample_posterior
Meridian.sample_posterior calls tfp.experimental.mcmc.windowed_adaptive_nuts, which can be memory intensive on GPUs when sampling a large number of chains in parallel or when training with large datasets.
One way to reduce the peak GPU memory consumption is to sample chains serially. This capability is provided by passing a list of integers to n_chains. For example, n_chains=[5, 5] will sample a total of 10 chains by calling tfp.experimental.mcmc.windowed_adaptive_nuts consecutively, each time with the argument n_chains=5.
Note that this does come with a runtime cost. Because this method reduces memory consumption by using consecutive calls to our MCMC sampling method, the total runtime will increase linearly with the length of the list passed to n_chains. For example, n_chains=[5, 5] can take up to 2 times as long to run as n_chains=10, and n_chains=[4, 3, 3] can take up to 3 times as long.