This document provides additional guidance and considerations for advanced modeling configuration and edge cases.
Model Fit
Assessing model fit
The primary goal of marketing mix modeling (MMM) is accurate estimation of causal marketing effects. However, directly validating the quality of causal inference is difficult and requires well-designed experiments. These experiments must be executed correctly and must have the same estimand as the MMM. Because you are using an MMM, such experiments are likely not practical, so the causal inference cannot be assessed directly. Instead, you have to rely on indirect measures.
We recommend that you make modeling decisions that make sense for the goal of causal inference, and not for minimizing prediction error. Consider these guidelines:
- Make sure that your control variable set includes all important confounding variables, which impact both media execution and the response. For more information, see Selecting control variables.
- Be careful about including control variables that are not actually confounders. Too many variables can increase the risk of overfitting and model misspecification bias.
- Only add media variables whose causal effects you are interested in learning.
- Model time using the advice in Choosing the number of knots for time effects in the model, and don't necessarily try to model time with as many knots as you can.
This process does require some self-reflection from you as the advertiser; however, it will most likely lead to the best model fit. Considering that you planned your own media strategy, you probably know or have a strong sense of what variables impacted your planning around media execution.
The results need to make sense. Results that don't make sense include unusually low baselines that are often negative, and one media channel dominating all other media channels. Meridian has out-of-sample prediction metrics, which are useful as a preliminary check to make sure the model structure is appropriate and not extremely overparameterized. For more information, see About out-of-sample prediction metrics.
About out-of-sample prediction metrics
The goal in marketing mix modeling (MMM) is causal inference, and not necessarily to minimize out-of-sample prediction metrics. It can be safer to have a model that includes all confounding variables and allows enough flexibility in the model structure to get unbiased, causal estimates (such as ROI), even if this means the model is overfit.
It is still a good idea to check the out-of-sample fit to make sure your model structure is appropriate and not extremely overparameterized, but out-of-sample prediction metrics shouldn't be the primary way model fit is assessed. Out-of-sample fit can be evaluated using the holdout_id argument in ModelSpec and the predictive_accuracy method of Analyzer.
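As a sketch of how a holdout split might be constructed (assuming holdout_id accepts a boolean mask over time periods, per the ModelSpec reference; the ModelSpec and Analyzer calls in the comments are indicative only):

```python
import numpy as np

# Hypothetical dimensions: 104 weekly time periods.
n_times = 104

# Hold out the most recent 10% of time periods for out-of-sample evaluation.
holdout_id = np.zeros(n_times, dtype=bool)
holdout_id[-round(0.1 * n_times):] = True

# The mask would then be passed to Meridian, for example:
#   model_spec = spec.ModelSpec(holdout_id=holdout_id)
# and, after fitting, out-of-sample metrics retrieved with:
#   analyzer.Analyzer(mmm).predictive_accuracy()

print(holdout_id.sum())  # number of held-out periods
```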
Data considerations
Amount of data needed
This section can help you build a sense of how much data you need. The guidance about the amount of data needed is rough and directional, because the true answer depends on the characteristics of the data.
- Data size is the number of geos times the number of time points.
- These time points and geos are not independent. For example, 1,000 data points in a marketing mix modeling (MMM) setting isn't the same as something like 1,000 coin flips or 1,000 randomly assigned participants in an experiment.
Also see the sections for national models and geo models.
Amount of data needed for national models
An important confidence-check metric for national models is the number of data points per effect that you are trying to measure and understand. For example, if you have 12 media channels, six controls, and eight knots, the total is 26 effects. (For simplicity, ignore things like Adstock and Hill parameters for this example.) If you have two years' worth of weekly data, then you have 104 data points and four data points per effect. This is a low sample-size scenario, and you don't have enough data. (Additionally, insufficient variation in the media spend adversely impacts national models.) For more information about knots, see How the knots argument works.
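The arithmetic above can be captured in a small helper. This is only a back-of-the-envelope check, not part of the Meridian API, and it ignores Adstock and Hill parameters as in the example:

```python
def data_points_per_effect(n_times, n_media, n_controls, n_knots):
    """Rough confidence check: data points per estimated effect."""
    n_effects = n_media + n_controls + n_knots
    return n_times / n_effects

# 12 media channels, six controls, eight knots, two years of weekly data:
print(data_points_per_effect(n_times=104, n_media=12, n_controls=6, n_knots=8))
# 4.0 data points per effect -- too low.
```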
Because it is difficult to get enough data for a national model, you can do the following:
- Lower the scope of the MMM. You can estimate fewer media channels (either by dropping a low-spend channel or combining channels), use fewer knots to estimate time effects, and remove any extraneous controls. However, don't remove important confounders.
- Get much more data. For example, use three years of weekly data instead of two. Adding more data reduces the variance of the inference, but might make the inference less relevant.
- Alternatively, consider adding geo granularity to your data and using a geo model instead of lowering the scope or adding more data.
Consider the previous hypothetical example for the national model. You can combine the 12 media channels into three and reduce the knots to two. You might also recognize that one of your controls explains the KPI but not the media, which means that it is not a true confounder and you can remove it. If you also use three years' worth of weekly data, you then have 156 data points to estimate 10 effects. That is roughly 15 data points per effect, and now you might be able to glean some directional information from the MMM.
Amount of data needed for geo models
The number of data points per effect that you are trying to measure and understand is still an important confidence-check metric. However, due to the geo hierarchy, that metric is not as clear to interpret. For example, if you have 12 media channels, six controls, 100 knots, and 105 geos, that is roughly $(12 \times 105) + (6 \times 105) + 100 = 1{,}990$ effects to estimate. (You multiply by 105, the number of geos, because media and controls have geo-level effects.) If you have three years' worth of weekly data, then you have $105 \times (52 \times 3) = 16{,}380$ data points. This is roughly eight data points per effect. For simplicity, ignore things like Adstock and Hill parameters in this example.
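The geo-model arithmetic can be written out as follows. Again, this is only a rough count that ignores Adstock and Hill parameters, not a Meridian API call:

```python
# Media and control effects are estimated per geo; knots are shared across geos.
n_geos, n_media, n_controls, n_knots = 105, 12, 6, 100
n_weeks = 52 * 3

n_effects = n_geos * (n_media + n_controls) + n_knots  # 1,990 effects
n_data_points = n_geos * n_weeks                       # 16,380 data points
print(n_data_points / n_effects)                       # roughly 8 per effect
```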
An important detail not considered in this example is that, by definition of a geo hierarchy, the geo-level media effects and geo-level control effects are not independent across the geos. Effectively, this means that data is shared when estimating the effect of media channel 1 on geo 1 and the effect of media channel 1 on geo 2, and similarly for controls. Because data is shared, you effectively have more than eight data points per effect. How much data is shared depends on how similar the effects are across geos, which is determined by the eta_m and xi_c parameters.
If you are having difficulty getting enough data for a geo-level model, we recommend combining media channels or dropping a media channel with low spend. Alternatively, you can put a more regularizing prior on the hierarchical variance terms eta_m and xi_c, for example, HalfNormal(0.1). A more regularizing hierarchical variance prior encourages sharing information across geos.
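As a sketch, and assuming the PriorDistribution container in meridian.model.prior_distribution accepts TensorFlow Probability distributions for eta_m and xi_c (check the Meridian priors reference for the exact signature), more regularizing hierarchical variance priors might be set like this:

```python
import tensorflow_probability as tfp

from meridian.model import prior_distribution, spec

# Tighter HalfNormal(0.1) priors on the hierarchical variance terms pull
# geo-level effects toward each other, sharing information across geos.
prior = prior_distribution.PriorDistribution(
    eta_m=tfp.distributions.HalfNormal(0.1, name='eta_m'),
    xi_c=tfp.distributions.HalfNormal(0.1, name='xi_c'),
)
model_spec = spec.ModelSpec(prior=prior)
```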
Can I use campaign-level data?
The Meridian model is focused only at the channel level. We generally don't recommend running at the campaign level because MMM is a macro tool that works well at the channel level. If you use distinct campaigns that have hard starts and stops, you risk losing the memory of the Adstock. If you are interested in more granular insights, we recommend data-driven multi-touch attribution for your digital channels.
Geo-level model considerations
Geo selection
When you are selecting geos, consider the following guidance:

- Drop the smallest geos by total KPI first. Smaller geos contribute less to ROI, yet they can still have a high influence on model fit, particularly when there is a single residual variance for all geos (unique_sigma_for_each_geo = False in ModelSpec). For US advertisers using designated market area (DMA) as the geographical unit, a rough guideline is to model the top 50-100 DMAs by population size. This generally includes the vast majority of the KPI units, while excluding most of the noisier small DMAs that might impact model fit and convergence.

- When each geo has its own residual variance (unique_sigma_for_each_geo = True in ModelSpec), noisier geos have less impact on model fit. However, this option can make convergence difficult for some datasets because it adds so much flexibility to the model. If MCMC sampling does converge under this option, it might be worth plotting the geo population size versus the mean residual standard deviation (the sigma parameter). In most cases, you would expect to see a fairly monotone pattern. If you don't see this pattern, then it might be better to set unique_sigma_for_each_geo = False and use a smaller subset of geos.
- If you want to make sure the model represents 100% of your KPI units, you can aggregate smaller geos into larger regions. However, this option comes with several caveats:
  - Recognize that geo-level modeling is a big advantage, and this advantage grows with the number of geographically separated treatment units. For more information, see National-level versus geo-level modeling.
  - Different geo aggregation grouping methods can lead to different MMM results.
  - Media execution variables, such as impressions or cost, can usually be summed across geos. However, some control variables, such as temperature, can be less straightforward to aggregate.
National-level media in a geo-level model
When most media are available at the geo level, but one or two are only available at the national level, we recommend imputing the national-level media at the geo level and running a geo model. One naive imputation method is to approximate the geo-level media variable from its national-level value, using the proportion of the population in the geo relative to the total population. Although it is preferable to have accurate geo-level data so that imputation isn't necessary, imputation can still yield useful information about the model parameters. For more information, see section 4.4 of Geo-level Bayesian Hierarchical Media Mix Modeling.
Why am I getting an error about controls that don't vary across groups or geos?
This warning means that you have a national-level variable that doesn't vary across geos and you have set knots = n_times. When knots = n_times, each time period gets its own parameter. A national-level variable varies only across time, and not across geo. Therefore, the national-level variable is perfectly collinear with time and is redundant in a model that has a parameter for each time period. To resolve the redundancy, either drop the national-level variable or set knots < n_times. Which option you choose depends on your interpretation goals.
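The collinearity can be seen directly: with one indicator per time period, a variable that varies only over time adds no new information to the design matrix. A small illustration with made-up numbers (not Meridian code):

```python
import numpy as np

n_geos, n_times = 3, 4

# One indicator column per time period (knots = n_times), stacked over geos.
time_dummies = np.tile(np.eye(n_times), (n_geos, 1))

# A national-level control: the same value in every geo at a given time.
national_control = np.tile(np.array([2.0, 5.0, 3.0, 7.0]), n_geos)[:, None]

X = np.hstack([time_dummies, national_control])

# The control column is a linear combination of the time dummies,
# so adding it does not increase the rank of the design matrix.
print(np.linalg.matrix_rank(time_dummies), np.linalg.matrix_rank(X))
```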
Model settings
Set the max_lag parameter
The Meridian model allows media at time $t$ to affect the KPI at times $t, t + 1, \dots , t + L$, where the integer $L$ is a hyperparameter set by the user using max_lag in ModelSpec. Media can potentially have a long effect that goes beyond max_lag. However, the lagged effect of media converges toward zero, due to the model assumption of geometric decay.
In practice, max_lag is used to truncate how long media can have an effect, because truncation has benefits including improved model convergence, reasonable model runtimes, and maximized data usage (reduced variance). Keeping max_lag in the 2-10 range leads to a good balance of these advantages and disadvantages.
Increasing max_lag doesn't necessarily mean that ROI estimates will also increase. One reason is that if media at time $t$ can affect the KPI at time $t+L$, this can take away from the effect of media at times $t+1, \dots , t+L$ on the KPI at time $t+L$.
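The truncation can be illustrated with a toy geometric-decay Adstock (a simplified sketch, not Meridian's implementation):

```python
import numpy as np

def geometric_adstock(media, decay, max_lag):
    """Carries each period's media forward with weights decay**lag, up to max_lag."""
    t_len = len(media)
    out = np.zeros(t_len)
    for t in range(t_len):
        for lag in range(min(max_lag, t) + 1):
            out[t] += media[t - lag] * decay ** lag
    return out

# A single burst of media at t = 0 with decay 0.5 and max_lag = 2:
print(geometric_adstock(np.array([1.0, 0, 0, 0, 0]), decay=0.5, max_lag=2))
# Weights 1, 0.5, 0.25 are kept; the small lag-3+ tail (0.125, ...) is truncated.
```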
Set custom priors using past experiments
Meridian requires passing distributions for ROI calibration. Although setting custom priors using results from previous experiments is a sound approach, there are many nuances to consider before proceeding. For example:
- The timing of the experiment in relation to the MMM time window: If the experiment was conducted either before or after the MMM time window, the results might not be directly applicable.
- The duration of the experiment: An experiment of short duration might not effectively capture the long-term effects of the marketing.
- The complexity of the experiment: If the experiment involved a mixture of channels, the results might not provide clear insights into the performance of individual channels.
- Estimand differences: The estimands used in experiments can differ from those used in the MMM. For example, the MMM counterfactual is zero spend, whereas some experiments might have a different counterfactual, such as reduced spend.
- Population differences: The population targeted in the experiment might not be the same as the population considered in the MMM.
We recommend setting the custom priors based on your belief in the effectiveness of a channel. A prior belief can be informed by many things, including experiments or other reliable analyses. Use the strength of that prior belief to inform the standard deviation of the prior:

- If you have a strong belief in the effectiveness of a channel, you can apply an adjustment factor to the standard deviation of the prior to reflect your confidence. For example, suppose you have conducted several experiments for a particular channel and all the experiments yielded similar ROI point estimates, or you have historical data from previous MMM analyses that support the effectiveness of this channel. In this case, you could set a smaller standard deviation for the prior so that the distribution doesn't vary widely. This tighter distribution indicates your strong confidence in the experimental results.
- Conversely, the experiment might not necessarily translate to the MMM, considering some of the nuances listed earlier. In this case, you might apply an adjustment factor to the standard deviation of the prior distribution. For example, you could set a larger standard deviation for the prior, depending on your level of skepticism.
You should consider using the roi_calibration_period argument in ModelSpec. For more information, see Set the ROI calibration period.
When setting the prior, the LogNormal distribution is a common choice. The following sample code transforms an experiment's mean and standard error into the LogNormal prior distribution:
```python
import numpy as np

def estimate_lognormal_dist(mean, std):
  """Reparameterizes a lognormal distribution in terms of its mean and std."""
  mu_log = np.log(mean) - 0.5 * np.log((std / mean)**2 + 1)
  std_log = np.sqrt(np.log((std / mean)**2 + 1))
  return [mu_log, std_log]
```
However, if the results from previous experiments are near zero, you should consider whether your prior beliefs are accurately represented by a non-negative distribution, such as the LogNormal distribution. We highly recommend plotting the prior distribution to confirm that it matches your prior intuitions before proceeding with the analysis. The following sample code shows how to get the reparameterized LogNormal parameters, define the distribution, and draw samples from it:
```python
import tensorflow as tf
import tensorflow_probability as tfp

# Example experiment results (placeholder values; replace with your own
# estimates). estimate_lognormal_dist is defined in the previous snippet.
mean, std = 1.5, 0.8

# Get reparameterized LogNormal distribution parameters.
mu_log, std_log = estimate_lognormal_dist(mean, std)
mu_log = tf.convert_to_tensor(mu_log, dtype=tf.float32)
std_log = tf.convert_to_tensor(std_log, dtype=tf.float32)

# Define the LogNormal distribution.
lognormal_dist = tfp.distributions.LogNormal(mu_log, std_log)

# Draw 10,000 samples.
lognormal_samples = lognormal_dist.sample(10000).numpy()
```
Business considerations
Refresh your model
How and how often to refresh your model depends on the data frequency (such as daily or weekly) and the timeframe in which the marketing team makes decisions. If decisions are quarterly, we recommend running the model each quarter. The data window can be expanded each time, so that the older data still has an influence on the most recent estimate.
Consider the following:
- Meridian doesn't model media effectiveness as time-varying. So, the decision to discard old data versus append new data is a bias-variance tradeoff. Appending new data reduces variance because you have more data, but it can increase bias if media effectiveness and strategies have changed drastically over time.
- Appending a small amount of data can have a big effect on results because, in general, MMM estimates are high-variance.
- When you append new data, you can set the priors so that they match the posterior obtained before the data was appended. This encourages the old results to match the new results, and there can be valid business reasons for doing this. We recommend that you set priors based on prior knowledge and intuition, and it is perfectly valid for this intuition to be informed by past MMM results. It is your decision how strongly you want past MMM results to inform your priors. However, consider that setting priors that match a previous MMM's results effectively counts the previous data twice.
Lead-generating businesses with long sales cycles
For lead-generating businesses with long sales cycles, best practices depend on your target variable, that is, on what outcome you want to measure. If generating a lead takes multiple months, then you can take more immediate-action KPIs into account, such as number of conversions, number of site visits, or form entries.
Model Debugging
Getting MCMC convergence
Lack of convergence is typically due to one of the following causes:
- The model is poorly specified for the data. This problem can be in the likelihood (model specification) or in the prior.
- The n_adapt + n_burnin or n_keep arguments of Meridian.sample_posterior are not large enough.
To get your chains to converge, try the following recommendations in this order:
- Check for identifiability or weak identifiability using these questions:
  - Do you have highly multicollinear media or controls variables?
  - Is the variation in your media or controls variable so small that it is difficult to estimate its effect?
  - Is one of the media or controls highly correlated with time, or even perfectly collinear with time? For more information, see When you must use knots < n_times.
- Reassess the priors, especially highly informative priors.

- Adjust the modeling options. In particular, try decreasing the knots argument of ModelSpec. Other modeling options to adjust include unique_sigma_for_each_geo or media_effects_dist of ModelSpec.
- Check for a data error, for example, whether the population order doesn't match the media order for geos.
- Meridian's model assumes a geo hierarchy in media and control effects. If this assumption doesn't match your data, regularize these parameters further by setting the priors on the parameters that measure hierarchical variance (eta_m and xi_c), for example, HalfNormal(0.1). You can also turn off the geo hierarchy assumption with a Deterministic(0) prior.
- Consider whether you have enough data. For more information, see Data considerations.
Baseline is too low or sometimes negative
The baseline is the model's estimate of the response variable if there were no media execution. If the baseline is too low, there are a few possible causes to investigate:
- The model doesn't have enough high-quality controls, meaning controls that have an effect on both media execution and the response. Consider adding more meaningful controls, or population-scaling controls where it makes sense to do so, such as query volume. Controls are not automatically population-scaled in Meridian.

- The model is not sufficiently explaining time effects. Consider increasing knots.
- If your ROI priors are informative, perhaps they are not good priors. A low baseline is equivalent to high ROIs. If you have low information in your data, you might have informative ROI priors without being aware of it. For more information, see When the posterior is the same as the prior.
- A negative baseline often means that media is getting more incremental credit than is possible. Consider setting the ROI prior in terms of total media contribution, so that the total media contribution has a low prior probability of being larger than 100%. For more information, see Set the total media contribution prior.
When the posterior is the same as the prior
When there are many variables that the model is trying to understand, you need more data to understand any particular variable. MMM typically tries to make inference on many variables without that many data points, particularly in the case of a national model. This means that there will be instances where there is little information in the data for a particular media channel. This situation can be exacerbated when a particular channel has low spend, very low variance in the scaled media execution, or high correlation of scaled media execution with other channels. For more information about data amounts, see Amount of data needed. For more information about channels with low spend, see Channels with low spend.
You can make the prior and the posterior different from each other by using increasingly uninformative priors. Recall that the prior represents an estimate of a parameter before the data has been taken into account, and the posterior is an estimate of the parameter after the data has been taken into account. When there is little information in the data, the before and after estimates are going to be similar. This is particularly true when the prior is relatively informative, where relative refers to the information in the prior compared to the information in the data. This means that the data can always dominate the prior if you set an uninformative enough prior. However, if the prior is uninformative relative to the data, which also has low information in it, then the posterior will be quite wide, representing a lot of uncertainty.
One way to simplify things is to think about the prior you are setting for parameters such as ROI. You don't have to worry too much about the relative informativeness of the prior if you just set reasonable priors that you believe in. If there is little or no information in the data, then it makes sense from a Bayesian perspective that the prior and the posterior are similar. If there is a lot of information in the data, then your prior will likely move based on this data.
Channels with low spend
Channels with low spend are particularly susceptible to having an ROI posterior similar to the ROI prior. Each channel has a range of ROI values that fit the data reasonably well. If this range is wide and covers most of the prior probability mass, then the posterior tends to look like the prior. The range of reasonable ROI values for a small-spend channel tends to be much wider than that of a high-spend channel, because small-spend channels need a very large ROI to have much influence on the model fit, so a large range of ROI values is likely to fit the data reasonably well.
Media effects are modeled based on the media metric provided, such as impressions or clicks. Neither the scale of the media metric nor the spend level has any effect on the model fit or on the range of incremental KPI units that could reasonably be attributed to the channel. ROI is defined as incremental KPI units divided by spend, so when the range of reasonable incremental KPI unit values is translated to an ROI range, a channel with larger spend will have a narrower range of ROI values that fit the data well.
Note: In the case of ordinary least squares regression, the scale of the covariates has no effect on the fit. The scale can matter in a Bayesian regression setting when priors are applied to the coefficients; however, Meridian applies a scaling transformation to each media metric. Scaling a channel's impressions by a factor of 100, for example, does not affect the Meridian model fit.
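The scale-invariance point can be checked numerically: rescaling a covariate rescales its least-squares coefficient but leaves the fitted values unchanged. This is a plain OLS illustration with made-up numbers, not Meridian code:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1000, size=50)       # e.g., impressions
y = 0.003 * x + rng.normal(0, 1, 50)    # a hypothetical response

X1 = np.column_stack([np.ones(50), x])
X2 = np.column_stack([np.ones(50), x * 100])  # scale the metric by 100

b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)

# The coefficient shrinks by 100x, but the fitted values are identical.
print(np.allclose(X1 @ b1, X2 @ b2))
print(np.allclose(b1[1], 100 * b2[1]))
```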
When ROI results are widely different depending on the prior used
ROI results can be very different depending on whether ROI default priors are used or beta default priors are used.
The use of ROI default priors and beta default priors can affect ROI results for the following reasons:
- When default ROI priors are used, each media channel's posterior ROI is regularized toward the same distribution. This is a good thing because every channel is treated equitably.
- When default priors on the media coefficients (beta) are used, each media channel's posterior ROI is regularized toward a different distribution. This is because the scaling applied to the media data is not the same across channels, so the same beta value implies different ROIs for different channels. The default priors on media coefficients are also uninformative relative to the default ROI prior, to account for potentially big differences in the scaling of the media data across channels.
- When there is little information in the data, the prior and the posterior will be similar, as discussed in When the posterior is the same as the prior. When there is little information in the data and beta priors are used, posterior ROIs will differ across the media channels. However, this difference comes only from the inequitable priors on the media channels, not from the data. In summary, it is important not to interpret different ROI results across channels as signal from the data when the difference is driven only by inequitable priors.
ResourceExhaustedError when running Meridian.sample_posterior
Meridian.sample_posterior calls tfp.experimental.mcmc.windowed_adaptive_nuts, which can be memory-intensive on GPUs when sampling a large number of chains in parallel or when training with large datasets.
One way to reduce peak GPU memory consumption is to sample chains serially. This capability is provided by passing a list of integers to n_chains. For example, n_chains=[5, 5] samples a total of 10 chains by calling tfp.experimental.mcmc.windowed_adaptive_nuts twice, each time with the argument n_chains=5.
Note that this comes with a runtime cost. Because this method reduces memory consumption through consecutive calls to the MCMC sampling method, the total runtime increases linearly with the length of the list passed to n_chains. For example, n_chains=[5, 5] can take up to 2 times as long to run as n_chains=10, and n_chains=[4, 3, 3] can take up to 3 times as long.
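For example, serial sampling might be invoked as follows. This sketch assumes mmm is an already-constructed Meridian model instance, and the adapt/burn-in/keep values are placeholders to adjust for your own run:

```python
# Sample 10 total chains in two serial batches of 5 to cap peak GPU memory.
mmm.sample_posterior(
    n_chains=[5, 5],
    n_adapt=500,
    n_burnin=500,
    n_keep=1000,
)
```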