Assess the baseline

The baseline is the expected outcome in the counterfactual scenario where all treatment variables are set to their baseline values. For paid and organic media, the baseline values are zero. For non-media treatment variables, the baseline value can be set to the observed minimum value of the variable (the default), the maximum, or a user-provided float. Estimating the baseline lets you understand what would have happened had you not engaged in paid media, organic media, or other non-media treatments. It is necessary for the causal inference of the treatment effects, so it is important to assess the baseline.
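To make the counterfactual concrete, here is a toy numpy sketch. This is not Meridian code; the linear model, coefficients, and values are all made up for illustration.

```python
import numpy as np

# Toy linear outcome model (illustrative only, not Meridian's internals):
# outcome = intercept + media_effect * media + non_media_effect * non_media
intercept, media_effect, non_media_effect = 100.0, 2.0, 0.5

media = np.array([10.0, 20.0, 15.0])      # paid media execution per period
non_media = np.array([30.0, 45.0, 40.0])  # a non-media treatment variable

observed = intercept + media_effect * media + non_media_effect * non_media

# Counterfactual: media at zero, the non-media treatment at its observed
# minimum (the default baseline value).
baseline_per_period = intercept + non_media_effect * non_media.min()
total_baseline = baseline_per_period * len(media)

print(observed.sum())   # 447.5, total outcome with treatments as observed
print(total_baseline)   # 345.0, the outcome without any treatments
```

Meridian estimates the baseline from posterior draws rather than from point coefficients, but the counterfactual logic is the same: media set to zero and non-media treatments set to their configured baseline values.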

Assess the baseline for negativity

The outcome (either revenue or the KPI, see Glossary) cannot be negative. Therefore, a negative baseline indicates statistical error in the causal inference of the treatment effects. Like all statistical models, we expect some statistical error in Meridian. However, an extremely negative baseline indicates extreme error.

A model result with an extremely negative baseline is a clear signal that the model settings, the data used to fit the model, or the priors ought to be adjusted (see Mitigate negative or low baseline). This compels a more thorough and iterative model development process, ultimately leading to a more accurate, reliable, and insightful model.

Meridian is a statistical and probabilistic model. We recommend taking advantage of this by assessing the negativity of the baseline probabilistically. Specifically, consider the posterior probability that the baseline aggregated over the entire time window is negative. If this probability is too high, then the model result may have large statistical error. You can calculate this probability with the following:

from meridian.analysis import analyzer
from meridian.model import model
import numpy as np

mmm = model.Meridian(...)
mmm.sample_posterior(...)
a = analyzer.Analyzer(mmm)
# Posterior draws of the baseline expected outcome, aggregated over the time window.
posterior_baseline_draws = a._calculate_baseline_expected_outcome()
# Posterior probability that the aggregate baseline is negative.
posterior_negative_baseline_prob = np.mean(posterior_baseline_draws < 0)

Alternatively, one could examine the point estimate of the baseline aggregated over the entire time window, as in Channel contribution charts. However, remember that Meridian is a statistical and probabilistic model, and a point estimate can have significant uncertainty. A negative baseline point estimate does not necessarily indicate model bias. It indicates error, which can stem from either bias or variance (see Bias–variance decomposition of mean squared error). This is especially true when the data contains limited information (see Amount of data needed).

We advise against overemphasizing a baseline time series, such as the one in Model fit charts, that occasionally dips into negative values. An occasional dip, especially a small one, indicates minor error, which is inherent in any statistical model.

Negative baseline versus contribution percentage

There is a subtle difference between "total contribution exceeding 100% of observed outcome" (which occurs if incremental outcome exceeds observed outcome) and "negative baseline" (which occurs if incremental outcome exceeds expected outcome). The posterior distribution of total expected outcome is usually very closely distributed around the total observed outcome, but the prior distribution of expected outcome typically has large variance and is not centered around the observed outcome. Consequently, it does not make much sense to evaluate the prior probability of negative baseline, which is why we advise evaluating the probability that contribution exceeds 100% of observed outcome instead.
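A small numeric example (values are made up) shows why the two conditions are not equivalent: contribution can exceed 100% of observed outcome while the baseline, which is measured against expected outcome, stays positive.

```python
# Illustrative values only.
observed_outcome = 100.0
expected_outcome = 110.0  # expected outcome for the same time window
incremental = 105.0       # total incremental outcome across all treatments

contribution = incremental / observed_outcome
baseline = expected_outcome - incremental

print(contribution)  # 1.05 -> contribution exceeds 100% of observed outcome
print(baseline)      # 5.0  -> yet the baseline is still positive
```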

Also note that the total expected outcome may differ from the total observed outcome if your model uses revenue_per_kpi values with high variance. Typically the total expected KPI will be very close to the total observed KPI, but high variance in revenue_per_kpi can create a discrepancy between total expected outcome and total observed outcome.
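The following toy numpy calculation (all numbers invented) shows how this discrepancy arises: per-period errors in expected KPI cancel in the total, but weighting each period by a highly variable revenue_per_kpi stops them from cancelling.

```python
import numpy as np

# Per-period expected KPI differs from observed KPI, but the totals match.
observed_kpi = np.array([100.0, 200.0, 300.0])
expected_kpi = np.array([120.0, 180.0, 300.0])

# Constant revenue_per_kpi: total expected outcome equals total observed outcome.
flat_rpk = np.full(3, 2.0)
print((observed_kpi * flat_rpk).sum())  # 1200.0
print((expected_kpi * flat_rpk).sum())  # 1200.0

# High-variance revenue_per_kpi: the per-period errors no longer cancel.
noisy_rpk = np.array([1.0, 4.0, 2.0])
print((observed_kpi * noisy_rpk).sum())  # 1500.0
print((expected_kpi * noisy_rpk).sum())  # 1440.0
```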

Mitigate negative or low baseline

As a first step, calculate the prior probability that the total treatment contribution exceeds 100% of total observed outcome:

from meridian.model import model
from meridian.analysis import analyzer
import numpy as np

mmm = model.Meridian(...)
mmm.sample_prior(1000)
a = analyzer.Analyzer(mmm)
# Total observed outcome: revenue when revenue_per_kpi is available, else the KPI.
outcome = mmm.kpi
if mmm.revenue_per_kpi is not None:
  outcome = outcome * mmm.revenue_per_kpi  # avoid in-place mutation of mmm.kpi
total_outcome = np.sum(outcome)
# Prior contribution shares, with shape (n_chains, n_draws, n_channels).
prior_contribution = a.incremental_outcome(use_posterior=False) / total_outcome
# Prior probability that the total contribution exceeds 100% of observed outcome.
total_prior_contribution = np.sum(prior_contribution, -1)
np.mean(total_prior_contribution > 1, (0, 1))

As a next step, calculate the prior probability that each individual treatment's contribution exceeds 100%:

# Per-channel prior probability, averaged over chains and draws.
np.mean(prior_contribution > 1, (0, 1))

If these prior probabilities are too high, particularly when the data contains limited information (see When the posterior is the same as the prior), a negative baseline problem may arise. Consider these adjustments to the prior and then reevaluate the custom priors using the checks described earlier:

  • Negative baseline is equivalent to the treatment effects getting too much credit. To mitigate this, set custom priors for the treatment effects (see Default prior parameterizations) that reduce the prior mass at large values compared to the default priors.
  • A channel with both high spend and high ROI might suggest that the channel drives more than 100% of the outcome, resulting in a negative baseline. For example, if your outcome is revenue and a channel's spend is 20% of total revenue, an ROI of 5.0 would indicate that the channel drives 100% of revenue (i.e., the channel's contribution is 100%). The actual ROI is likely much lower than 5.0, and setting a prior ROI to reflect this can help prevent a negative baseline. As a rule of thumb, the 90th percentile ROI prior for a channel shouldn't imply that the channel's contribution is over 100%.
  • If paid media is getting too much credit, consider setting the ROI prior in terms of total media contribution, so that the total media contribution has low prior probability of being larger than 100%. For more information, see Set the total media contribution prior.
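The rule-of-thumb arithmetic in the second bullet can be checked directly: when the outcome is revenue, spend share times ROI gives the implied contribution share. The numbers below repeat the example from the text.

```python
spend_share = 0.20  # channel spend as a fraction of total revenue
roi_p90 = 5.0       # 90th percentile of the channel's ROI prior

# Implied contribution share at the 90th percentile of the prior.
implied_contribution = spend_share * roi_p90
print(implied_contribution)  # 1.0 -> the channel would drive 100% of revenue

# Largest 90th-percentile ROI that keeps implied contribution at or below 100%.
max_reasonable_roi = 1.0 / spend_share
print(max_reasonable_roi)  # 5.0
```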

In addition to adjusting the prior, there are a few other possible causes to investigate:

  • The model doesn't have enough high-quality controls, meaning controls that have an effect on both media execution and the response. Consider adding more meaningful controls, or population-scaling controls where it makes sense to do so, such as query volume. Controls are not automatically population-scaled in Meridian.
  • The model is not sufficiently explaining time effects. Consider increasing knots or selecting more appropriate knot locations.
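Because controls are not automatically population-scaled, you can scale a raw control yourself before passing the data in. A minimal numpy sketch (shapes and values invented; `query_volume` and `population` are hypothetical arrays, not Meridian fields):

```python
import numpy as np

# Hypothetical control with shape (n_geos, n_times) and per-geo populations.
query_volume = np.array([[1000.0, 1200.0],
                         [300.0, 360.0]])
population = np.array([50000.0, 10000.0])

# Divide each geo's control by its population to get a per-capita rate,
# making geos of very different sizes comparable.
query_volume_per_capita = query_volume / population[:, None]
print(query_volume_per_capita)
```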

Assess the baseline for trend and seasonality

Assess whether the baseline time series, such as the one in Model fit charts, matches the expected trend and seasonality of base demand. If the baseline fails to capture trend or seasonality during expected time periods, consider selecting more knots around those periods. This can be achieved by setting knots to a list, which specifies knot locations. Additionally, consider control variables that could explain the trend or seasonality in the base demand (see Control variables).
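As a sketch of specifying knot locations as a list: this assumes `ModelSpec` in `meridian.model.spec` accepts `knots` as either an integer count or a list of time-period indices (check the Meridian API reference); the indices below are purely illustrative, for weekly data with a year-end seasonal spike.

```python
from meridian.model import spec

# Evenly spaced knots through most of the year, plus denser knots around the
# weeks where the baseline misses the expected seasonality.
model_spec = spec.ModelSpec(
    knots=[0, 13, 26, 39, 45, 48, 51],
)
```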