Runs calculations to analyze the raw data after fitting the model.
meridian.analysis.analyzer.Analyzer(
meridian: meridian.model.model.Meridian
)
Methods
adstock_decay
adstock_decay(
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL
) -> pd.DataFrame
Calculates adstock decay for paid media, RF, and organic media channels.
| Args | |
|---|---|
confidence_level
|
Confidence level for prior and posterior credible intervals, represented as a value between zero and one. |
| Returns | |
|---|---|
Pandas DataFrame containing the channel, time_units, distribution,
ci_hi, ci_lo, and mean for the Adstock function.
|
baseline_summary_metrics
baseline_summary_metrics(
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | None) = None,
aggregate_geos: bool = True,
aggregate_times: bool = True,
non_media_baseline_values: (Sequence[float] | None) = None,
use_kpi: bool = False,
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> xr.Dataset
Returns baseline summary metrics.
| Args | |
|---|---|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing a subset of times to include. By default, all time periods are included. |
aggregate_geos
|
Boolean. If True, the expected outcome is summed over
all of the regions.
|
aggregate_times
|
Boolean. If True, the expected outcome is summed over
all of the time periods.
|
non_media_baseline_values
|
Optional list of shape
(n_non_media_channels,). Each element is a float which means that the
fixed value will be used as baseline for the given channel. It is
expected that they are scaled by population for the channels where
model_spec.non_media_population_scaling_id is True. If None, the
model_spec.non_media_baseline_values is used, which defaults to the
minimum value for each non_media treatment channel.
|
use_kpi
|
Boolean. If True, the baseline summary metrics are calculated
using KPI. If False, the metrics are calculated using revenue.
|
confidence_level
|
Confidence level for media summary metrics credible intervals, represented as a value between zero and one. |
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
An xr.Dataset with the coordinates metric (mean, median,
ci_low, ci_high) and distribution (prior, posterior), containing the
following data variables: baseline_outcome, pct_of_contribution.
|
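As a rough illustration of the metric coordinate described above, the following sketch reduces a list of draws to mean, median, ci_low, and ci_high. It is not Meridian's implementation: the helper names `quantile` and `summarize_draws` are hypothetical, and the use of an equal-tailed interval with linear interpolation is an assumption.

```python
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def summarize_draws(draws, confidence_level=0.9):
    """Returns mean, median, ci_low and ci_high for a list of draws.

    Assumption: the credible interval is equal-tailed, so a confidence
    level of 0.9 keeps the central 90% of the draws.
    """
    ordered = sorted(draws)
    tail = (1.0 - confidence_level) / 2.0
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "ci_low": quantile(ordered, tail),
        "ci_high": quantile(ordered, 1.0 - tail),
    }


# Hypothetical baseline outcome draws: 0.0, 1.0, ..., 100.0.
baseline_draws = [float(v) for v in range(101)]
summary = summarize_draws(baseline_draws, confidence_level=0.9)
```

A higher confidence_level widens the interval, because less probability mass is cut from each tail.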
compute_incremental_outcome_aggregate
compute_incremental_outcome_aggregate(
use_posterior: bool,
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
use_kpi: bool = False,
include_non_paid_channels: bool = True,
non_media_baseline_values: (Sequence[float] | None) = None,
**kwargs
) -> meridian.backend.Tensor
Aggregates the incremental outcome of the media channels.
| Args | |
|---|---|
use_posterior
|
Boolean. If True, then the incremental outcome posterior
distribution is calculated. Otherwise, the prior distribution is
calculated.
|
new_data
|
Optional DataTensors container with optional tensors: media,
reach, frequency, organic_media, organic_reach,
organic_frequency, non_media_treatments and revenue_per_kpi. If
None, the incremental outcome is calculated using the InputData
provided to the Meridian object. If new_data is provided, the
incremental outcome is calculated using the new tensors in new_data
and the original values of the remaining tensors. For example,
compute_incremental_outcome_aggregate(new_data=DataTensors(media=new_media))
computes the incremental outcome using new_media and the original
values of reach, frequency, organic_media, organic_reach,
organic_frequency, non_media_treatments and revenue_per_kpi. If
any of the tensors in new_data is provided with a different number of
time periods than in InputData, then all tensors must be provided with
the same number of time periods.
|
use_kpi
|
Boolean. If True, the summary metrics are calculated using KPI.
If False, the metrics are calculated using revenue.
|
include_non_paid_channels
|
Boolean. If True, then non-media treatments
and organic effects are included in the calculation. If False, then
only the paid media and RF effects are included.
|
non_media_baseline_values
|
Optional list of shape
(n_non_media_channels,). Each element is a float which means that the
fixed value will be used as baseline for the given channel. It is
expected that they are scaled by population for the channels where
model_spec.non_media_population_scaling_id is True. If None, the
model_spec.non_media_baseline_values is used, which defaults to the
minimum value for each non_media treatment channel.
|
**kwargs
|
kwargs to pass to incremental_outcome, which could contain
selected_geos, selected_times, aggregate_geos, aggregate_times,
batch_size.
|
| Returns | |
|---|---|
A Tensor with the same dimensions as incremental_outcome except the size
of the channel dimension is incremented by one, with the new component at
the end containing the total incremental outcome of all channels.
|
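The return shape described above can be pictured with a small sketch that appends the sum over channels as one extra trailing component. Plain Python lists stand in for tensors here, and `append_total` is a hypothetical helper, not part of Meridian.

```python
def append_total(per_channel):
    """Returns the per-channel values followed by their sum."""
    values = list(per_channel)
    return values + [sum(values)]


# Incremental outcome for three hypothetical channels, two parameter draws.
draws = [
    [120.0, 80.0, 50.0],
    [100.0, 90.0, 60.0],
]
with_total = [append_total(draw) for draw in draws]
```

Each inner list grows from three entries to four, mirroring how the channel dimension is incremented by one.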
cpik
cpik(
use_posterior: bool = True,
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates the cost per incremental KPI distribution for each channel.
The CPIK numerator is the total spend on the channel. The CPIK denominator is the change in expected KPI when one channel's spend is set to zero, leaving all other channels' spend unchanged.
If new_data=None, this method calculates CPIK conditional on the values of
the paid media variables that the Meridian object was initialized with. The
user can also override this historical data through the new_data argument.
For example,
new_data = DataTensors(media=new_media, frequency=new_frequency)
If selected_geos or selected_times is specified, then the CPIK
numerator is the total spend during the selected geos and time periods. An
exception will be thrown if the spend of the InputData used to train the
model does not have geo and time dimensions. (If the new_data.media_spend
and new_data.rf_spend arguments are used with different dimensions than
the InputData spend, then an exception will be thrown since this is a likely
user error.)
Note that CPIK is simply 1/ROI, where ROI is obtained from a call to the
roi method with use_kpi=True.
| Args | |
|---|---|
use_posterior
|
Boolean. If True then the posterior distribution is
calculated. Otherwise, the prior distribution is calculated.
|
new_data
|
Optional. DataTensors containing media, media_spend,
reach, frequency, rf_spend and revenue_per_kpi data. If
provided, the CPIK is calculated using the values of the tensors passed
in new_data and the original values of all the remaining tensors. If
None, the CPIK is calculated using the original values of all the
tensors. If any of the tensors in new_data is provided with a
different number of time periods than in InputData, then all tensors
must be provided with the same number of time periods.
|
selected_geos
|
Optional. Contains a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
the new_data args, if provided. By default, all time periods are
included.
|
aggregate_geos
|
Boolean. If True, the expected KPI is summed over all of
the regions.
|
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
Tensor of CPIK values with dimensions (n_chains, n_draws, n_geos,
(n_media_channels + n_rf_channels)). The n_geos dimension is dropped if
aggregate_geos=True.
|
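The relationship between CPIK and ROI stated in the description above can be checked with a toy calculation. The numbers and the function names `cpik_from` and `roi_from` are made up for illustration; this is a sketch of the ratio definitions only, not of how Meridian computes them.

```python
def cpik_from(spend, incremental_kpi):
    """Cost per incremental KPI: spend divided by incremental KPI."""
    return spend / incremental_kpi


def roi_from(spend, incremental_kpi):
    """ROI on the KPI scale: incremental KPI divided by spend."""
    return incremental_kpi / spend


spend = 1000.0  # Hypothetical total spend for one channel.
incremental_kpi_draws = [200.0, 250.0, 400.0]  # Hypothetical draws.

cpik_draws = [cpik_from(spend, k) for k in incremental_kpi_draws]
roi_draws = [roi_from(spend, k) for k in incremental_kpi_draws]

# The reciprocal relationship holds draw by draw.
products = [c * r for c, r in zip(cpik_draws, roi_draws)]

mean_cpik = sum(cpik_draws) / len(cpik_draws)
inverse_mean_roi = 1.0 / (sum(roi_draws) / len(roi_draws))
```

Because the reciprocal applies to each draw, summaries do not invert in the same way: in this example mean_cpik differs from inverse_mean_roi.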
expected_outcome
expected_outcome(
use_posterior: bool = True,
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | None) = None,
aggregate_geos: bool = True,
aggregate_times: bool = True,
inverse_transform_outcome: bool = True,
use_kpi: bool = False,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates either prior or posterior expected outcome.
This calculates E(Outcome|Media, RF, Organic media, Organic RF, Non-media
treatments, Controls) for each posterior (or prior) parameter draw, where
Outcome refers to either revenue if use_kpi=False, or kpi if
use_kpi=True. When revenue_per_kpi is not defined, use_kpi cannot
be False.
If new_data=None, this method calculates expected outcome conditional on
the values of the independent variables that the Meridian object was
initialized with. The user can also override this historical data through
the new_data argument, as long as the new tensors' dimensions match. For
example,
new_data=DataTensors(reach=new_reach, frequency=new_frequency)
In principle, expected outcome could be calculated with other time dimensions (for future predictions, for instance). However, this is not allowed with this method because of the additional complexities this introduces:
- Corresponding price (revenue per KPI) data would also be needed.
- If the model contains weekly effect parameters, then some method is needed to estimate or predict these effects for time periods outside of the training data window.
| Args | |
|---|---|
use_posterior
|
Boolean. If True, then the expected outcome posterior
distribution is calculated. Otherwise, the prior distribution is
calculated.
|
new_data
|
An optional DataTensors container with optional new tensors:
media, reach, frequency, organic_media, organic_reach,
organic_frequency, non_media_treatments, revenue_per_kpi,
controls. If None, expected outcome is calculated conditional on the
original values of the data tensors that the Meridian object was
initialized with. If new_data argument is used, expected outcome is
calculated conditional on the values of the tensors passed in new_data
and on the original values of the remaining unset tensors. For example,
expected_outcome(new_data=DataTensors(reach=new_reach,
frequency=new_frequency)) calculates expected outcome conditional on
the original media, organic_media, organic_reach,
organic_frequency, non_media_treatments, revenue_per_kpi, and
controls tensors and on the new given values for reach and
frequency tensors. The new tensors' dimensions must match the
dimensions of the corresponding original tensors from input_data.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing a subset of dates to include.
The values accepted here must match time dimension coordinates from
InputData.time. By default, all time periods are included.
|
aggregate_geos
|
Boolean. If True, the expected outcome is summed over
all regions.
|
aggregate_times
|
Boolean. If True, the expected outcome is summed over
all time periods.
|
inverse_transform_outcome
|
Boolean. If True, returns the expected
outcome in the original KPI or revenue (depending on what is passed to
use_kpi), as it was passed to InputData. If False, returns the
outcome after transformation by KpiTransformer, reflecting how it is
represented within the model.
|
use_kpi
|
Boolean. If use_kpi = True, the expected KPI is calculated;
otherwise the expected revenue (kpi * revenue_per_kpi) is calculated.
It is required that use_kpi = True if revenue_per_kpi is not defined
or if inverse_transform_outcome = False.
|
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
Tensor of expected outcome (either KPI or revenue, depending on the
use_kpi argument) with dimensions (n_chains, n_draws, n_geos,
n_times). The n_geos and n_times dimensions are dropped if
aggregate_geos=True or aggregate_times=True, respectively.
|
| Raises | |
|---|---|
NotFittedModelError
|
If sample_posterior() (for use_posterior=True)
or sample_prior() (for use_posterior=False) has not been called
prior to calling this method.
|
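The batch_size argument described above trades memory for speed. A minimal sketch of the idea follows, assuming a toy linear per-draw calculation. The names `batched`, `outcome_for_draw`, and `expected_outcome_in_batches` are hypothetical and do not reflect Meridian's internals.

```python
def batched(items, batch_size):
    """Yields consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer.")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def outcome_for_draw(beta, media):
    """Toy per-draw calculation: a linear response summed over time."""
    return sum(beta * m for m in media)


def expected_outcome_in_batches(beta_draws, media, batch_size):
    """Processes parameter draws batch by batch and concatenates results."""
    results = []
    for batch in batched(beta_draws, batch_size):
        # Only one batch of intermediate values is processed at a time.
        results.extend(outcome_for_draw(beta, media) for beta in batch)
    return results


beta_draws = [0.5, 1.0, 1.5, 2.0, 2.5]  # Hypothetical parameter draws.
media = [10.0, 20.0]  # Hypothetical media values for two time periods.

small_batches = expected_outcome_in_batches(beta_draws, media, batch_size=2)
one_batch = expected_outcome_in_batches(beta_draws, media, batch_size=100)
```

The two results are identical. Only the number of draws handled at once changes, which is why reducing batch_size can help when memory runs out.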
expected_vs_actual_data
expected_vs_actual_data(
aggregate_geos: bool = False,
aggregate_times: bool = False,
use_kpi: bool = False,
split_by_holdout_id: bool = False,
non_media_baseline_values: (Sequence[float] | None) = None,
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL
) -> xr.Dataset
Calculates the data for the expected versus actual outcome over time.
| Args | |
|---|---|
aggregate_geos
|
Boolean. If True, the expected, baseline, and actual are
summed over all of the regions.
|
aggregate_times
|
Boolean. If True, the expected, baseline, and actual
are summed over all of the time periods.
|
use_kpi
|
If True, calculate the incremental KPI. Otherwise, calculate
the incremental revenue using the revenue per KPI (if available).
|
split_by_holdout_id
|
Boolean. If True and holdout_id exists, the data
is split into 'Train', 'Test', and 'All Data' subsections.
|
non_media_baseline_values
|
Optional list of shape
(n_non_media_channels,). Each element is a float which means that the
fixed value will be used as baseline for the given channel. It is
expected that they are scaled by population for the channels where
model_spec.non_media_population_scaling_id is True. If None, the
model_spec.non_media_baseline_values is used, which defaults to the
minimum value for each non_media treatment channel.
|
confidence_level
|
Confidence level for expected outcome credible
intervals, represented as a value between zero and one. Default: 0.9.
|
| Returns | |
|---|---|
| A dataset with the expected, baseline, and actual outcome metrics. |
filter_and_aggregate_geos_and_times
filter_and_aggregate_geos_and_times(
tensor: meridian.backend.Tensor,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
aggregate_times: bool = True,
flexible_time_dim: bool = False,
has_media_dim: bool = True
) -> meridian.backend.Tensor
Filters and/or aggregates geo and time dimensions of a tensor.
| Args | |
|---|---|
tensor
|
Tensor with dimensions [..., n_geos, n_times] or [..., n_geos,
n_times, n_channels], where n_channels is the number of either media
channels, RF channels, all paid channels (media and RF), or all channels
(media, RF, non-media, organic media, organic RF).
|
selected_geos
|
Optional list containing a subset of geos to include. By
default, all geos are included. The selected geos should match those in
InputData.geo.
|
selected_times
|
Optional list of times to include. This can either be a
string list containing a subset of time dimension coordinates from
InputData.time or a boolean list with length equal to the time
dimension of the tensor. By default, all time periods are included.
|
aggregate_geos
|
Boolean. If True, the tensor is summed over all geos.
|
aggregate_times
|
Boolean. If True, the tensor is summed over all time
periods.
|
flexible_time_dim
|
Boolean. If True, the time dimension of the tensor is
not required to match the number of time periods in InputData.time. In
this case, if using selected_times, it must be a boolean list with
length equal to the time dimension of the tensor.
|
has_media_dim
|
Boolean. Only used if flexible_time_dim=True. Otherwise,
this is assumed based on the tensor dimensions. If True, the tensor is
assumed to have a media dimension following the time dimension. If
False, the last dimension of the tensor is assumed to be the time
dimension.
|
| Returns | |
|---|---|
| A tensor with filtered and/or aggregated geo and time dimensions. |
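The filtering and aggregation rules above can be sketched on a nested list with shape (n_geos, n_times). This is an illustration under simplifying assumptions (no leading chain or draw dimensions and no channel dimension). `filter_and_aggregate` is a hypothetical stand-in, not the Meridian method.

```python
def filter_and_aggregate(values, geos, times, selected_geos=None,
                         selected_times=None, aggregate_geos=True,
                         aggregate_times=True):
    """Filters and/or sums values[g][t] over the geo and time dimensions."""
    geo_idx = [i for i, geo in enumerate(geos)
               if selected_geos is None or geo in selected_geos]

    if selected_times is None:
        time_idx = list(range(len(times)))
    elif selected_times and all(isinstance(t, bool) for t in selected_times):
        # Boolean mask: must have one entry per time period.
        if len(selected_times) != len(times):
            raise ValueError("Boolean selected_times must match n_times.")
        time_idx = [i for i, keep in enumerate(selected_times) if keep]
    else:
        # String list: keep the time coordinates that were named.
        time_idx = [i for i, t in enumerate(times) if t in selected_times]

    subset = [[values[g][t] for t in time_idx] for g in geo_idx]
    if aggregate_times:
        subset = [sum(row) for row in subset]
    if aggregate_geos:
        if aggregate_times:
            return sum(subset)
        return [sum(column) for column in zip(*subset)]
    return subset


geos = ["geo_a", "geo_b"]
times = ["2024-01-01", "2024-01-08", "2024-01-15"]
values = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]

total = filter_and_aggregate(values, geos, times)
per_geo = filter_and_aggregate(values, geos, times,
                               selected_times=[True, False, True],
                               aggregate_geos=False)
per_time = filter_and_aggregate(values, geos, times, aggregate_times=False)
single = filter_and_aggregate(values, geos, times,
                              selected_geos=["geo_b"],
                              selected_times=["2024-01-08"],
                              aggregate_geos=False, aggregate_times=False)
```

Each aggregated dimension disappears from the result, and a filtered but unaggregated dimension shrinks to the selected entries.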
get_aggregated_impressions
get_aggregated_impressions(
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
aggregate_times: bool = True,
optimal_frequency: (Sequence[float] | None) = None,
include_non_paid_channels: bool = True
) -> meridian.backend.Tensor
Computes aggregated impressions values in the data across all channels.
| Args | |
|---|---|
new_data
|
An optional DataTensors object containing the new media,
reach, frequency, organic_media, organic_reach,
organic_frequency, and non_media_treatments tensors. If new_data
argument is used, then the aggregated impressions are computed using the
values of the tensors passed in the new_data argument and the original
values of all the remaining tensors. If None, the existing tensors
from the Meridian object are used.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
the tensors in the new_data argument, if provided. By default, all
time periods are included.
|
aggregate_geos
|
Boolean. If True, the expected outcome is summed over
all of the regions.
|
aggregate_times
|
Boolean. If True, the expected outcome is summed over
all of the time periods.
|
optimal_frequency
|
An optional list with dimension n_rf_channels,
containing the optimal frequency per channel, that maximizes posterior
mean ROI. Default value is None, and historical frequency is used for
the metrics calculation.
|
include_non_paid_channels
|
Boolean. If True, the organic media, organic
RF, and non-media channels are included in the aggregation.
|
| Returns | |
|---|---|
A tensor with the shape (n_selected_geos, n_selected_times, n_channels)
(or (n_channels,) if geos and times are aggregated) with aggregate
impression values per channel.
|
get_aggregated_spend
get_aggregated_spend(
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
include_media: bool = True,
include_rf: bool = True
) -> xr.DataArray
Gets the aggregated spend based on the selected geos and time.
| Args | |
|---|---|
new_data
|
An optional DataTensors object containing the new media,
media_spend, reach, frequency, rf_spend tensors. If None, the
existing tensors from the Meridian object are used. If new_data
argument is used, then the aggregated spend is computed using the values
of the tensors passed in the new_data argument and the original values
of all the remaining tensors. If any of the tensors in new_data is
provided with a different number of time periods than in InputData,
then all tensors must be provided with the same number of time periods.
|
selected_geos
|
Optional list containing a subset of geos to include. By
default, all geos are included. The selected geos should match those in
InputData.geo.
|
selected_times
|
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in KPI data. By default, all time periods are included. |
include_media
|
Whether to include spends for paid media channels that do not have R&F data. |
include_rf
|
Whether to include spends for paid media channels with R&F data. |
| Returns | |
|---|---|
An xr.DataArray with the coordinate channel that contains the data
variable spend.
|
| Raises | |
|---|---|
ValueError
|
A ValueError is raised when include_media and include_rf
are both False.
|
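The effect of include_media and include_rf, including the error case above, can be sketched with dictionaries in place of an xr.DataArray. The function name `aggregated_spend`, the channel names, and the spend values are all hypothetical.

```python
def aggregated_spend(media_spend, rf_spend, include_media=True,
                     include_rf=True):
    """Sums spend per channel for the requested channel types."""
    if not include_media and not include_rf:
        raise ValueError(
            "At least one of include_media or include_rf must be True."
        )
    selected = {}
    if include_media:
        selected.update(media_spend)
    if include_rf:
        selected.update(rf_spend)
    # Aggregate each channel's spend over its time periods.
    return {channel: sum(values) for channel, values in selected.items()}


# Hypothetical per-period spend for channels without and with R&F data.
media_spend = {"tv": [100.0, 150.0], "search": [40.0, 60.0]}
rf_spend = {"video": [30.0, 20.0]}

all_spend = aggregated_spend(media_spend, rf_spend)
rf_only = aggregated_spend(media_spend, rf_spend, include_media=False)
```

Setting both flags to False leaves no channels to report, which is why that combination is rejected.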
get_historical_spend
get_historical_spend(
selected_times: (Sequence[str] | None) = None,
include_media: bool = True,
include_rf: bool = True
) -> xr.DataArray
Deprecated. Gets the aggregated historical spend based on the time.
| Args | |
|---|---|
selected_times
|
The time period to get the historical spends. If None, the historical spends will be aggregated over all time points. |
include_media
|
Whether to include spends for paid media channels that do not have R&F data. |
include_rf
|
Whether to include spends for paid media channels with R&F data. |
| Returns | |
|---|---|
An xr.DataArray with the coordinate channel that contains the data
variable spend.
|
| Raises | |
|---|---|
ValueError
|
A ValueError is raised when include_media and include_rf
are both False.
|
get_rhat
get_rhat() -> Mapping[str, meridian.backend.Tensor]
Computes the R-hat values for each parameter in the model.
| Returns | |
|---|---|
| A dictionary of R-hat values where each key is a parameter name and each value holds the R-hat values for that parameter. |
| Raises | |
|---|---|
NotFittedModelError
|
If self.sample_posterior() is not called before calling this method. |
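For readers unfamiliar with R-hat, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for a single scalar parameter. This is an assumption about the general technique only. The variant used by Meridian's backend (for example, a split-chain or rank-normalized version) may differ, and `r_hat` is a hypothetical helper.

```python
import math
import statistics


def r_hat(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of one scalar."""
    n_draws = len(chains[0])
    if len(chains) < 2 or n_draws < 2:
        raise ValueError("Need at least two chains with two draws each.")
    if any(len(chain) != n_draws for chain in chains):
        raise ValueError("All chains must have the same number of draws.")

    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain sample variance (assumed to be non-zero).
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the number of draws.
    between = n_draws * statistics.variance(chain_means)
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return math.sqrt(pooled / within)


# Two chains exploring the same region, and two that disagree.
overlapping = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
separated = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

r_hat_overlapping = r_hat(overlapping)
r_hat_separated = r_hat(separated)
```

Values close to 1 suggest that the chains agree, while values well above 1, as for the separated chains here, point to a lack of convergence.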
hill_curves
hill_curves(
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
n_bins: int = 25
) -> pd.DataFrame
Estimates Hill curve tables used for plotting each channel's curves.
| Args | |
|---|---|
confidence_level
|
Confidence level for prior and posterior credible
intervals, represented as a value between zero and one. Default is
0.9.
|
n_bins
|
Number of equal-width bins to include in the histogram for the
plotting. Default is 25.
|
| Returns | |
|---|---|
Hill curves pd.DataFrame with columns:
|
incremental_outcome
incremental_outcome(
use_posterior: bool = True,
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
non_media_baseline_values: (Sequence[float] | None) = None,
scaling_factor0: float = 0.0,
scaling_factor1: float = 1.0,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
media_selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
aggregate_times: bool = True,
inverse_transform_outcome: bool = True,
use_kpi: bool = False,
by_reach: bool = True,
include_non_paid_channels: bool = True,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates either the posterior or prior incremental outcome.
This calculates the media outcome of each media channel for each posterior or prior parameter draw. Incremental outcome is defined as:
E(Outcome|Treatment_1, Controls) minus E(Outcome|Treatment_0, Controls)
For paid & organic channels (without reach and frequency data),
Treatment_1 means that media execution for a given channel is multiplied
by scaling_factor1 (1.0 by default) for the set of time periods specified
by media_selected_times. Similarly, Treatment_0 means that media
execution is multiplied by scaling_factor0 (0.0 by default) for these time
periods.
For paid & organic channels with reach and frequency data, either reach or
frequency is held fixed while the other is scaled, depending on the
by_reach argument.
For non-media treatments, Treatment_1 means that the variable is set to
historical values. Treatment_0 means that the variable is set to its
baseline value for all geos and time periods. Note that the scaling factors
(scaling_factor0 and scaling_factor1) are not applicable to non-media
treatments.
"Outcome" refers to either revenue if use_kpi=False, or kpi if
use_kpi=True. When revenue_per_kpi is not defined, use_kpi cannot be
False.
If new_data=None, this method computes incremental outcome using media,
reach, frequency, organic_media, organic_reach, organic_frequency,
non_media_treatments and revenue_per_kpi tensors that the Meridian
object was initialized with. This behavior can be overridden with the
new_data argument. For example, new_data=DataTensors(media=new_media)
calculates incremental outcome using the new_media tensor and the original
values of reach, frequency, organic_media, organic_reach,
organic_frequency, non_media_treatments and revenue_per_kpi tensors.
The calculation in this method depends on two key assumptions made in the Meridian implementation:
- Additivity of media effects (no interactions).
- Additive changes on the model KPI scale correspond to additive changes on the original KPI scale. In other words, the intercept and control effects do not influence the media effects. This assumption currently holds because the outcome transformation only involves centering and scaling, for example, no log transformations.
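The definition and the additivity assumption above can be illustrated with a toy model. The saturating response `m / (m + 1)`, the coefficients, and the function names are assumptions made for this sketch and are not Meridian's media transformations.

```python
def toy_expected_outcome(media, betas, baseline):
    """Toy additive model: baseline plus one saturating term per channel."""
    return baseline + sum(b * m / (m + 1.0) for b, m in zip(betas, media))


def toy_incremental_outcome(media, betas, baseline, channel,
                            scaling_factor0=0.0, scaling_factor1=1.0):
    """E(Outcome|Treatment_1) minus E(Outcome|Treatment_0) for one channel."""
    if not 0.0 <= scaling_factor0 < scaling_factor1:
        raise ValueError("Require 0 <= scaling_factor0 < scaling_factor1.")

    def scaled(factor):
        # Scale only the chosen channel; all others keep their values.
        return [m * factor if i == channel else m
                for i, m in enumerate(media)]

    return (toy_expected_outcome(scaled(scaling_factor1), betas, baseline)
            - toy_expected_outcome(scaled(scaling_factor0), betas, baseline))


media = [1.0, 3.0]    # Hypothetical media execution for two channels.
betas = [10.0, 20.0]  # Hypothetical effect sizes.

inc_channel_0 = toy_incremental_outcome(media, betas, 100.0, channel=0)
inc_channel_1 = toy_incremental_outcome(media, betas, 100.0, channel=1)
# Changing the baseline leaves the incremental outcome unchanged.
inc_other_baseline = toy_incremental_outcome(media, betas, 0.0, channel=0)
```

Because the toy model is additive, the baseline and the other channel cancel in the difference. This is the same property that the two assumptions above rely on.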
| Args | |
|---|---|
use_posterior
|
Boolean. If True, then the incremental outcome posterior
distribution is calculated. Otherwise, the prior distribution is
calculated.
|
new_data
|
Optional DataTensors container with optional tensors: media,
reach, frequency, organic_media, organic_reach,
organic_frequency, non_media_treatments and revenue_per_kpi. If
None, the incremental outcome is calculated using the InputData
provided to the Meridian object. If new_data is provided, the
incremental outcome is calculated using the new tensors in new_data
and the original values of the remaining tensors. For example,
incremental_outcome(new_data=DataTensors(media=new_media)) computes the
incremental outcome using new_media and the original values of
reach, frequency, organic_media, organic_reach,
organic_frequency, non_media_treatments and revenue_per_kpi. If
any of the tensors in new_data is provided with a different number of
time periods than in InputData, then all tensors must be provided with
the same number of time periods.
|
non_media_baseline_values
|
Optional list of shape
(n_non_media_channels,). Each element is a float which means that the
fixed value will be used as baseline for the given channel. It is
expected that they are scaled by population for the channels where
model_spec.non_media_population_scaling_id is True. If None, the
model_spec.non_media_baseline_values is used, which defaults to the
minimum value for each non_media treatment channel.
|
scaling_factor0
|
Float. The factor by which to scale the counterfactual
scenario "Media_0" during the time periods specified in
media_selected_times. Must be non-negative and less than
scaling_factor1.
|
scaling_factor1
|
Float. The factor by which to scale "Media_1" during the
selected time periods specified in media_selected_times. Must be
non-negative and greater than scaling_factor0.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
new_data if time is modified in new_data, or input_data.n_times
otherwise. The incremental outcome corresponds to incremental KPI
generated during the selected_times arg by media executed during the
media_selected_times arg. Note that if use_kpi=False, then
selected_times can only include the time periods that have
revenue_per_kpi input data. By default, all time periods are included
where revenue_per_kpi data is available.
|
media_selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
KPI data or number of time periods in the new_data args, if provided.
If new_data is provided, media_selected_times can select any subset
of time periods in new_data. If new_data is not provided,
media_selected_times selects from InputData.time. The incremental
outcome corresponds to incremental KPI generated during the
selected_times arg by treatment variables executed during the
media_selected_times arg. For each channel, the incremental outcome is
defined as the difference between expected KPI when treatment variables
execution is scaled by scaling_factor1 and scaling_factor0 during
these specified time periods. By default, the difference is between
treatment variables at historical execution levels, or as provided in
new_data, versus zero execution. Defaults to include all time periods.
|
aggregate_geos
|
Boolean. If True, then incremental outcome is summed
over all regions.
|
aggregate_times
|
Boolean. If True, then incremental outcome is summed
over all time periods.
|
inverse_transform_outcome
|
Boolean. If True, returns the expected
outcome in the original KPI or revenue (depending on what is passed to
use_kpi), as it was passed to InputData. If False, returns the
outcome after transformation by KpiTransformer, reflecting how it is
represented within the model.
|
use_kpi
|
Boolean. If use_kpi = True, the expected KPI is calculated;
otherwise the expected revenue (kpi * revenue_per_kpi) is calculated.
It is required that use_kpi = True if revenue_per_kpi data is not
available or if inverse_transform_outcome = False.
|
by_reach
|
Boolean. If True, then the incremental outcome is calculated
by scaling the reach and holding the frequency constant. If False,
then the incremental outcome is calculated by scaling the frequency and
holding the reach constant. Only used for channels with RF data.
|
include_non_paid_channels
|
Boolean. If True, then non-media treatments
and organic effects are included in the calculation. If False, then
only the paid media and RF effects are included.
|
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
Tensor of incremental outcome (either KPI or revenue, depending on
use_kpi argument) with dimensions (n_chains, n_draws, n_geos,
n_times, n_channels). If include_non_paid_channels=True, then
n_channels is the total number of media, RF, organic media, organic RF,
and non-media channels. If include_non_paid_channels=False, then
n_channels is the total number of media and RF channels. The n_geos
and n_times dimensions are dropped if aggregate_geos=True or
aggregate_times=True, respectively.
|
| Raises | |
|---|---|
NotFittedModelError
|
If sample_posterior() (for use_posterior=True)
or sample_prior() (for use_posterior=False) has not been called
prior to calling this method.
|
ValueError
|
If new_data argument contains tensors with modified time
dimension and not all treatment variables are provided in new_data
with matching time dimensions.
|
marginal_roi
marginal_roi(
incremental_increase: float = 0.01,
use_posterior: bool = True,
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
by_reach: bool = True,
use_kpi: bool = False,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates the marginal ROI prior or posterior distribution.
The marginal ROI (mROI) numerator is the change in expected outcome (kpi
or kpi * revenue_per_kpi) when one channel's spend is increased by a small
fraction. The mROI denominator is the corresponding small fraction of the
channel's total spend.
If new_data=None, this method calculates marginal ROI conditional on the
values of the paid media variables that the Meridian object was initialized
with. The user can also override this historical data through the new_data
argument. For example,
new_data = DataTensors(media=new_media, frequency=new_frequency)
If selected_geos or selected_times is specified, then the mROI
denominator is based on the total spend during the selected geos and time
periods. An exception will be thrown if the spend of the InputData used to
train the model does not have geo and time dimensions. (If the
new_data.media_spend and new_data.rf_spend arguments are used with
different dimensions than the InputData spend, then an exception will be
thrown since this is a likely user error.)
| Args | |
|---|---|
incremental_increase
|
Small fraction by which each channel's spend is
increased when calculating its mROI numerator. The mROI denominator is
this fraction of the channel's total spend.
|
use_posterior
|
If True then the posterior distribution is calculated.
Otherwise, the prior distribution is calculated.
|
new_data
|
Optional. DataTensors containing media, media_spend,
reach, frequency, rf_spend and revenue_per_kpi data. If
provided, the marginal ROI is calculated using the values of the tensors
passed in new_data and the original values of all the remaining
tensors. If None, the marginal ROI is calculated using the original
values of all the tensors. If any of the tensors in new_data is
provided with a different number of time periods than in InputData,
then all tensors must be provided with the same number of time periods.
|
selected_geos
|
Optional. Contains a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
the new_data args, if provided. By default, all time periods are
included.
|
aggregate_geos
|
If True, the expected revenue is summed over all of the
regions.
|
by_reach
|
Used for a channel with reach and frequency. If True, returns
the mROI by reach for a given fixed frequency. If False, returns the
mROI by frequency for a given fixed reach.
|
use_kpi
|
If False, then revenue is used to calculate the mROI numerator.
Otherwise, uses KPI to calculate the mROI numerator.
|
batch_size
|
Maximum draws per chain in each batch. The calculation is run
in batches to avoid memory exhaustion. If a memory error occurs, try
reducing batch_size. The calculation will generally be faster with
larger batch_size values.
|
| Returns | |
|---|---|
Tensor of mROI values with dimensions (n_chains, n_draws, n_geos,
(n_media_channels + n_rf_channels)). The n_geos dimension is dropped if
aggregate_geos=True.
|
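The mROI definition above can be illustrated with a minimal sketch. It does not call Meridian; `toy_expected_outcome` and `toy_marginal_roi` are hypothetical helpers, and the square-root outcome is an assumed stand-in for the fitted model. The numerator is the change in outcome when one channel's spend grows by `incremental_increase`, and the denominator is that same fraction of the channel's spend.

```python
import numpy as np


def toy_expected_outcome(spend: np.ndarray) -> float:
    """Hypothetical concave outcome: 100 * sqrt(spend), summed over channels."""
    return float(np.sum(100.0 * np.sqrt(spend)))


def toy_marginal_roi(spend: np.ndarray,
                     incremental_increase: float = 0.01) -> np.ndarray:
    """mROI per channel: change in outcome divided by the added spend."""
    base = toy_expected_outcome(spend)
    mroi = np.empty_like(spend, dtype=float)
    for c in range(spend.shape[0]):
        bumped = spend.astype(float).copy()
        bumped[c] *= 1.0 + incremental_increase  # raise one channel only
        numerator = toy_expected_outcome(bumped) - base
        denominator = incremental_increase * spend[c]
        mroi[c] = numerator / denominator
    return mroi


spend = np.array([10_000.0, 40_000.0])  # illustrative channel spend
mroi = toy_marginal_roi(spend)
```

Because the toy outcome is concave, the channel with the larger spend has the lower mROI, which is the diminishing-returns behavior mROI is meant to expose.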
negative_baseline_probability
negative_baseline_probability(
non_media_baseline_values: (Sequence[float] | None) = None,
use_posterior: bool = True,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | None) = None,
use_kpi: bool = False,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> np.floating
Calculates either prior or posterior negative baseline probability.
This calculates either the prior or posterior probability that the baseline, aggregated over the supplied time window, is negative.
The baseline is calculated by computing expected_outcome with the
following assumptions:
1) media is set to all zeros,
2) reach is set to all zeros,
3) organic_media is set to all zeros,
4) organic_reach is set to all zeros,
5) non_media_treatments is set to the counterfactual values according
to the non_media_baseline_values argument,
6) controls are set to historical values.
| Args | |
|---|---|
non_media_baseline_values
|
Optional list of shape
(n_non_media_channels,). Each element is a float denoting a fixed
value that will be used as the baseline for the given channel. It is
expected that they are scaled by population for the channels where
model_spec.non_media_population_scaling_id is True. If None, the
model_spec.non_media_baseline_values is used, which defaults to the
minimum value for each non_media treatment channel.
|
use_posterior
|
Boolean. If True, then the expected outcome posterior
distribution is calculated. Otherwise, the prior distribution is
calculated.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing a subset of dates to include.
The values accepted here must match time dimension coordinates from
InputData.time. By default, all time periods are included.
|
use_kpi
|
Boolean. If use_kpi = True, the expected KPI is calculated;
otherwise the expected revenue (kpi * revenue_per_kpi) is calculated.
It is required that use_kpi = True if revenue_per_kpi is not defined
or if inverse_transform_outcome = False.
|
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
| A float representing the prior or posterior negative baseline probability over the supplied time window. |
| Raises | |
|---|---|
NotFittedModelError
|
If sample_posterior() (for use_posterior=True)
or sample_prior() (for use_posterior=False) has not been called
prior to calling this method.
|
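As a rough sketch of what this probability represents (not Meridian's implementation), suppose baseline draws are already available as an array of shape (n_chains, n_draws, n_times). The helper name `negative_probability` and the sample values are assumptions made for illustration. The baseline is summed over the time window for each draw, and the result is the share of draws with a negative total.

```python
import numpy as np


def negative_probability(baseline_draws: np.ndarray) -> float:
    """Fraction of draws whose time-aggregated baseline is below zero.

    baseline_draws has shape (n_chains, n_draws, n_times).
    """
    total_per_draw = baseline_draws.sum(axis=-1)  # aggregate over the time window
    return float(np.mean(total_per_draw < 0.0))


# Hypothetical draws: 2 chains, 3 draws per chain, 2 time periods.
baseline_draws = np.array([
    [[5.0, -1.0], [-3.0, 1.0], [2.0, 2.0]],
    [[-4.0, -4.0], [1.0, 0.5], [0.0, 3.0]],
])
prob = negative_probability(baseline_draws)  # 2 of the 6 draws sum below zero
```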
optimal_freq
optimal_freq(
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
max_frequency: (float | None) = None,
freq_grid: (Sequence[float] | None) = None,
use_posterior: bool = True,
use_kpi: bool = False,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL
) -> xr.Dataset
Calculates the optimal frequency that maximizes posterior mean ROI.
For this optimization, historical spend is used and fixed, and frequency is restricted to be constant across all geographic regions and time periods. Reach is calculated for each geographic area and time period such that the number of impressions remains unchanged as frequency varies. Meridian solves for the frequency at which posterior mean ROI is optimized.
If new_data=None, this method calculates the optimal frequency based on the
values of the paid RF variables that the Meridian object was initialized
with. The user can override this historical data through the new_data
argument. For example,
new_data = DataTensors(reach=new_reach, frequency=new_frequency)
| Args | |
|---|---|
new_data
|
Optional DataTensors object containing rf_impressions,
rf_spend, and revenue_per_kpi. If provided, the optimal frequency is
calculated using the values of the tensors passed in new_data and the
original values of all the remaining tensors. If None, the historical
data used to initialize the Meridian object is used. If any of the
tensors in new_data is provided with a different number of time
periods than in InputData, then all tensors must be provided with the
same number of time periods.
|
max_frequency
|
Maximum frequency value used to calculate the frequency
grid. If None, the maximum frequency value is calculated from the
historic frequency (maximum value of Meridian.input_data, not
new_data). If freq_grid is provided, this argument has no effect.
|
freq_grid
|
List of frequency values. The ROI of each channel is calculated
for each frequency value in the list. By default, the list includes
numbers from 1.0 to the maximum frequency in increments of 0.1.
|
use_posterior
|
Boolean. If True, posterior optimal frequencies are
generated. If False, prior optimal frequencies are generated.
|
use_kpi
|
Boolean. If True, the counterfactual metrics are calculated
using KPI. If False, the counterfactual metrics are calculated using
revenue.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
new_data if time is modified in new_data, or input_data.n_times
otherwise. By default, all time periods are included.
|
confidence_level
|
Confidence level for prior and posterior credible intervals, represented as a value between zero and one. |
| Returns | |
|---|---|
An xarray Dataset which contains:
|
| Raises | |
|---|---|
NotFittedModelError
|
If sample_posterior() (for use_posterior=True)
or sample_prior() (for use_posterior=False) has not been called
prior to calling this method.
|
ValueError
|
If there are no channels with reach and frequency data. |
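The grid search described for this method can be sketched without Meridian as follows. Everything here is an assumption for illustration: `frequency_grid` and `toy_roi_draws` are hypothetical helpers, the Hill-style saturation curve is a toy, and `beta_draws` stands in for posterior draws. Impressions are held fixed, reach is derived as impressions divided by frequency, and the frequency with the highest mean ROI across draws is selected.

```python
import numpy as np


def frequency_grid(max_frequency: float) -> np.ndarray:
    """Grid from 1.0 to max_frequency in steps of 0.1."""
    n_steps = int(round((max_frequency - 1.0) / 0.1)) + 1
    return np.round(np.linspace(1.0, 1.0 + 0.1 * (n_steps - 1), n_steps), 1)


def toy_roi_draws(frequency: float, impressions: float, spend: float,
                  beta_draws: np.ndarray) -> np.ndarray:
    """Hypothetical ROI draws for one RF channel at a constant frequency."""
    reach = impressions / frequency  # impressions stay fixed as frequency varies
    saturation = frequency**2 / (frequency**2 + 2.0**2)  # toy Hill curve
    return beta_draws * reach * saturation / spend


rng = np.random.default_rng(0)
beta_draws = rng.uniform(0.5, 1.5, size=200)  # stand-in for posterior draws
grid = frequency_grid(max_frequency=5.0)
mean_roi = np.array([
    toy_roi_draws(f, impressions=1e6, spend=5e4, beta_draws=beta_draws).mean()
    for f in grid
])
optimal_frequency = float(grid[np.argmax(mean_roi)])
```

For this toy curve the mean ROI is proportional to f / (f^2 + 4), which peaks at a frequency of 2.0, so the grid search returns that value.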
predictive_accuracy
predictive_accuracy(
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | None) = None,
use_kpi: bool = False,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> xr.Dataset
Calculates R-Squared, MAPE, and wMAPE goodness of fit metrics.
R-Squared, MAPE (mean absolute percentage error), and wMAPE (weighted
mean absolute percentage error) are calculated on the revenue scale
(KPI * revenue_per_kpi) when revenue_per_kpi is specified, or the KPI
scale when revenue_per_kpi = None. This is the same scale as what is used
in the ROI numerator (incremental outcome).
Prediction errors in wMAPE are weighted by the actual revenue
(KPI * revenue_per_kpi) when revenue_per_kpi is specified, or weighted
by the KPI scale when revenue_per_kpi = None. This means that percentage
errors when revenue is high are weighted more heavily than errors when
revenue is low.
R-Squared, MAPE, and wMAPE are calculated both at the model level (one
observation per geo and time period) and at the national level (aggregating
KPI or revenue outcome across geos so there is one observation per time
period).
R-Squared, MAPE, and wMAPE are calculated for the full sample. If the
model object has any holdout observations, then R-Squared, MAPE, and
wMAPE are also calculated for the Train and Test subsets.
| Args | |
|---|---|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing a subset of dates to include. By default, all time periods are included. |
use_kpi
|
Whether to use KPI or revenue scale for the predictive accuracy metrics. |
batch_size
|
Integer representing the maximum draws per chain in each
batch. By default, batch_size is 100. The calculation is run in
batches to avoid memory exhaustion. If a memory error occurs, try
reducing batch_size. The calculation will generally be faster with
larger batch_size values.
|
| Returns | |
|---|---|
An xarray Dataset containing the computed R_Squared, MAPE, and wMAPE
values, with coordinates metric, geo_granularity, evaluation_set,
and accompanying data variable value. If holdout_id exists, the data
is split into 'Train', 'Test', and 'All Data' subsections, and the
three metrics are computed for each.
|
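The three metrics can be sketched with their common textbook definitions. This is an illustration only; Meridian's exact formulas may differ, and the function names and sample arrays are assumptions. The arrays have shape (n_geos, n_times), and the national-level variant sums over geos before computing the metric.

```python
import numpy as np


def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    """1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean of the absolute percentage errors."""
    return float(np.mean(np.abs((actual - predicted) / actual)))


def wmape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Absolute percentage errors weighted by the actual outcome."""
    return float(np.sum(np.abs(actual - predicted)) / np.sum(actual))


# Hypothetical outcome for 2 geos and 3 time periods.
actual = np.array([[100.0, 200.0, 400.0], [50.0, 100.0, 150.0]])
predicted = np.array([[110.0, 190.0, 380.0], [45.0, 110.0, 150.0]])

# Model level: one observation per geo and time period.
model_level = wmape(actual, predicted)
# National level: aggregate across geos, one observation per time period.
national_level = wmape(actual.sum(axis=0), predicted.sum(axis=0))
```

In this example the national-level wMAPE is lower than the model-level value because errors of opposite sign in different geos partly cancel when aggregated.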
response_curves
response_curves(
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
spend_multipliers: (list[float] | None) = None,
use_posterior: bool = True,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | None) = None,
by_reach: bool = True,
use_optimal_frequency: bool = False,
use_kpi: bool = False,
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> xr.Dataset
Generates a response curves xarray.Dataset.
Response curves are calculated in aggregate across geos and time periods, assuming the historical flighting pattern across geos and time periods for each media channel.
A list of multipliers is applied to each media channel's total historical
spend within selected_geos and selected_times to obtain the x-axis
values. The y-axis values are the incremental outcome generated by each
channel within selected_geos and selected_times under the counterfactual
where media units in each geo and time period are scaled by the
corresponding multiplier. (Media units for time periods prior to
selected_times are also scaled by the multiplier.)
| Args | |
|---|---|
new_data
|
Optional DataTensors object with optional new tensors:
media, reach, frequency, media_spend, rf_spend,
revenue_per_kpi, times. If provided, the response curves are
calculated using the values of the tensors passed in new_data and the
original values of all the remaining tensors. If None, the response
curves are calculated using the original values of all the tensors. If
any of the tensors in new_data is provided with a different number of
time periods than in InputData, then all tensors must be provided with
the same number of time periods and the time tensor must be provided.
|
spend_multipliers
|
List of multipliers. Each channel's total spend is multiplied by these factors to obtain the values at which the curve is calculated for that channel. |
use_posterior
|
Boolean. If True, posterior response curves are
generated. If False, prior response curves are generated.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing a subset of dates to include. If
new_data is provided with modified time periods, then selected_times
must be a subset of new_data.times. Otherwise, selected_times must
be a subset of self._meridian.input_data.time. By default, all time
periods are included.
|
by_reach
|
Boolean. For channels with reach and frequency. If True, generates
the response curve by reach. If False, generates the response curve by
frequency.
|
use_optimal_frequency
|
If True, uses the optimal frequency to generate the
response curves. Defaults to False.
|
use_kpi
|
A boolean flag indicating whether to use KPI instead of revenue
to generate the response curves. Defaults to False.
|
confidence_level
|
Confidence level for prior and posterior credible intervals, represented as a value between zero and one. |
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
An xarray.Dataset containing the data needed to visualize response
curves.
|
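How the x-axis and y-axis values relate to the spend multipliers can be sketched for a single channel. This is not Meridian's computation: `toy_response_curve`, its Hill-style saturation, and the parameters `ec`, `slope`, and `beta` are assumptions chosen for illustration. Each multiplier scales the media units in every period, preserving the historical flighting pattern, and the same multiplier scales total spend.

```python
import numpy as np


def toy_response_curve(media_units: np.ndarray, spend: np.ndarray,
                       spend_multipliers: list[float],
                       ec: float = 1000.0, slope: float = 1.0,
                       beta: float = 50.0) -> tuple[np.ndarray, np.ndarray]:
    """Returns (x, y): scaled total spend and toy incremental outcome."""
    total_spend = spend.sum()
    x = np.array(spend_multipliers) * total_spend
    y = []
    for multiplier in spend_multipliers:
        scaled = multiplier * media_units  # same flighting pattern, scaled
        hill = scaled**slope / (scaled**slope + ec**slope)
        y.append(beta * hill.sum())  # outcome at zero media is zero here
    return x, np.array(y)


media_units = np.array([500.0, 1000.0, 1500.0])  # per time period
spend = np.array([50.0, 100.0, 150.0])
x, y = toy_response_curve(media_units, spend, [0.0, 0.5, 1.0, 1.5, 2.0])
```

The resulting curve rises with spend but flattens, so doubling spend yields less than double the incremental outcome.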
rhat_summary
rhat_summary(
bad_rhat_threshold: float = 1.2
) -> pd.DataFrame
Computes a summary of the R-hat values for each parameter in the model.
Summarizes the Gelman & Rubin (1992) potential scale reduction for chain convergence, commonly referred to as R-hat. It is a convergence diagnostic that measures the degree to which variance (of the means) between chains exceeds what you would expect if the chains were identically distributed. Values close to 1.0 indicate convergence. R-hat < 1.2 indicates approximate convergence and is a reasonable threshold for many problems (Brooks & Gelman, 1998).
| References | |
|---|---|
| Andrew Gelman and Donald B. Rubin. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4):457-472, 1992. Stephen P. Brooks and Andrew Gelman. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics, 7(4), 1998. |
| Args | |
|---|---|
bad_rhat_threshold
|
The threshold for determining which R-hat values are considered bad. |
| Returns | |
|---|---|
A DataFrame with the following columns:
|
| Raises | |
|---|---|
NotFittedModelError
|
If self.sample_posterior() is not called before
calling this method.
|
ValueError
|
If the number of dimensions of the R-hat array for a parameter
is not 1 or 2.
|
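For intuition, the classic Gelman and Rubin statistic for a single scalar parameter can be sketched as below. This is a textbook form, assumed here for illustration; Meridian's own computation may use a different variant, and the helper name `rhat` is hypothetical. Between-chain and within-chain variances are combined into a pooled variance estimate, and R-hat is the square root of its ratio to the within-chain variance.

```python
import numpy as np


def rhat(draws: np.ndarray) -> float:
    """Potential scale reduction for draws of shape (n_chains, n_draws)."""
    n_chains, n_draws = draws.shape
    chain_means = draws.mean(axis=1)
    between = n_draws * chain_means.var(ddof=1)  # B: between-chain variance
    within = draws.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_hat / within))


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                       # chains agree
stuck = mixed + np.array([0.0, 0.0, 5.0, 5.0])[:, None]  # chains disagree

rhat_mixed = rhat(mixed)
rhat_stuck = rhat(stuck)
bad_rhat_threshold = 1.2
```

Chains drawn from the same distribution give a value near 1.0, while chains centered at different locations exceed the 1.2 threshold by a wide margin.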
roi
roi(
use_posterior: bool = True,
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
use_kpi: bool = False,
batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates ROI prior or posterior distribution for each media channel.
The ROI numerator is the change in expected outcome (kpi or kpi *
revenue_per_kpi) when one channel's spend is set to zero, leaving all other
channels' spend unchanged. The ROI denominator is the total spend of the
channel.
If new_data=None, this method calculates ROI conditional on the values of
the paid media variables that the Meridian object was initialized with. The
user can also override this historical data through the new_data argument.
For example,
new_data = DataTensors(media=new_media, frequency=new_frequency)
If selected_geos or selected_times is specified, then the ROI
denominator is the total spend during the selected geos and time periods. An
exception will be thrown if the spend of the InputData used to train the
model does not have geo and time dimensions. (If the new_data.media_spend
and new_data.rf_spend arguments are used with different dimensions than
the InputData spend, then an exception will be thrown since this is a likely
user error.)
| Args | |
|---|---|
use_posterior
|
Boolean. If True, then the posterior distribution is
calculated. Otherwise, the prior distribution is calculated.
|
new_data
|
Optional. DataTensors containing media, media_spend,
reach, frequency, rf_spend, and revenue_per_kpi data. If
provided, the ROI is calculated using the values of the tensors passed
in new_data and the original values of all the remaining tensors. If
None, the ROI is calculated using the original values of all the
tensors. If any of the tensors in new_data is provided with a
different number of time periods than in InputData, then all tensors
must be provided with the same number of time periods.
|
selected_geos
|
Optional. Contains a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
the new_data args, if provided. By default, all time periods are
included.
|
aggregate_geos
|
Boolean. If True, the expected revenue is summed over
all of the regions.
|
use_kpi
|
If False, then revenue is used to calculate the ROI numerator.
Otherwise, uses KPI to calculate the ROI numerator.
|
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
| Returns | |
|---|---|
Tensor of ROI values with dimensions (n_chains, n_draws, n_geos,
(n_media_channels + n_rf_channels)). The n_geos dimension is dropped if
aggregate_geos=True.
|
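The ROI definition and the shape of the returned tensor can be sketched with a toy additive model. This does not use Meridian; `toy_roi`, the square-root response, and `beta_draws` (a stand-in for posterior draws) are assumptions for illustration. For each channel, the numerator is the outcome with all channels minus the outcome with that channel set to zero, and the denominator is the channel's total spend.

```python
import numpy as np


def toy_roi(beta_draws: np.ndarray, media: np.ndarray,
            spend: np.ndarray) -> np.ndarray:
    """ROI draws of shape (n_chains, n_draws, n_channels).

    beta_draws: (n_chains, n_draws, n_channels); media, spend: (n_channels,).
    """
    def outcome(m: np.ndarray) -> np.ndarray:
        # Toy additive model: each channel contributes beta * sqrt(media).
        return np.sum(beta_draws * np.sqrt(m), axis=-1)

    full = outcome(media)
    roi = np.empty(beta_draws.shape)
    for c in range(media.shape[0]):
        counterfactual = media.copy()
        counterfactual[c] = 0.0  # zero out one channel, keep the others
        roi[..., c] = (full - outcome(counterfactual)) / spend[c]
    return roi


rng = np.random.default_rng(0)
beta_draws = rng.uniform(0.5, 1.5, size=(2, 5, 3))  # 2 chains, 5 draws, 3 channels
media = np.array([100.0, 400.0, 900.0])
spend = np.array([10.0, 20.0, 30.0])
roi = toy_roi(beta_draws, media, spend)
```

The output keeps the chain and draw dimensions, mirroring the (n_chains, n_draws, n_channels) layout described for aggregate_geos=True.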
summary_metrics
summary_metrics(
new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
marginal_roi_by_reach: bool = True,
marginal_roi_incremental_increase: float = 0.01,
selected_geos: (Sequence[str] | None) = None,
selected_times: (Sequence[str] | Sequence[bool] | None) = None,
aggregate_geos: bool = True,
aggregate_times: bool = True,
optimal_frequency: (Sequence[float] | None) = None,
use_kpi: bool = False,
confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
batch_size: int = constants.DEFAULT_BATCH_SIZE,
include_non_paid_channels: bool = False,
non_media_baseline_values: (Sequence[float] | None) = None
) -> xr.Dataset
Returns summary metrics.
If new_data=None, this method calculates all the metrics conditional on
the values of the data variables that the Meridian object was initialized
with. The user can also override this historical data through the new_data
argument. For example, to override the media, frequency, and non-media
treatments data variables, the user can pass the following new_data
argument:
new_data = DataTensors(
media=new_media,
frequency=new_frequency,
non_media_treatments=new_non_media_treatments)
Note that if new_data is provided with a different number of time periods
than in InputData, pct_of_contribution is not defined because
expected_outcome() is not defined for new time periods.
Note that mroi and effectiveness metrics are not defined (math.nan)
for the aggregate "All Paid Channels" channel dimension.
| Args | |
|---|---|
new_data
|
Optional DataTensors object with optional new tensors:
media, media_spend, reach, frequency, rf_spend,
organic_media, organic_reach, organic_frequency,
non_media_treatments, controls, revenue_per_kpi. If provided, the
summary metrics are calculated using the values of the tensors passed in
new_data and the original values of all the remaining tensors. If
None, the summary metrics are calculated using the original values of
all the tensors. If new_data is provided with a different number of
time periods than in InputData, then all tensors, except controls,
must have the same number of time periods.
|
marginal_roi_by_reach
|
Boolean. Marginal ROI (mROI) is defined as the
return on the next dollar spent. If this argument is True, the
assumption is that the next dollar spent only impacts reach, holding
frequency constant. If this argument is False, the assumption is that
the next dollar spent only impacts frequency, holding reach constant.
Used only when include_non_paid_channels is False.
|
marginal_roi_incremental_increase
|
Small fraction by which each channel's
spend is increased when calculating its mROI numerator. The mROI
denominator is this fraction of the channel's total spend. Used only
when include_non_paid_channels is False.
|
selected_geos
|
Optional list containing a subset of geos to include. By default, all geos are included. |
selected_times
|
Optional list containing either a subset of dates to
include or booleans with length equal to the number of time periods in
the tensors in the new_data argument, if provided. By default, all
time periods are included.
|
aggregate_geos
|
Boolean. If True, the expected outcome is summed over
all of the regions.
|
aggregate_times
|
Boolean. If True, the expected outcome is summed over
all of the time periods. Note that if False, ROI, mROI, Effectiveness,
and CPIK are not reported because they do not have a clear
interpretation by time period.
|
optimal_frequency
|
An optional list with dimension n_rf_channels,
containing the optimal frequency per channel that maximizes posterior
mean ROI. If None (the default), historical frequency is used for
the metrics calculation.
|
use_kpi
|
Boolean. If True, the summary metrics are calculated using KPI.
If False, the metrics are calculated using revenue.
|
confidence_level
|
Confidence level for summary metrics credible intervals, represented as a value between zero and one. |
batch_size
|
Integer representing the maximum draws per chain in each
batch. The calculation is run in batches to avoid memory exhaustion. If
a memory error occurs, try reducing batch_size. The calculation will
generally be faster with larger batch_size values.
|
include_non_paid_channels
|
Boolean. If True, non-paid channels (organic
media, organic reach and frequency, and non-media treatments) are
included in the summary but only the metrics independent of spend are
reported. If False, only the paid channels (media, reach and
frequency) are included, and the summary also contains the metrics
dependent on spend. Default: False.
|
non_media_baseline_values
|
Optional list of shape
(n_non_media_channels,). Each element is a float which means that the
fixed value will be used as baseline for the given channel. It is
expected that they are scaled by population for the channels where
model_spec.non_media_population_scaling_id is True. If None, the
model_spec.non_media_baseline_values is used, which defaults to the
minimum value for each non_media treatment channel.
|
| Returns | |
|---|---|
An xr.Dataset with coordinates: channel, metric (mean, median,
ci_low, ci_high), distribution (prior, posterior) and contains the
following non-paid data variables: incremental_outcome,
pct_of_contribution, effectiveness, and the following paid
data variables: impressions, pct_of_impressions, spend,
pct_of_spend, CPM, roi, mroi, cpik. The paid data variables are
only included when include_non_paid_channels is False. Note that
roi, mroi, cpik, and effectiveness metrics are not reported
when aggregate_times=False because they do not have a clear
interpretation by time period.
|
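The metric coordinate (mean, median, ci_low, ci_high) can be illustrated by summarizing draws directly. This sketch assumes an equal-tailed credible interval and a confidence level of 0.9; neither is confirmed by this page, and the helper name `summarize` is hypothetical. Draws of shape (n_chains, n_draws, n_channels) are pooled across chains and draws before the statistics are taken per channel.

```python
import numpy as np


def summarize(draws: np.ndarray,
              confidence_level: float = 0.9) -> dict[str, np.ndarray]:
    """Per-channel mean, median, and equal-tailed interval bounds."""
    alpha = (1.0 - confidence_level) / 2.0
    flat = draws.reshape(-1, draws.shape[-1])  # pool chains and draws
    return {
        "mean": flat.mean(axis=0),
        "median": np.median(flat, axis=0),
        "ci_low": np.quantile(flat, alpha, axis=0),
        "ci_high": np.quantile(flat, 1.0 - alpha, axis=0),
    }


# Hypothetical draws for one channel: the values 0, 1, ..., 100.
draws = np.arange(101, dtype=float).reshape(1, 101, 1)
summary = summarize(draws)
```

With these evenly spaced draws, the mean and median are both 50, and the 90% interval runs from about 5 to about 95.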