A data container for advertising data in a format supported by Meridian.
meridian.data.input_data.InputData(
kpi: xr.DataArray,
kpi_type: str,
population: xr.DataArray,
controls: (xr.DataArray | None) = None,
revenue_per_kpi: (xr.DataArray | None) = None,
media: (xr.DataArray | None) = None,
media_spend: (xr.DataArray | None) = None,
reach: (xr.DataArray | None) = None,
frequency: (xr.DataArray | None) = None,
rf_spend: (xr.DataArray | None) = None,
organic_media: (xr.DataArray | None) = None,
organic_reach: (xr.DataArray | None) = None,
organic_frequency: (xr.DataArray | None) = None,
non_media_treatments: (xr.DataArray | None) = None
)
Attributes |
kpi
|
A DataArray of dimensions (n_geos, n_times) containing the
non-negative dependent variable. Typically this is the number of units
sold, but it can be any metric, such as revenue or conversions.
|
kpi_type
|
A string denoting whether the KPI is of a 'revenue' or
'non-revenue' type. When the kpi_type is 'non-revenue' and
revenue_per_kpi exists, ROI calibration is used and the analysis is run
on revenue. When kpi_type is 'non-revenue' and revenue_per_kpi doesn't
exist, custom ROI calibration is used and the analysis is run on KPI.
|
population
|
A DataArray of dimensions (n_geos,) containing the population
of each geo. This variable is used to scale the KPI and media for
modeling.
|
controls
|
An optional DataArray of dimensions (n_geos, n_times,
n_controls) containing control variable values.
|
revenue_per_kpi
|
An optional DataArray of dimensions (n_geos, n_times)
containing the average revenue amount per KPI unit. Although modeling is
done on kpi, model analysis and optimization are done on kpi *
revenue_per_kpi (revenue) if this value is available. If kpi
corresponds to revenue, then an array of ones is passed automatically.
|
media
|
An optional DataArray of dimensions (n_geos, n_media_times,
n_media_channels) containing non-negative media execution values.
Typically these are impressions, but they can be any metric, such as cost
or clicks. n_media_times ≥ n_times is required, and the final n_times
time periods must align with the time window of kpi and controls. Due
to lagged effects, we recommend that the time window for media includes up
to max_lag additional periods prior to this window. If n_media_times <
n_times + max_lag, the model effectively imputes media history as zero
(no media execution). If n_media_times > n_times + max_lag, then
only the final n_times + max_lag periods are used to fit the model.
media and media_spend must contain the same number of media channels
in the same order. If either of these arguments is passed, then the other
is also required.
|
media_spend
|
An optional DataArray containing the cost of each media
channel. This is used as the denominator for ROI calculations. It is also
used to calculate an assumed cost per media unit for post-modeling
analysis such as response curves and budget optimization. Only the
aggregate spend (across geos and time periods) is required for these
calculations. However, a spend breakdown by geo and time period is
required if roi_calibration_period is specified or if conducting
post-modeling analysis on a specific subset of geos and/or time periods.
The DataArray shape can be (n_geos, n_times, n_media_channels) or
(n_media_channels,) if the data is aggregated over geo and time
dimensions. We recommend that the spend total aligns with the time window
of the kpi and controls data, which is the time window over which
incremental outcome of the ROI numerator is calculated. Note, however,
that incremental outcome includes lagged effects of media executed before
this time window and excludes lagged effects that fall after the window
from media executed within it. media and media_spend must contain the
same number of media channels in the same order. If either of these
arguments is passed, then the other is also required. If an array of shape
(n_media_channels,) is passed as media_spend, then it is automatically
allocated across geos and times proportionally to media.
|
reach
|
An optional DataArray of dimensions (n_geos, n_media_times,
n_rf_channels) containing non-negative reach values. It is required
that n_media_times ≥ n_times, and the final n_times time periods
must align with the time window of kpi and controls. Including lagged
time periods prior to this window is optional. If lagged reach is not
included, or if it covers fewer than max_lag time periods, then the model
calculates Adstock assuming that reach execution is zero prior to the
first observed time period. We recommend including n_times + max_lag
time periods, unless the value of max_lag is prohibitively large. If
only media data is used, then reach will be None. reach, frequency,
and rf_spend must contain the same number of media channels in the same
order. If any of these arguments is passed, then the others are also
required.
|
frequency
|
An optional DataArray of dimensions (n_geos, n_media_times,
n_rf_channels) containing non-negative frequency values. It is required
that n_media_times ≥ n_times, and the final n_times time periods
must align with the time window of kpi and controls. Including lagged
time periods prior to this window is optional. If lagged frequency is not
included, or if it covers fewer than max_lag time periods, then the model
calculates Adstock assuming that frequency execution is zero prior to the
first observed time period. We recommend including n_times + max_lag
time periods, unless the value of max_lag is prohibitively large. If
only media data is used, then frequency will be None. reach, frequency,
and rf_spend must contain the same number of media channels in the same
order. If any of these arguments is passed, then the others are also
required.
|
rf_spend
|
An optional DataArray containing the cost of each reach and
frequency channel. This is used as the denominator for ROI calculations.
It is also used to calculate an assumed cost per media unit for
post-modeling analysis such as response curves and budget optimization.
Only the aggregate spend (across geos and time periods) is required for
these calculations. However, a spend breakdown by geo and time period is
required if rf_roi_calibration_period is specified or if conducting
post-modeling analysis on a specific subset of geos and/or time periods.
The DataArray shape can be (n_rf_channels,) or (n_geos, n_times,
n_rf_channels). The spend should be aggregated over any geo and/or time
dimensions that are not represented. We recommend that the spend total
aligns with the time window of the kpi and controls data, which is the
time window over which incremental outcome of the ROI numerator is
calculated. Note, however, that incremental outcome includes lagged
effects of media executed before this time window and excludes lagged
effects that fall after the window from media executed within it. If only
media data is used, rf_spend will be None. reach, frequency, and
rf_spend must contain the same number of media channels in the same
order. If any of these arguments is passed, then the others are also
required. If an array of shape (n_rf_channels,) is passed as rf_spend,
then it is automatically allocated across geos and times proportionally
to (reach * frequency).
|
organic_media
|
An optional DataArray of dimensions (n_geos,
n_media_times, n_organic_media_channels) containing non-negative organic
media values. Organic media variables are media activities that have no
direct cost. These may include impressions from newsletters, blog posts,
social media activity, or email campaigns, but they can be any metric, such
as clicks. n_media_times ≥ n_times is required, and the final n_times
time periods must align with the time window of kpi and controls. Due
to lagged effects, we recommend that the time window for organic media
includes up to max_lag additional periods prior to this window. If
n_media_times < n_times + max_lag, the model effectively imputes organic
media history as zero (no organic media execution). If n_media_times >
n_times + max_lag, then only the final n_times + max_lag periods are
used to fit the model.
|
organic_reach
|
An optional DataArray of dimensions (n_geos,
n_media_times, n_organic_rf_channels) containing non-negative organic
reach values. It is required that n_media_times ≥ n_times, and the
final n_times time periods must align with the time window of kpi and
controls. Including lagged time periods prior to this window is optional.
If lagged reach is not included, or if it covers fewer than max_lag time
periods, then the model calculates Adstock assuming that reach execution
is zero prior to the first observed time period. We recommend including
n_times + max_lag time periods, unless the value of max_lag is
prohibitively large. If no organic reach and frequency data is used, then
organic_reach and organic_frequency will be None. organic_reach and
organic_frequency must contain the same number of channels in the same
order. If either of these arguments is passed, then the other is also
required.
|
organic_frequency
|
An optional DataArray of dimensions (n_geos,
n_media_times, n_organic_rf_channels) containing non-negative organic
frequency values. It is required that n_media_times ≥ n_times, and the
final n_times time periods must align with the time window of kpi and
controls. Including lagged time periods prior to this window is optional.
If lagged frequency is not included, or if it covers fewer than max_lag
time periods, then the model calculates Adstock assuming that frequency
execution is zero prior to the first observed time period. We recommend
including n_times + max_lag time periods, unless the value of max_lag is
prohibitively large. If no organic reach and frequency data is used, then
organic_frequency will be None. organic_reach and organic_frequency must
contain the same number of channels in the same order. If either of these
arguments is passed, then the other is also required.
|
non_media_treatments
|
An optional DataArray of dimensions (n_geos, n_times,
n_non_media_channels) containing non-media treatment variable values.
Non-media treatment variables are marketing activities taken by the
advertiser that are not directly related to media. They have no direct
marketing cost associated with them but, unlike organic media variables,
they have no Adstock and Hill effects. They differ from control variables
in that they are considered intervenable and hence are treatment variables
under the causal model. Examples include running a promotion, the price
of a product, and a change in a product's packaging and/or design.
|
allocated_media_spend
|
Returns the allocated media spend for each geo and time.
|
allocated_rf_spend
|
Returns the allocated RF spend for each geo and time.
|
control_variable
|
Returns the control variable dimension.
|
geo
|
Returns the geo dimension.
|
media_channel
|
Returns the media channel dimension.
|
media_spend_has_geo_dimension
|
Checks whether the media_spend array has a geo dimension.
|
media_spend_has_time_dimension
|
Checks whether the media_spend array has a time dimension.
|
media_time
|
Returns the media time dimension coordinates.
|
media_time_coordinates
|
Returns the media time dimension in a TimeCoordinates wrapper.
|
non_media_channel
|
Returns the non-media treatments channel dimension.
|
organic_media_channel
|
Returns the organic media channel dimension.
|
organic_rf_channel
|
Returns the organic RF channel dimension.
|
rf_channel
|
Returns the RF channel dimension.
|
rf_spend_has_geo_dimension
|
Checks whether the rf_spend array has a geo dimension.
|
rf_spend_has_time_dimension
|
Checks whether the rf_spend array has a time dimension.
|
scaled_centered_kpi
|
Calculates scaled and centered KPI values, mean-centered by geo.
|
time
|
Returns the time dimension coordinates.
|
time_coordinates
|
Returns the (KPI) time dimension in a TimeCoordinates wrapper.
|
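The time-window rule described above for media (and, analogously, for reach, frequency, and the organic variables) can be illustrated with a small sketch. This is not Meridian's implementation; the helper name align_media_history is hypothetical, and a plain Python list stands in for one geo and one channel of a DataArray. It assumes the series is ordered oldest to newest.

```python
def align_media_history(series, n_times, max_lag):
    """Return the final n_times + max_lag periods of one channel's series.

    Hypothetical helper: `series` holds media values for a single geo and
    channel, ordered oldest to newest. Missing history is zero-filled.
    """
    if len(series) < n_times:
        raise ValueError("n_media_times must be at least n_times")
    window = n_times + max_lag
    if len(series) >= window:
        # More history than needed: keep only the most recent periods.
        return list(series[-window:])
    # Less history than needed: treat unobserved periods as zero execution.
    return [0.0] * (window - len(series)) + list(series)


n_times, max_lag = 4, 2
short_history = [5.0, 6.0, 7.0, 8.0, 9.0]  # one lagged period only
long_history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # four lagged periods

print(align_media_history(short_history, n_times, max_lag))
print(align_media_history(long_history, n_times, max_lag))
```

The short series gains one leading zero, and the long series loses its two oldest periods, so both end up with n_times + max_lag entries.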
Methods
aggregate_media_spend
aggregate_media_spend(
calibration_period: (np.ndarray | None) = None
) -> (np.ndarray | None)
Aggregates media spend by channel over the calibration period.
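As a rough illustration of how channel-level spend might be allocated proportionally to media and then aggregated over a calibration period, consider the sketch below. It is not the library's code. The function names are hypothetical, nested lists indexed as [geo][time][channel] stand in for DataArrays, and calibration_period is assumed to be a boolean mask indexed as [time][channel].

```python
def allocate_spend(channel_totals, media):
    """Spread per-channel spend totals over geos and times, pro rata to media.

    Hypothetical helper: media[g][t][c] is the media execution value.
    """
    n_channels = len(channel_totals)
    media_totals = [
        sum(cell[c] for geo in media for cell in geo) for c in range(n_channels)
    ]
    return [
        [
            [
                channel_totals[c] * cell[c] / media_totals[c]
                if media_totals[c]
                else 0.0
                for c in range(n_channels)
            ]
            for cell in geo
        ]
        for geo in media
    ]


def aggregate_spend(spend, calibration_period=None):
    """Sum spend[g][t][c] by channel, optionally restricted by a mask.

    Assumption: calibration_period[t][c] is truthy where the period counts;
    None means every time period is included.
    """
    n_channels = len(spend[0][0])
    totals = [0.0] * n_channels
    for geo in spend:
        for t, cell in enumerate(geo):
            for c in range(n_channels):
                if calibration_period is None or calibration_period[t][c]:
                    totals[c] += cell[c]
    return totals


media = [
    [[1.0, 0.0], [3.0, 2.0]],  # geo 0: two time periods, two channels
    [[2.0, 2.0], [4.0, 4.0]],  # geo 1
]
allocated = allocate_spend([100.0, 80.0], media)
mask = [[True, False], [False, True]]

print(aggregate_spend(allocated))        # per-channel totals over all periods
print(aggregate_spend(allocated, mask))  # per-channel totals within the mask
```

Without a mask, the aggregation recovers the original channel totals, which is why only aggregate spend is needed when no calibration period or subset analysis is involved.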
aggregate_rf_spend
aggregate_rf_spend(
calibration_period: (np.ndarray | None) = None
) -> (np.ndarray | None)
Aggregates RF spend by channel over the calibration period.
as_dataset
as_dataset() -> xr.Dataset
Returns data as a single xarray.Dataset object.
copy
copy(
deep: bool = True
) -> 'InputData'
Returns a copy of the InputData instance.
Args |
deep
|
If True, a deep copy is made, meaning all xarray.DataArray objects
are also deepcopied. If False, a shallow copy is made.
|
Returns |
A new InputData instance.
|
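The difference between deep and shallow copies can be sketched with the standard copy module. The Container class below is a hypothetical stand-in for InputData that holds a plain list instead of an xarray.DataArray. It only illustrates general Python copy semantics, not Meridian's internals.

```python
import copy
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical stand-in for InputData with one mutable field."""

    kpi: list

    def copy(self, deep=True):
        # Deep copies duplicate nested data; shallow copies share it.
        return copy.deepcopy(self) if deep else copy.copy(self)


original = Container(kpi=[1.0, 2.0])
shallow = original.copy(deep=False)
deep = original.copy(deep=True)

original.kpi[0] = 99.0  # mutate the original's data in place

print(shallow.kpi)  # shares the list with the original
print(deep.kpi)     # unaffected by the mutation
```

A deep copy is the safer default when the copy will be modified independently of the original.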
get_all_adstock_hill_channels
get_all_adstock_hill_channels() -> np.ndarray
Returns all channel dimensions that Adstock and Hill are applied to.
RF, organic media and organic RF channels are concatenated to the end of the
media channels if they are present.
get_all_channels
get_all_channels() -> np.ndarray
Returns all the channel dimensions.
This method returns media, RF, organic media, organic RF and non-media
channel names, concatenated into a single array in that order.
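The concatenation order described for get_all_adstock_hill_channels and get_all_channels can be pictured with a short sketch. The helper concat_channels and all channel names are hypothetical, and absent channel groups are represented as None.

```python
def concat_channels(*groups):
    """Concatenate channel-name groups in order, skipping absent (None) ones."""
    return [name for group in groups if group is not None for name in group]


# Hypothetical channel names for illustration only.
media = ["tv", "search"]
rf = ["video"]
organic_media = None  # no organic media in this example
organic_rf = ["organic_social"]
non_media = ["promo"]

# Channels with Adstock and Hill: media, RF, organic media, organic RF.
adstock_hill = concat_channels(media, rf, organic_media, organic_rf)
# All channels: the same order, followed by non-media treatments.
all_channels = concat_channels(media, rf, organic_media, organic_rf, non_media)

print(adstock_hill)
print(all_channels)
```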
get_all_media_and_rf
get_all_media_and_rf() -> np.ndarray
Returns all of the media execution values, including both media and RF.
If media, reach, and frequency were used for modeling, reach * frequency
is concatenated to the end of media.
Returns |
np.ndarray with dimensions (n_geos, n_media_times, n_channels)
containing media or reach * frequency for each media_channel or
rf_channel.
|
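A minimal sketch of the combination described above, assuming media, reach, and frequency are all present: reach * frequency is computed per RF channel and appended after the media channels along the channel axis. Nested lists indexed as [geo][time][channel] stand in for arrays, and the function name is hypothetical.

```python
def all_media_and_rf(media, reach, frequency):
    """Append reach * frequency to media along the channel axis.

    Hypothetical helper: all inputs are nested lists indexed [g][t][c].
    """
    combined = []
    for g in range(len(media)):
        geo_rows = []
        for t in range(len(media[g])):
            # Element-wise product across RF channels for this geo and time.
            rf = [r * f for r, f in zip(reach[g][t], frequency[g][t])]
            geo_rows.append(list(media[g][t]) + rf)
        combined.append(geo_rows)
    return combined


media = [[[100.0], [120.0]]]    # 1 geo, 2 time periods, 1 media channel
reach = [[[10.0], [20.0]]]      # 1 RF channel
frequency = [[[2.0], [1.5]]]

print(all_media_and_rf(media, reach, frequency))
```

Each innermost list now has n_media_channels + n_rf_channels entries, with the media values first.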
get_all_paid_channels
get_all_paid_channels() -> np.ndarray
Returns all the paid channel dimensions, including both media and RF.
If both media and RF channels are present, then the RF channels are
concatenated to the end of the media channels.
get_n_top_largest_geos
get_n_top_largest_geos(
num_geos: int
) -> list[str]
Finds the specified number of the largest geos by population.
Args |
num_geos
|
The number of top largest geos to return based on population.
|
Returns |
A list of the specified number of top largest geos.
|
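Selecting the largest geos by population amounts to a descending sort followed by a slice. The sketch below uses a plain dict of hypothetical geo names and populations in place of the population DataArray. It shows the idea only and is not the library's implementation.

```python
def n_top_largest_geos(population, num_geos):
    """Return the names of the num_geos most populous geos, largest first."""
    ranked = sorted(population, key=population.get, reverse=True)
    return ranked[:num_geos]


# Hypothetical geo names and populations.
population = {"geo_a": 500, "geo_b": 1200, "geo_c": 800}

print(n_top_largest_geos(population, 2))
```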
get_organic_media_channels_argument_builder
get_organic_media_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for organic media channels only.
get_organic_rf_channels_argument_builder
get_organic_rf_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for organic RF channels only.
get_paid_channels_argument_builder
get_paid_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for all paid channels.
get_paid_media_channels_argument_builder
get_paid_media_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for paid media channels only.
get_paid_rf_channels_argument_builder
get_paid_rf_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for paid RF channels only.
get_total_outcome
get_total_outcome() -> np.ndarray
Returns total outcome, aggregated over geos and times.
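Based on the revenue_per_kpi description, one plausible reading is that the outcome is kpi * revenue_per_kpi when revenue_per_kpi is available and kpi otherwise, summed over geos and times. The sketch below follows that assumption. The function name is hypothetical, nested lists indexed as [geo][time] stand in for DataArrays, and the return value is a plain float rather than an np.ndarray.

```python
def total_outcome(kpi, revenue_per_kpi=None):
    """Sum the outcome over geos and times.

    Assumption: outcome is kpi * revenue_per_kpi when revenue_per_kpi is
    given, and kpi itself otherwise.
    """
    total = 0.0
    for g, geo in enumerate(kpi):
        for t, value in enumerate(geo):
            rate = 1.0 if revenue_per_kpi is None else revenue_per_kpi[g][t]
            total += value * rate
    return total


kpi = [[10.0, 20.0], [30.0, 40.0]]           # 2 geos, 2 time periods
revenue_per_kpi = [[2.0, 2.0], [3.0, 3.0]]

print(total_outcome(kpi))                    # outcome in KPI units
print(total_outcome(kpi, revenue_per_kpi))   # outcome in revenue
```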
get_total_spend
get_total_spend() -> np.ndarray
Returns total spend, including media_spend and rf_spend.
__eq__
__eq__(
other
)
Return self==value.
Class Variables |
controls
|
None
|
frequency
|
None
|
media
|
None
|
media_spend
|
None
|
non_media_treatments
|
None
|
organic_frequency
|
None
|
organic_media
|
None
|
organic_reach
|
None
|
reach
|
None
|
revenue_per_kpi
|
None
|
rf_spend
|
None
|