meridian.data.data_frame_input_data_builder.DataFrameInputDataBuilder
Builds InputData from DataFrames.
Inherits From: InputDataBuilder
meridian.data.data_frame_input_data_builder.DataFrameInputDataBuilder(
kpi_type: str,
default_geo_column: str = constants.GEO,
default_time_column: str = constants.TIME,
default_media_time_column: str = constants.TIME,
default_population_column: str = constants.POPULATION,
default_kpi_column: str = constants.KPI,
default_revenue_per_kpi_column: str = constants.REVENUE_PER_KPI
)
Attributes |
controls
|
|
default_geo_column
|
The default geo column name for this builder to use.
This column name is used when geo_col is not explicitly provided to a data
setter method.
By default, this is "geo".
|
default_kpi_column
|
The default kpi column name for this builder to use.
This column name is used when kpi_col is not explicitly provided to a data
setter method.
By default, this is "kpi".
|
default_media_time_column
|
The default media time column name for this builder to use.
This column name is used when media_time_col is not explicitly provided to
a data setter method.
By default, this is also "time", since most input DataFrames are likely
to use the same time column for both their media execution and media spend
data.
|
default_population_column
|
The default population column name for this builder to use.
This column name is used when population_col is not explicitly provided to
a data setter method.
By default, this is "population".
|
default_revenue_per_kpi_column
|
The default revenue per kpi column name for this builder to use.
This column name is used when revenue_per_kpi_col is not explicitly
provided to a data setter method.
By default, this is "revenue_per_kpi".
|
default_time_column
|
The default time column name for this builder to use.
This column name is used when time_col is not explicitly provided to a
data setter method.
By default, this is "time".
|
frequency
|
|
geos
|
|
kpi
|
|
media
|
|
media_spend
|
|
media_time_coords
|
|
non_media_treatments
|
|
organic_frequency
|
|
organic_media
|
|
organic_reach
|
|
population
|
|
reach
|
|
revenue_per_kpi
|
|
rf_spend
|
|
time_coords
|
|
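The default_*_column attributes above describe a fallback rule: a setter uses the column name passed to it, and otherwise the builder's default. The sketch below illustrates that rule only. ColumnDefaults and resolve_column are hypothetical names introduced for illustration and are not part of Meridian; the default strings are the ones quoted in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnDefaults:
    """Hypothetical stand-in for the builder's default_*_column attributes."""

    geo: str = "geo"
    time: str = "time"
    media_time: str = "time"
    population: str = "population"
    kpi: str = "kpi"
    revenue_per_kpi: str = "revenue_per_kpi"


def resolve_column(explicit: Optional[str], default: str) -> str:
    """Return the explicit column name, or the default when none was given."""
    return default if explicit is None else explicit


# Override one default, as the constructor arguments allow.
defaults = ColumnDefaults(time="week_start")

print(resolve_column(None, defaults.time))    # week_start
print(resolve_column("date", defaults.time))  # date
```

Overriding a default once at construction time avoids repeating the same time_col or geo_col argument in every setter call.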
Methods
build
build() -> meridian.data.input_data.InputData
Builds an InputData.
Constructs an InputData from the constituent DataArrays given to this
builder thus far, after performing one final validation pass over all data
arrays for consistency checks.
Returns |
A validated InputData.
|
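Because every with_* setter is documented as returning the DataFrameInputDataBuilder, the calls can be chained and finished with build(). The following is a minimal sketch of that pattern, not a prescribed workflow. assemble_input_data is a hypothetical helper, builder is assumed to be an already constructed DataFrameInputDataBuilder, and reading every component from one DataFrame is an assumption made for brevity.

```python
def assemble_input_data(builder, df, control_cols, media_cols,
                        media_spend_cols, media_channels):
    """Feed one DataFrame to the builder's setters, then build.

    Hypothetical helper: `builder` is assumed to be an already constructed
    DataFrameInputDataBuilder. No time, geo, KPI, or population column names
    are passed, so the builder's default_*_column values apply.
    """
    return (
        builder
        .with_kpi(df)
        .with_population(df)
        .with_controls(df, control_cols=control_cols)
        .with_media(
            df,
            media_cols=media_cols,
            media_spend_cols=media_spend_cols,
            media_channels=media_channels,
        )
        .build()  # final validation pass; returns the InputData
    )
```

Only methods and keyword arguments listed on this page are used. Which setters a given model needs depends on the data available, so treat the selection here as illustrative.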
with_controls
with_controls(
df: pd.DataFrame,
control_cols: list[str],
time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads controls data from a DataFrame.
Args |
df
|
The DataFrame to read the controls data from.
|
control_cols
|
The names of the columns containing the controls values.
|
time_col
|
The name of the column containing the time coordinates. If not
provided, self.default_time_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added controls data.
|
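The geo_col behavior described above can be pictured as follows: when the input has no geo column, the data is treated as national and given a single geo coordinate. This is an illustrative sketch of the documented rule, not Meridian's implementation. geo_coords is a hypothetical function, and records is assumed to be a list of row dictionaries such as the result of df.to_dict("records").

```python
NATIONAL_GEO = "national_geo"  # coordinate value named in the docs above


def geo_coords(records, geo_col="geo"):
    """Return the geo coordinates implied by a list of row dicts."""
    if not records or geo_col not in records[0]:
        # No geo column: treat the data as national with a single geo.
        return [NATIONAL_GEO]
    # Keep first-seen order while dropping duplicates.
    return list(dict.fromkeys(row[geo_col] for row in records))


national = [
    {"time": "2024-01-01", "gdp": 1.0},
    {"time": "2024-01-08", "gdp": 1.1},
]
regional = [
    {"geo": "east", "time": "2024-01-01", "gdp": 0.4},
    {"geo": "west", "time": "2024-01-01", "gdp": 0.6},
]

print(geo_coords(national))  # ['national_geo']
print(geo_coords(regional))  # ['east', 'west']
```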
with_kpi
with_kpi(
df: pd.DataFrame,
kpi_col: (str | None) = None,
time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads KPI data from a DataFrame.
Args |
df
|
The DataFrame to read the KPI data from.
|
kpi_col
|
The name of the column containing the KPI values. If not
provided, self.default_kpi_column is used.
|
time_col
|
The name of the column containing the time coordinates. If not
provided, self.default_time_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added KPI data.
|
with_media
with_media(
df: pd.DataFrame,
media_cols: list[str],
media_spend_cols: list[str],
media_channels: list[str],
time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads media and media spend data from a DataFrame.
Args |
df
|
The DataFrame to read the media and media spend data from.
|
media_cols
|
The names of the columns containing the media values.
|
media_spend_cols
|
The names of the columns containing the media spend
values.
|
media_channels
|
The desired media channel coordinate names. Must match
media_cols and media_spend_cols in length. These are also index
mapped.
|
time_col
|
The name of the column containing the time coordinates for media
spend and media time coordinates for media. If not provided,
self.default_time_column is used. Media time coordinates are inferred
from the same time_col and are potentially shorter than time
coordinates if media spend values are missing (NaN) for some t in
time. Media time must be equal to or a subset of time.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added media and media spend data.
|
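The media_channels description says the three lists must match in length and are index mapped. One way to read that, sketched below, is that the name at position i labels the media column and the media spend column at position i. map_media_channels and the column names are hypothetical and used only for illustration; the function is not a Meridian API.

```python
def map_media_channels(media_cols, media_spend_cols, media_channels):
    """Pair each channel name with its media and spend columns by position."""
    if not (len(media_cols) == len(media_spend_cols) == len(media_channels)):
        raise ValueError(
            "media_cols, media_spend_cols, and media_channels "
            "must have the same length"
        )
    return {
        channel: {"media_col": media, "media_spend_col": spend}
        for channel, media, spend in zip(
            media_channels, media_cols, media_spend_cols
        )
    }


mapping = map_media_channels(
    media_cols=["tv_impressions", "search_clicks"],
    media_spend_cols=["tv_spend", "search_spend"],
    media_channels=["tv", "search"],
)
print(mapping["search"])
# {'media_col': 'search_clicks', 'media_spend_col': 'search_spend'}
```

Keeping the three lists in the same order is therefore what ties a channel's exposure data to its spend.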
with_non_media_treatments
with_non_media_treatments(
df: pd.DataFrame,
non_media_treatment_cols: list[str],
time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads non-media treatments data from a DataFrame.
Args |
df
|
The DataFrame to read the non-media treatments data from.
|
non_media_treatment_cols
|
The names of the columns containing the
non-media treatments values.
|
time_col
|
The name of the column containing the time coordinates. If not
provided, self.default_time_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added non-media treatments data.
|
with_organic_media
with_organic_media(
df: pd.DataFrame,
organic_media_cols: list[str],
organic_media_channels: (list[str] | None) = None,
media_time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads organic media data from a DataFrame.
Args |
df
|
The DataFrame to read the organic media data from.
|
organic_media_cols
|
The names of the columns containing the organic media
values.
|
organic_media_channels
|
The desired organic media channel coordinate
names. Will default to the organic media columns if not given. If
provided, must match organic_media_cols in length. This is index
mapped.
|
media_time_col
|
The name of the column containing the media time
coordinates. If not provided, self.default_media_time_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added organic media data.
|
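Unlike media_channels in with_media, organic_media_channels is optional. The sketch below illustrates the documented defaulting rule: the channel names fall back to the column names when omitted, and must match the columns in length when given. organic_channel_names is a hypothetical function for illustration and does not reproduce the library's code.

```python
def organic_channel_names(organic_media_cols, organic_media_channels=None):
    """Return the channel coordinate names implied by the arguments."""
    if organic_media_channels is None:
        # No names given: the column names double as channel names.
        return list(organic_media_cols)
    if len(organic_media_channels) != len(organic_media_cols):
        raise ValueError(
            "organic_media_channels must match organic_media_cols in length"
        )
    return list(organic_media_channels)


cols = ["newsletter_sends", "blog_visits"]
print(organic_channel_names(cols))                          # ['newsletter_sends', 'blog_visits']
print(organic_channel_names(cols, ["newsletter", "blog"]))  # ['newsletter', 'blog']
```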
with_organic_reach
with_organic_reach(
df: pd.DataFrame,
organic_reach_cols: list[str],
organic_frequency_cols: list[str],
organic_rf_channels: list[str],
media_time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads organic reach and organic frequency data from a DataFrame.
Args |
df
|
The DataFrame to read the organic reach and frequency data from.
|
organic_reach_cols
|
The names of the columns containing the organic reach
values.
|
organic_frequency_cols
|
The names of the columns containing the organic
frequency values.
|
organic_rf_channels
|
The desired organic rf channel coordinate names. Must
match organic_reach_cols and organic_frequency_cols in length. These
are also index mapped.
|
media_time_col
|
The name of the column containing the media time
coordinates. If not provided, self.default_media_time_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added organic reach and organic
frequency data.
|
with_population
with_population(
df: pd.DataFrame,
population_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads population data from a DataFrame.
Args |
df
|
The DataFrame to read the population data from.
|
population_col
|
The name of the column containing the population values.
If not provided, self.default_population_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added population data.
|
with_reach
with_reach(
df: pd.DataFrame,
reach_cols: list[str],
frequency_cols: list[str],
rf_spend_cols: list[str],
rf_channels: list[str],
time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads reach, frequency, and rf spend data from a DataFrame.
Args |
df
|
The DataFrame to read the reach, frequency, and rf spend data from.
|
reach_cols
|
The names of the columns containing the reach values.
|
frequency_cols
|
The names of the columns containing the frequency values.
|
rf_spend_cols
|
The names of the columns containing the rf spend values.
|
rf_channels
|
The desired rf channel coordinate names. Must match
reach_cols, frequency_cols, and rf_spend_cols in length. These are
also index mapped.
|
time_col
|
The name of the column containing the time coordinates for rf
spend and media time coordinates for reach and frequency. If not
provided, self.default_time_column is used. Media time coordinates are
inferred from the same time_col and are potentially shorter than time
coordinates if media spend values are missing (NaN) for some t in
time. Media time must be equal to or a subset of time.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added reach, frequency, and rf
spend data.
|
with_revenue_per_kpi
with_revenue_per_kpi(
df: pd.DataFrame,
revenue_per_kpi_col: (str | None) = None,
time_col: (str | None) = None,
geo_col: (str | None) = None
) -> 'DataFrameInputDataBuilder'
Reads revenue per KPI data from a DataFrame.
Args |
df
|
The DataFrame to read the revenue per KPI data from.
|
revenue_per_kpi_col
|
The name of the column containing the revenue per KPI
values. If not provided, self.default_revenue_per_kpi_column is used.
|
time_col
|
The name of the column containing the time coordinates. If not
provided, self.default_time_column is used.
|
geo_col
|
(Optional) The name of the column containing the geo coordinates.
If not provided, self.default_geo_column is used. If the DataFrame
provided has no geo column, national model data is assumed and a geo
dimension is created internally with a single coordinate value,
national_geo.
|
Returns |
The DataFrameInputDataBuilder with the added revenue per KPI data.
|