meridian.data.data_frame_input_data_builder.DataFrameInputDataBuilder

Builds InputData from DataFrames.

Inherits From: InputDataBuilder

Attributes

controls

default_geo_column The default geo column name for this builder to use.

This column name is used when geo_col is not explicitly provided to a data setter method.

By default, this is "geo".

default_kpi_column The default kpi column name for this builder to use.

This column name is used when kpi_col is not explicitly provided to a data setter method.

By default, this is "kpi".

default_media_time_column The default media time column name for this builder to use.

This column name is used when media_time_col is not explicitly provided to a data setter method.

By default, this is also "time", since most input dataframes are likely to use the same time column for both their media execution and media spend data.

default_population_column The default population column name for this builder to use.

This column name is used when population_col is not explicitly provided to a data setter method.

By default, this is "population".

default_revenue_per_kpi_column The default revenue per kpi column name for this builder to use.

This column name is used when revenue_per_kpi_col is not explicitly provided to a data setter method.

By default, this is "revenue_per_kpi".

default_time_column The default time column name for this builder to use.

This column name is used when time_col is not explicitly provided to a data setter method.

By default, this is "time".

frequency

geos

kpi

media

media_spend

media_time_coords

non_media_treatments

organic_frequency

organic_media

organic_reach

population

reach

revenue_per_kpi

rf_spend

time_coords
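
The default_* attributes above all follow the same rule: an explicitly passed column name wins, otherwise the builder's default is used. The following is a minimal sketch of that rule only. ColumnDefaults and resolve are hypothetical names invented for illustration and are not part of Meridian; the default values "time", "geo", and "kpi" are the ones documented above.

```python
from typing import Optional


class ColumnDefaults:
    """Hypothetical stand-in for the builder's default_* column attributes."""

    def __init__(
        self,
        default_time_column: str = "time",
        default_geo_column: str = "geo",
        default_kpi_column: str = "kpi",
    ):
        self.default_time_column = default_time_column
        self.default_geo_column = default_geo_column
        self.default_kpi_column = default_kpi_column

    def resolve(self, explicit: Optional[str], default: str) -> str:
        # An explicitly passed column name wins; otherwise use the default.
        return explicit if explicit is not None else default


# Override one default, keep the others as documented.
defaults = ColumnDefaults(default_kpi_column="conversions")

# No kpi_col given, so the (overridden) default applies.
kpi_col = defaults.resolve(None, defaults.default_kpi_column)
# time_col given explicitly, so the default is ignored.
time_col = defaults.resolve("week", defaults.default_time_column)
print(kpi_col, time_col)
```

Setting a default once is convenient when every DataFrame passed to the builder uses the same column names; passing the column explicitly is the per-call override.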

Methods

build

Builds an InputData.

Constructs an InputData from the DataArrays given to this builder so far, after one final validation pass that checks all data arrays for consistency.

Returns
A validated InputData.
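
Because every with_* method returns the builder and build runs a final validation pass, calls can be chained and validated once at the end. The sketch below illustrates that pattern in miniature. SketchBuilder is a hypothetical class, plain dicts keyed by time stand in for DataArrays, and the specific check (matching time coordinates, KPI required) is an assumption made for the example, not a statement of what Meridian validates.

```python
class SketchBuilder:
    """Hypothetical miniature builder; not Meridian's implementation."""

    def __init__(self):
        self._arrays = {}

    def with_kpi(self, values_by_time: dict):
        self._arrays["kpi"] = values_by_time
        return self  # returning self is what allows chaining

    def with_controls(self, values_by_time: dict):
        self._arrays["controls"] = values_by_time
        return self

    def build(self) -> dict:
        # Assumed check for this sketch: KPI must be present and every
        # stored array must share the KPI's time coordinates.
        if "kpi" not in self._arrays:
            raise ValueError("KPI data is required before build().")
        expected = sorted(self._arrays["kpi"])
        for name, values in self._arrays.items():
            if sorted(values) != expected:
                raise ValueError(f"{name} time coordinates do not match kpi.")
        return dict(self._arrays)


data = (
    SketchBuilder()
    .with_kpi({"2024-01-01": 10.0, "2024-01-08": 12.0})
    .with_controls({"2024-01-01": 0.3, "2024-01-08": 0.4})
    .build()
)
```

Deferring the consistency check to build means the with_* calls can be made in any order.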

with_controls

Reads controls data from a DataFrame.

Args
df The DataFrame to read the controls data from.
control_cols The names of the columns containing the controls values.
time_col The name of the column containing the time coordinates. If not provided, self.default_time_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added controls data.
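
The geo_col fallback described above can be pictured as follows. This is a sketch of the idea only: with_geo_dimension is a hypothetical helper, plain dict rows stand in for DataFrame rows so the snippet has no dependencies, and only the coordinate value national_geo comes from the documentation.

```python
NATIONAL_GEO = "national_geo"  # coordinate value named in the docs above


def with_geo_dimension(rows, geo_col="geo"):
    """Returns copies of rows, adding a single geo value if the column is absent."""
    has_geo = bool(rows) and all(geo_col in row for row in rows)
    if has_geo:
        # Geo-level data: keep the coordinates that were provided.
        return [dict(row) for row in rows]
    # National data: every row gets the same single geo coordinate.
    return [{**row, geo_col: NATIONAL_GEO} for row in rows]


national_rows = [
    {"time": "2024-01-01", "temperature": 3.5},
    {"time": "2024-01-08", "temperature": 4.1},
]
result = with_geo_dimension(national_rows)
geos = sorted({row["geo"] for row in result})
print(geos)
```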

with_kpi

Reads KPI data from a DataFrame.

Args
df The DataFrame to read the KPI data from.
kpi_col The name of the column containing the KPI values. If not provided, self.default_kpi_column is used.
time_col The name of the column containing the time coordinates. If not provided, self.default_time_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added KPI data.

with_media

Reads media and media spend data from a DataFrame.

Args
df The DataFrame to read the media and media spend data from.
media_cols The names of the columns containing the media values.
media_spend_cols The names of the columns containing the media spend values.
media_channels The desired media channel coordinate names. Must match media_cols and media_spend_cols in length. The three lists are mapped by index: the i-th channel name labels the i-th media column and the i-th media spend column.
time_col The name of the column containing the time coordinates for media spend and media time coordinates for media. If not provided, self.default_time_column is used. Media time coordinates are inferred from the same time_col and are potentially shorter than time coordinates if media spend values are missing (NaN) for some t in time. Media time must be equal to, or a subset of, time.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added media and media spend data.
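
The length and index-mapping requirement on media_cols, media_spend_cols, and media_channels can be sketched as below. map_channels and the column names are hypothetical and chosen for illustration; the snippet shows only how positional pairing works, not how Meridian implements it.

```python
def map_channels(media_cols, media_spend_cols, media_channels):
    """Pairs each channel name with the columns at the same list position."""
    if not (len(media_cols) == len(media_spend_cols) == len(media_channels)):
        raise ValueError(
            "media_cols, media_spend_cols and media_channels must have equal length."
        )
    return {
        channel: (media_col, spend_col)
        for channel, media_col, spend_col in zip(
            media_channels, media_cols, media_spend_cols
        )
    }


mapping = map_channels(
    media_cols=["tv_impressions", "search_clicks"],
    media_spend_cols=["tv_spend", "search_spend"],
    media_channels=["tv", "search"],
)
print(mapping["tv"])
```

Since the pairing is purely positional, reordering one list without the others silently attaches spend to the wrong channel, so it is worth keeping the three lists side by side in calling code.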

with_non_media_treatments

Reads non-media treatments data from a DataFrame.

Args
df The DataFrame to read the non-media treatments data from.
non_media_treatment_cols The names of the columns containing the non-media treatments values.
time_col The name of the column containing the time coordinates. If not provided, self.default_time_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added non-media treatments data.

with_organic_media

Reads organic media data from a DataFrame.

Args
df The DataFrame to read the organic media data from.
organic_media_cols The names of the columns containing the organic media values.
organic_media_channels The desired organic media channel coordinate names. Defaults to the organic media column names if not given. If provided, must match organic_media_cols in length; names are mapped to columns by index.
media_time_col The name of the column containing the media time coordinates. If not provided, self.default_media_time_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added organic media data.
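
Unlike media_channels in with_media, organic_media_channels is optional and falls back to the column names. A minimal sketch of that fallback follows, using a hypothetical helper resolve_organic_channels and made-up column names; it is an illustration of the documented behavior rather than Meridian's code.

```python
def resolve_organic_channels(organic_media_cols, organic_media_channels=None):
    """Returns channel names, defaulting to the column names when none are given."""
    if organic_media_channels is None:
        return list(organic_media_cols)
    if len(organic_media_channels) != len(organic_media_cols):
        raise ValueError(
            "organic_media_channels must match organic_media_cols in length."
        )
    return list(organic_media_channels)


cols = ["newsletter_sends", "blog_visits"]
default_names = resolve_organic_channels(cols)
custom_names = resolve_organic_channels(cols, ["newsletter", "blog"])
print(default_names, custom_names)
```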

with_organic_reach

Reads organic reach and organic frequency data from a DataFrame.

Args
df The DataFrame to read the organic reach and frequency data from.
organic_reach_cols The names of the columns containing the organic reach values.
organic_frequency_cols The names of the columns containing the organic frequency values.
organic_rf_channels The desired organic rf channel coordinate names. Must match organic_reach_cols and organic_frequency_cols in length. The three lists are mapped by index: the i-th channel name labels the i-th organic reach column and the i-th organic frequency column.
media_time_col The name of the column containing the media time coordinates. If not provided, self.default_media_time_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added organic reach and organic frequency data.

with_population

Reads population data from a DataFrame.

Args
df The DataFrame to read the population data from.
population_col The name of the column containing the population values. If not provided, self.default_population_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added population data.

with_reach

Reads reach, frequency, and rf spend data from a DataFrame.

Args
df The DataFrame to read the reach, frequency, and rf spend data from.
reach_cols The names of the columns containing the reach values.
frequency_cols The names of the columns containing the frequency values.
rf_spend_cols The names of the columns containing the rf spend values.
rf_channels The desired rf channel coordinate names. Must match reach_cols, frequency_cols, and rf_spend_cols in length. The four lists are mapped by index: the i-th channel name labels the i-th reach, frequency, and rf spend columns.
time_col The name of the column containing the time coordinates for rf spend and media time coordinates for reach and frequency. If not provided, self.default_time_column is used. Media time coordinates are inferred from the same time_col and are potentially shorter than time coordinates if media spend values are missing (NaN) for some t in time. Media time must be equal to, or a subset of, time.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added reach, frequency, and rf spend data.

with_revenue_per_kpi

Reads revenue per KPI data from a DataFrame.

Args
df The DataFrame to read the revenue per KPI data from.
revenue_per_kpi_col The name of the column containing the revenue per KPI values. If not provided, self.default_revenue_per_kpi_column is used.
time_col The name of the column containing the time coordinates. If not provided, self.default_time_column is used.
geo_col (Optional) The name of the column containing the geo coordinates. If not provided, self.default_geo_column is used. If the DataFrame has no geo column, a national model is assumed and a geo dimension is created internally with a single coordinate value, national_geo.

Returns
The DataFrameInputDataBuilder with the added revenue per KPI data.