If you have enabled monitoring, data will be saved to a BigQuery table for logging and retry purposes.
You may wish to periodically delete data from this table to reduce costs and to comply with data privacy regulations.
To do this, you can run the cleanup DAG.
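As a rough sketch of what such a cleanup does under the hood, the deletion can be expressed as a single BigQuery DML statement. The table name and `event_timestamp` column below are illustrative assumptions, not the framework's actual schema:

```python
from datetime import datetime, timedelta, timezone

def build_cleanup_query(table: str, days_to_live: int = 50) -> str:
    """Build a BigQuery DELETE statement that removes monitoring rows
    older than days_to_live days. The table and column names here are
    placeholders; substitute your deployment's actual monitoring table."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days_to_live)
    return (
        f"DELETE FROM `{table}` "
        f"WHERE event_timestamp < TIMESTAMP('{cutoff.isoformat()}')"
    )

print(build_cleanup_query("my_project.my_dataset.monitoring"))
```

Running the statement through the BigQuery client then removes everything older than the cutoff in one job.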
2. Running Cleanup Manually
To run the cleanup DAG manually, navigate to the DAGs section of the Airflow web UI and click the play (trigger) button next to the cleanup DAG. By default, all data older than 50 days will be removed from the monitoring table.
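The same manual trigger can be issued programmatically through Airflow's stable REST API. The sketch below only builds the request (the base URL is an assumption, and a real deployment would also need an Authorization header):

```python
import json
import urllib.request

def build_trigger_request(base_url: str, dag_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Airflow's stable REST API that
    triggers a DAG run -- the API equivalent of the UI's play button.
    Authentication headers are omitted and must be added for a real call."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

req = build_trigger_request("http://localhost:8080", "monitoring_cleanup_dag")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (plus credentials) starts a run exactly as the play button would.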
3. Running Cleanup on a Schedule
Cleanup can also be set to run repeatedly on a schedule.
For information on how to do this, see section "3.3.2 Schedule a DAG" in the Installation Guide.
4. Running Cleanup After Another DAG
Cleanup can also be set to run automatically after another DAG finishes.
To enable this, set up an Airflow variable (Admin -> Variables -> Create) with the following format:
Set the value of this variable to 1 to run cleanup automatically after this DAG finishes, or to 0 to turn off automatic cleanup.
Note that there is no need to create this variable when running the monitoring_cleanup_dag, since this DAG runs the cleanup operation by itself.
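A sketch of how a DAG might consult such a flag before chaining the cleanup. The getter is a stand-in for Airflow's `Variable.get`, and the variable name shown is hypothetical (use whatever name the format above prescribes):

```python
def should_run_cleanup(get_variable, var_name: str) -> bool:
    """Return True when the flag variable is set to '1'.
    get_variable stands in for airflow.models.Variable.get; a missing
    variable defaults to '0', so cleanup stays off unless enabled."""
    value = get_variable(var_name, "0")
    return str(value).strip() == "1"

# Usage with a stand-in variable store (a real DAG would pass Variable.get):
store = {"run_cleanup_after_my_dag": "1"}  # hypothetical variable name
print(should_run_cleanup(lambda k, d: store.get(k, d), "run_cleanup_after_my_dag"))
```

Treating an unset variable as 0 keeps the default behavior (no automatic cleanup) unchanged for DAGs that never define the flag.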
5. Customizing Days To Live
By default, the cleanup DAG will remove all data from the BigQuery monitoring
table that is older than 50 days. You can customize how long data lives in the
BigQuery table before the cleanup operator deletes it by setting
monitoring_data_days_to_live to the number of days you want data to live.
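One way to resolve that setting defensively, falling back to the documented 50-day default when the value is unset or malformed. The getter again stands in for Airflow's `Variable.get`:

```python
def resolve_days_to_live(get_variable) -> int:
    """Resolve monitoring_data_days_to_live as an integer, falling back
    to the documented default of 50 days when the value is missing or
    not a valid integer. get_variable stands in for Variable.get."""
    raw = get_variable("monitoring_data_days_to_live", "50")
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 50

print(resolve_days_to_live(lambda k, d: "30"))  # variable set to "30" -> 30
print(resolve_days_to_live(lambda k, d: d))     # variable unset -> 50
```

Guarding the conversion means a typo in the variable's value degrades to the default retention period rather than failing the cleanup run.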