GA DAGs

Overview

This page explains how to configure the tcrm_bq_to_ga or tcrm_gcs_to_ga DAG and how to prepare the data it sends.

Google Analytics (GA) tracks website activity, such as session duration, pages per session, and bounce rate for the individuals using the site, along with information on the source of the traffic.

For details, refer to the Measurement Protocol Overview.

Configure Airflow Variables

Create the Required tcrm_bq_to_ga DAG Variables

The following table indicates which variables are needed to run the tcrm_bq_to_ga DAG. You only need to set up these variables if you plan to use BigQuery as your data source.

| Variable Name | Default Value | Variable Information |
| --- | --- | --- |
| bq_dataset_id | my_dataset | The name of the BigQuery dataset containing the data. |
| bq_table_id | my_table | The name of the BigQuery table containing the data. |
| ga_tracking_id | UA-123456789-1 | Google Analytics Tracking ID. |

Create the Required tcrm_gcs_to_ga DAG Variables

The following table indicates which variables are needed to run the tcrm_gcs_to_ga DAG. You only need to set up these variables if you plan to use Google Cloud Storage as your data source.

| Variable Name | Example Value | Variable Information |
| --- | --- | --- |
| gcs_bucket_name | my_bucket | Cloud Storage bucket name. |
| gcs_bucket_prefix | folder/sub_folder | The path to the data folder inside the bucket. |
| gcs_content_type (optional) | Either JSON or CSV. | Cloud Storage content type. |
| ga_tracking_id | UA-123456789-1 | Google Analytics Tracking ID. |
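
If you prefer to load the variables from a file rather than typing them one by one, the following is a minimal sketch that writes them as a flat JSON object. The variable names come from the table above; the values are placeholders, and the function name `write_variables_file` and the output file name are hypothetical. It assumes your Airflow version can import such a file (for example, through the Admin > Variables page or, in Airflow 2, `airflow variables import <file>`).

```python
import json
import tempfile
from pathlib import Path

# Variable names are taken from the tcrm_gcs_to_ga table; values are placeholders.
GCS_TO_GA_VARIABLES = {
    "gcs_bucket_name": "my_bucket",
    "gcs_bucket_prefix": "folder/sub_folder",
    "gcs_content_type": "JSON",  # optional; either JSON or CSV
    "ga_tracking_id": "UA-123456789-1",
}


def write_variables_file(variables, path):
    """Write the variables as a flat JSON object and return the file path."""
    path = Path(path)
    text = json.dumps(variables, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical output location; adjust to wherever you keep config files.
    target = Path(tempfile.gettempdir()) / "tcrm_gcs_to_ga_variables.json"
    print(write_variables_file(GCS_TO_GA_VARIABLES, target))
```

The same approach should work for the tcrm_bq_to_ga variables: swap in bq_dataset_id, bq_table_id, and ga_tracking_id.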

Prepare Data to Send to Google Analytics

NOTE: Refer to the Measurement Protocol API for the detailed requirements.

To send your data to GA, choose one of the following three options:

  1. From BigQuery using the tcrm_bq_to_ga DAG, in SQL table format.

  2. From Google Cloud Storage using the tcrm_gcs_to_ga DAG, in JSON format (one JSON object per line).

    {"cid": "12345.67890", "t":"event", "ec": "video", "ea": "play", "el": "holiday", "ev": "300" }
    {"cid": "12345.67891", "t":"event", "ec": "video", "ea": "play", "el": "holiday", "ev": "301" }
    {"cid": "12345.67892", "t":"event", "ec": "video", "ea": "play", "el": "holiday", "ev": "302" }
    {"cid": "12345.67893", "t":"event", "ec": "video", "ea": "play", "el": "holiday", "ev": "303" }
    
  3. From Google Cloud Storage using the tcrm_gcs_to_ga DAG, in CSV format (with a header row of parameter names).

    cid,t,ec,ea,el,ev
    12345.67890,event,video,play,holiday,300
    12345.67891,event,video,play,holiday,301
    12345.67892,event,video,play,holiday,302
    12345.67893,event,video,play,holiday,303
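
The JSON and CSV samples above carry the same Measurement Protocol parameters. As a rough illustration of how one row maps to a hit, the sketch below parses both formats and builds a URL-encoded Measurement Protocol (v1) body. The helper names (`rows_from_json_lines`, `rows_from_csv`, `build_payload`) are hypothetical and are not TCRM's actual implementation; the sketch only assumes the documented protocol parameters `v`, `tid`, `cid`/`uid`, and `t`, and it does not send anything to GA.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Two rows from the samples above, in each supported file format.
JSON_LINES = (
    '{"cid": "12345.67890", "t":"event", "ec": "video", "ea": "play", "el": "holiday", "ev": "300" }\n'
    '{"cid": "12345.67891", "t":"event", "ec": "video", "ea": "play", "el": "holiday", "ev": "301" }\n'
)
CSV_TEXT = (
    "cid,t,ec,ea,el,ev\n"
    "12345.67890,event,video,play,holiday,300\n"
    "12345.67891,event,video,play,holiday,301\n"
)


def rows_from_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rows_from_csv(text):
    """Parse CSV with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def build_payload(row, tracking_id):
    """Return a URL-encoded Measurement Protocol v1 body for one row."""
    if not row.get("t"):
        raise ValueError("row is missing the hit type parameter 't'")
    if not (row.get("cid") or row.get("uid")):
        raise ValueError("row needs a client ID 'cid' or a user ID 'uid'")
    params = {"v": "1", "tid": tracking_id}
    params.update({key: str(value) for key, value in row.items()})
    return urlencode(params)


if __name__ == "__main__":
    for row in rows_from_csv(CSV_TEXT):
        print(build_payload(row, "UA-123456789-1"))
```

For the first sample row this prints `v=1&tid=UA-123456789-1&cid=12345.67890&t=event&ec=video&ea=play&el=holiday&ev=300`. Both parsers return the same rows for the samples, so the choice between JSON and CSV is a matter of convenience.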
    

WARNING: For GA to accept the data sent from TCRM, you need to turn off GA's bot filtering. In your Google Analytics UI, go to Admin -> View Settings -> Bot Filtering and uncheck “Exclude all hits from known bots and spiders”.

Run Your DAG

In the Airflow console, click the DAGs option in the top menu bar. Find the DAG you’d like to run in the list on the left, then run it by clicking the Play button on the right side of its row.

Reading DAG's Logs

Refer to Reading DAG's Logs in the FAQ.