Monitoring your infrastructure

Once your server-side tagging setup has been upgraded to production-ready status, you will need to take proactive steps to monitor the deployment.

Even though you should periodically check the status of your deployment, critical checkpoints are:

  • When you first deploy the environment and the following days/weeks to make sure you have enough computation available for the incoming traffic.
  • Before known traffic peaks, such as when seasonal campaigns begin or before a major launch.

Three useful monitoring opportunities in the Google Cloud Platform are with the App Engine reports, Cloud Logging, and Billing reports.

App Engine reports

If you are running your server-side tagging environment on Google App Engine, you can access some very useful reports in the App Engine section of the Google Cloud Platform console.

In the Instances subsection, you can select Instances from the Chart Settings drop-down. This will show you a handy report about the overall health of your instances:

Screenshot of an App Engine report in Google Cloud Platform
  1. Pick an interval you want to see statistics for.
  2. The Instances card shows you how many instances were deployed at a given time. If the number of instances falls below your configured minimum, your selected Google Cloud region might have resource availability issues.
  3. The Autoscaling card shows you how many instances get created based on the total CPU usage. You can observe how App Engine creates more instances when your instances use more than 0.6 (60%) of their CPU. If you have three instances, the total CPU usage should be below 1.8. If you see the total CPU usage constantly near or over the target threshold, it means that the minimum instance count is not enough.
  4. The list of Instances shows you which instances are currently being used. Make sure that they are all Healthy.

If any of these graphs or reports give you cause for concern, a good recovery step is to run the shell script again, even if you just keep the same settings. This will force a redeployment of your server-side tagging environment.

Sometimes the Google Cloud Region your application is running in (us-central1 by default) can have issues with resource availability. A good place to verify these is via Cloud Logging (see below) as well as the Google Cloud Service Health dashboard.

Cloud Logging

App Engine logs a lot of useful information about your environment health automatically into a Google Cloud service called Cloud Logging.

To explore your service logs, head on over to Logs Explorer.

Screenshot of a dashboard to evaluate logs.
  1. Pick a date range you want to see statistics for.
  2. Enter a query to filter the logs (you can find useful queries below).
  3. Toggle the Histogram on to quickly see what grades of severity your logs comprise (e.g. Info, Warning, Error).
  4. Click any log in the list to see more details about the entry.

Here are some useful queries to help you monitor the setup:

Filtering for system errors

The severity = "ERROR" query filters log entries that are classified as service errors. This is a useful query whenever you suspect there to be a service disruption or something else wrong with your instances. For example, a log entry with severity ERROR could have the following description: ZONE_RESOURCE_POOL_EXHAUSTED. This error means that the Google Cloud Platform was unable to provide you with the instances you required.

Filtering for faulty HTTP requests

The httpRequest.status >= 400 query filters the logs to show HTTP requests that server-side Tag Manager responded to with an error status (HTTP status 400 or higher).

400 errors typically mean that a request was sent to your server-side tagging endpoint but there was no client to claim the request. These could be requests from bots and crawlers that can be ignored. However, if your data streams result in a 400 error, you need to take a look at the configuration of your server container clients.

5XX errors indicate an issue with the Google Cloud service itself. They could indicate failed health checks or load balancer problems. If your environment is suffering from resource exhaustion issues, a typical symptom is that you'll start seeing more of these 5XX errors, too.

Filtering standard output logs

The query logName: "stdout" returns the standard output log of your application. It's a useful log to follow in case your server-side resources (clients, tags, and variables) use the logToConsole sandbox API to log relevant information.

Filtering incoming HTTP requests

logName: "logs/requests" returns logs are for the actual incoming HTTP requests themselves. You will see them only if you have not disabled request logging when running the shell script or by adding other log filters.

You can expand the request details by clicking on the relevant row in the list, but note that the HTTP request body is not available in logs. You will only see the request URL and additional metadata. If the request is sent with the POST method, for example, where the contents are encoded in the request body, those contents can not be explored in the logs.

Congratulations on finishing this learning pathway!

Please take the survey to help us improve the course.

What to do now?

In case you still need to configure additional tags, check out the server-side tagging docs:

Where to find support?

If you need more support with implementing your measurement tags, you can reach out to our certified partners or ask the community.