Google Transport price accuracy crawlers

This document describes the traffic generated by Google Transport price accuracy crawlers.

Note on the number of queries

Each query corresponds to a full user journey rather than a single HTTP request. For example, if we agree to send 5000 queries per day, then 5000 times per day (evenly distributed across the day, that is, approximately one every 17 seconds) our crawler performs all of the following actions a regular user would perform:

start from Google Search, and click the partner link
select the intended travel itinerary (if not already selected)
click 'continue' until it reaches the page where the user would have to enter personal / payment details
read final price details from the page
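
The "one every 17 seconds" figure above follows from dividing the length of a day by the agreed query count. A minimal sketch of that arithmetic, where the function name `seconds_between_queries` is an illustrative assumption and not part of any Google tooling:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def seconds_between_queries(queries_per_day: int) -> float:
    """Average gap between queries when they are spread evenly over a day."""
    if queries_per_day <= 0:
        raise ValueError("queries_per_day must be positive")
    return SECONDS_PER_DAY / queries_per_day


# The example from the text: 5000 queries per day.
interval = seconds_between_queries(5000)
print(f"{interval:.2f} seconds between queries")  # 17.28 seconds between queries
```

Note that each query involves several page loads (search result, itinerary selection, 'continue' steps), so the number of HTTP requests seen by the partner website is higher than the query count.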

The crawler filters fetched resources

The crawler only fetches the resources required to obtain the information we are interested in: price and availability details. In practice, this means it usually only fetches resources from the partner website (i.e. we only authorize URLs from the same domain). Additionally, we avoid fetching any resources that are not required to read the correct price data, such as images.

As a consequence, the crawler doesn't load or execute scripts from third parties (Google Analytics, Facebook, Criteo...), so crawler traffic should not appear in those analytics.
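
To make the filtering rule concrete, here is a rough sketch of a same-domain, no-images check. It is an illustration only, not the crawler's actual implementation: the function name `should_fetch`, the extension list, and the choice to treat subdomains as part of the partner's domain are all assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of resource types that are not needed to read a price.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".ico"}


def should_fetch(resource_url: str, partner_host: str) -> bool:
    """Return True if the resource is on the partner's domain and is not an image."""
    parsed = urlparse(resource_url)
    host = (parsed.hostname or "").lower()
    partner_host = partner_host.lower()

    # Assumption: subdomains of the partner host count as the same domain.
    same_domain = host == partner_host or host.endswith("." + partner_host)
    if not same_domain:
        return False  # third-party scripts (analytics, ads) are never loaded

    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix not in SKIPPED_EXTENSIONS


print(should_fetch("https://www.example.com/api/price?id=1", "example.com"))          # True
print(should_fetch("https://www.example.com/img/logo.png", "example.com"))            # False
print(should_fetch("https://www.google-analytics.com/analytics.js", "example.com"))   # False
```

Under a rule like this, a price that is rendered only by a script hosted on a different domain would not be visible to the crawler.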

Caching

To reduce load on the partner website, our crawlers are generally configured to respect all standard HTTP caching headers present in the response. This means that, for correctly configured websites, we avoid repeatedly fetching content that changes rarely (e.g. JavaScript libraries).
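
As an illustration of what respecting caching headers can mean in practice, the sketch below checks whether a previously fetched response may be reused based on its `Cache-Control` header. It is a deliberately simplified assumption of how a cache behaves (it only handles `max-age`, `no-store` and `no-cache`, and ignores `Expires`, `ETag` revalidation and other directives); the function names are hypothetical.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age from a Cache-Control value, or None if not reusable as-is."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # Simplification: treat no-store and no-cache as "always fetch again".
    if "no-store" in directives or "no-cache" in directives:
        return None
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return None


def can_reuse(cache_control: str, age_seconds: float) -> bool:
    """True if a cached response of the given age is still fresh."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


# A JavaScript library served with a one-day lifetime, fetched an hour ago.
print(can_reuse("public, max-age=86400", 3600))   # True
# The same resource after more than a day has passed.
print(can_reuse("public, max-age=86400", 90000))  # False
```

For a partner website, this suggests that serving static assets with a long `max-age` reduces repeated crawler fetches, while price pages can be served with a short lifetime or `no-store`.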

Troubleshooting

Our quality checks depend on the crawler network being able to access the partner website. Information on how to ensure this access can be found in this help center article.
