Google Transport price accuracy crawlers

This document describes the traffic generated by Google Transport price accuracy crawlers.

Note on the number of queries

For example, if we have agreed to send 5000 queries per day, our crawler performs the full sequence of actions below 5000 times per day, evenly distributed across the day (approximately once every 17 seconds, since 86,400 s / 5000 ≈ 17.3 s). Each time, it performs all of the following actions a regular user would perform:

  • start from Google Search and click the partner link

  • select the intended travel itinerary (if not already selected)

  • click 'continue' until it reaches the page where the user would have to enter personal or payment details

  • read final price details from the page
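The rate arithmetic in the example above can be sketched as follows. This is a minimal illustration, assuming the daily quota is spread at a perfectly uniform interval starting at midnight; the function names are hypothetical and do not describe how the crawler is actually scheduled.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def query_interval_seconds(queries_per_day: int) -> float:
    """Average spacing between crawls when the daily quota is spread evenly."""
    if queries_per_day <= 0:
        raise ValueError("queries_per_day must be positive")
    return SECONDS_PER_DAY / queries_per_day


def schedule_offsets(queries_per_day: int) -> list[float]:
    """Hypothetical start times (seconds after midnight) for each crawl.

    Each scheduled crawl would walk through the same user-like steps:
    click the partner link, select the itinerary, click 'continue',
    and read the final price.
    """
    interval = query_interval_seconds(queries_per_day)
    return [i * interval for i in range(queries_per_day)]


if __name__ == "__main__":
    print(query_interval_seconds(5000))  # 17.28
```

With 5000 queries per day the interval works out to 17.28 seconds, which is where the "approximately one every 17 seconds" figure comes from.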

The crawler filters fetched resources

The crawler only fetches the resources required to obtain the information we are interested in: price and availability details. In practice, this means it usually fetches resources only from the partner website (i.e. only URLs from the same domain are allowed). Additionally, we avoid fetching any resources that are not required to read the correct price data, such as images.

As a result, the crawler does not load or execute scripts from third parties (Google Analytics, Facebook, Criteo, ...), so crawler traffic should not appear in those analytics.
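The filtering rule described above can be sketched as a single predicate. This is a minimal sketch under two assumptions that are not stated in this document: "same domain" is approximated as an exact hostname match, and images are recognised by file extension. The function `should_fetch` and the `SKIPPED_EXTENSIONS` list are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical list of resource types skipped because they do not affect the
# displayed price (images are the example given in the text).
SKIPPED_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".ico")


def should_fetch(resource_url: str, partner_url: str) -> bool:
    """Return True if a sub-resource should be fetched while crawling partner_url."""
    resource = urlsplit(resource_url)
    partner = urlsplit(partner_url)
    # Only URLs on the partner's own host are allowed (assumed: exact hostname
    # match). This also excludes third-party scripts such as analytics tags.
    if resource.hostname != partner.hostname:
        return False
    # Skip resources that are not needed to read the price, such as images.
    if resource.path.lower().endswith(SKIPPED_EXTENSIONS):
        return False
    return True


if __name__ == "__main__":
    page = "https://www.partner.example/flights/checkout"
    print(should_fetch("https://www.partner.example/app.js", page))  # True
    print(should_fetch("https://www.google-analytics.com/analytics.js", page))  # False
```

An exact hostname match is the strictest reading of "same domain"; a real allowlist might also admit the partner's own subdomains (for example a static-asset host), which this sketch would reject.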

Caching

To reduce load on the partner website, our crawlers are generally configured to respect all standard HTTP caching headers present in responses. For correctly configured websites, this means we avoid repeatedly fetching content that changes rarely (e.g. JavaScript libraries).
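To illustrate what respecting caching headers means for a partner site, the sketch below decides whether a stored response can be reused based on its `Cache-Control` header. It is a deliberately simplified assumption: it only honours `no-store`, `no-cache` and `max-age`, and ignores `Expires`, `Age`, `s-maxage` and revalidation with `ETag`/`Last-Modified`, all of which the full HTTP caching rules (RFC 9111) cover. The names `parse_cache_control` and `CachedResponse` are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


def parse_cache_control(header: str) -> dict[str, str | None]:
    """Split a Cache-Control header into {directive: value-or-None}."""
    directives: dict[str, str | None] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


@dataclass
class CachedResponse:
    url: str
    body: bytes
    cache_control: str
    stored_at: float  # time.time() when the response was stored

    def is_fresh(self, now: float | None = None) -> bool:
        """True if the stored copy can be reused without a new request."""
        directives = parse_cache_control(self.cache_control)
        # no-store: never reuse; no-cache: must revalidate before reuse.
        if "no-store" in directives or "no-cache" in directives:
            return False
        max_age = directives.get("max-age")
        if max_age is None or not max_age.isdigit():
            return False  # this sketch only trusts an explicit max-age
        current = time.time() if now is None else now
        return current - self.stored_at < int(max_age)
```

For example, a JavaScript library served with `Cache-Control: public, max-age=86400` would be fetched at most about once per day under this rule, whereas a response marked `no-store` would be requested every time.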

Troubleshooting

Our quality checks can only operate correctly if the crawler network has access to the partner website. Information on how to ensure this access can be found in this help center article.