Batch ingestion

Your data feeds let you make your restaurant, services, and menu available in Order with Google.

This document covers how to host your sandbox and production inventories and use batch ingestion to update your inventory in Order with Google.

Data feed environments

There are three data feed environments available for your integration development:

Feed environment | Description | Batch ingestion
Preview Feed | This environment is used for early stage feed development and debugging. It lets you upload a single feed file and test it in the Preview Tool. | Not applicable
Sandbox Feed | The test environment for your feed development. | Required
Production Feed | The production environment for your inventory that you want to launch. | Required

Hosting data feeds

For Order with Google to process your sandbox and production data feeds through batch ingestion, you must host your data feed files in Google Cloud Storage, in Amazon S3, or on an HTTPS server with a sitemap.

We recommend that you host the data feeds for your sandbox and production environments separately. This approach lets you do development and testing in your sandbox feed environment before you deploy the changes to production.

For example, if you use Google Cloud Storage as a hosting option, you would have the following paths:

  • Sandbox Feed: gs://foorestaurant-google-feed-sandbox/
  • Production Feed: gs://foorestaurant-google-feed-prod/

To host your inventory, do the following:

  1. Generate your data feed files.
  2. Choose a hosting solution.
  3. Host your data feeds.
  4. Ensure that your data feed files are updated regularly. Production data feeds must be updated daily.

For details on how to create an inventory feed, see the documentation for the Restaurant, Service, and Menu entities, as well as the Create a data feed section.

Guidelines on data feed files

Each file can contain multiple entities and must not exceed 200 MB. Each top-level entity (Restaurant, Service, or Menu), together with its child entities, must not exceed 4 MB in total.
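
As a rough pre-upload check, the per-file limit can be verified locally before the files are hosted. The following is a minimal sketch, not an official tool: the function name find_oversized_files is hypothetical, and it assumes that "200 MB" means 200 × 1,000,000 bytes, which this document does not specify. It does not check the 4 MB limit on entity groups.

```python
import os

# Assumption: "200 MB" is read as 200 * 1,000,000 bytes. The guideline does not
# say whether MB is decimal or binary, so this uses the stricter reading.
MAX_FEED_FILE_BYTES = 200 * 1_000_000


def find_oversized_files(paths, limit=MAX_FEED_FILE_BYTES):
    """Return the paths whose size on disk is larger than the limit."""
    return [path for path in paths if os.path.getsize(path) > limit]


if __name__ == "__main__":
    import sys

    # Usage: python check_sizes.py restaurant_1.ndjson restaurant_2.ndjson
    for path in find_oversized_files(sys.argv[1:]):
        print(f"Too large for batch ingestion: {path}")
```

Running a check like this as the last step of feed generation makes it easier to catch an oversized file before the marker or sitemap is updated.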

Choose a hosting solution

The following comparison lists the options for hosting your data feeds and how each host works with Order with Google. The hosting options are Amazon S3, Google Cloud Storage, and HTTPS with a sitemap.

Credentials and access

Amazon S3: Provide Google with the following information:

  • Access key ID
  • Secret access key
  • The paths to your production and sandbox S3 directories and marker.txt file. The paths must begin with s3://.

The S3 bucket needs to include the following:

  • Feed files for your inventory.
  • marker.txt, which contains a timestamp used for fetching.

Example marker.txt file: 2018-12-03T08:30:42.694Z

Google Cloud Storage: Provide Google with the paths to your production and sandbox bucket directories and marker.txt file. The paths must begin with gs://.

Add the service account provided by your Google consultant as a reader of your Google Cloud Storage bucket.

For more information on how to control access for Google Cloud Storage (GCS), see Google Cloud Platform Console: Setting bucket permissions.

The GCS bucket needs to include the following:

  • Feed files for your inventory.
  • marker.txt, which contains a timestamp used for fetching.

Example marker.txt file: 2018-12-03T08:30:42.694Z

HTTPS with a sitemap: Provide Google with the following information:

  • Your Basic Authentication credentials.
  • The paths to your production and sandbox sitemaps. The paths must begin with https://.

The following requirements apply:

  • Protocol: You must make your feed files available through HTTPS, not HTTP.
  • Security: Google strongly recommends that you protect your hosted feed files with Basic Authentication.

How Google knows which files need to be fetched

  • Amazon S3: Directory listing of all files in the bucket.
  • Google Cloud Storage: Directory listing of all files in the bucket.
  • HTTPS with a sitemap: Individual URLs of files listed in the sitemap.

How Google knows that files are ready to fetch

  • Amazon S3: After you finish generating your data feeds, update the marker.txt file with the latest timestamp.
  • Google Cloud Storage: After you finish generating your data feeds, update the marker.txt file with the latest timestamp.
  • HTTPS with a sitemap: After you finish generating your data feeds, update the last-modified response header of your sitemap.xml with the latest timestamp.

File limits

  • Amazon S3: Maximum number of files: 100,000. You must have fewer than 100,000 files total in your Amazon S3 bucket.
  • Google Cloud Storage: Maximum number of files: 100,000. You must have fewer than 100,000 files total in your Google Cloud Storage bucket.
  • HTTPS with a sitemap: Maximum number of files: 100,000. The number of file paths within your sitemap XML file must be fewer than 100,000.
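
For Amazon S3 and Google Cloud Storage hosting, the marker.txt file holds a single timestamp like the example 2018-12-03T08:30:42.694Z. The following is a minimal sketch of how such a timestamp could be produced. The function names marker_timestamp and write_marker are hypothetical, and the sketch assumes that UTC with millisecond precision and a "Z" suffix is the expected format, based only on the example value. Uploading the file to the bucket is out of scope here.

```python
from datetime import datetime, timezone


def marker_timestamp(now=None):
    """Format a time like the example marker value 2018-12-03T08:30:42.694Z."""
    now = now or datetime.now(timezone.utc)
    # Convert to UTC, keep millisecond precision, and use the "Z" suffix.
    stamp = now.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def write_marker(path, now=None):
    """Write the timestamp to a local marker.txt file (upload it separately)."""
    with open(path, "w", encoding="utf-8") as marker_file:
        marker_file.write(marker_timestamp(now))


if __name__ == "__main__":
    # Write the marker only after all feed files have been generated.
    write_marker("marker.txt")
```

Writing the marker as the final step matters because the updated timestamp is what signals that the files are ready to fetch.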

Connect your data feeds for batch ingestion

After your feeds are hosted, you need to connect them to your Actions project.

If you host your data feeds with Amazon S3

  1. In the Actions Console, go to Develop > Data feeds.
  2. Click New Data Feed and fill out the following fields:

    • Data feed name: The name of your data feed. Example: [Project Name] Sandbox Feed.
    • Environment: To host your sandbox inventory, set to Sandbox. To host your production inventory, set to Production.
    • Data feed source: Set to Amazon S3.
    • Data feed configuration: Under Data feed endpoint, enter the URL where your data feeds are hosted.
    • Modification marker configuration: Enter the Marker endpoint where your marker.txt file is hosted.
    • Server configuration: Enter the Access key ID and Secret access key that you configured.
  3. Click Create Feed.
  4. After one to two hours, check whether batch ingestion has fetched your feed files.

If you host your data feeds with Google Cloud Storage

  1. In the Actions Console, go to Develop > Data feeds.
  2. Click New Data Feed and fill out the following fields:

    • Data feed name: The name of your data feed. Example: [Project Name] Sandbox Feed.
    • Environment: To host your sandbox inventory, set to Sandbox. To host your production inventory, set to Production.
    • Data feed source: Set to Google Cloud Storage.
    • Data feed configuration: Under Data feed endpoint, enter the URL where your data feeds are hosted.
    • Modification marker configuration: Enter the Marker endpoint where your marker.txt file is hosted.
  3. Click Create Feed.
  4. A service account is created to access your storage bucket. Add the service account to your Google Cloud Storage bucket and grant it the "Reader" permission.
  5. After one to two hours, check whether batch ingestion has fetched your feed files.

If you host your data feeds with HTTPS

  1. In the Actions Console, go to Develop > Data feeds.
  2. Click New Data Feed and fill out the following fields:

    • Data feed name: The name of your data feed. Example: [Project Name] Sandbox Feed.
    • Environment: To host your sandbox inventory, set to Sandbox. To host your production inventory, set to Production.
    • Data feed source: Set to HTTPS.
    • Data feed configuration: For Sitemap endpoint, enter the URL where your sitemap is hosted.
    • Server configuration: Enter the Username and Password that you configured for basic authentication.
  3. Click Create Feed.
  4. After one to two hours, check whether batch ingestion has fetched your feed files.

Example paths

The following table contains example paths for each of the hosting options:

Hosting option | Path | Marker file
Amazon S3 | s3://foorestaurant-google-feed-sandbox/ | s3://foorestaurant-google-feed-sandbox/marker.txt
Google Cloud Storage | gs://foorestaurant-google-feed-sandbox/ | gs://foorestaurant-google-feed-sandbox/marker.txt
HTTPS with a sitemap | https://sandbox-foorestaurant.com/sitemap.xml | N/A
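
Because each hosting option requires a specific path prefix, a small check before you enter paths in the console can catch mistakes such as an http:// sitemap URL. The following is a minimal sketch under stated assumptions: the names REQUIRED_PREFIX, has_valid_prefix, and default_marker_path are hypothetical, and the marker location simply mirrors the example paths above, where marker.txt sits directly in the feed directory.

```python
# Required path prefix for each hosting option, as stated in this document.
REQUIRED_PREFIX = {
    "Amazon S3": "s3://",
    "Google Cloud Storage": "gs://",
    "HTTPS": "https://",
}


def has_valid_prefix(source, path):
    """Return True if the path starts with the prefix required for the source."""
    return path.startswith(REQUIRED_PREFIX[source])


def default_marker_path(feed_path):
    """Build a marker path in the feed directory, mirroring the examples above.

    Assumption: marker.txt is kept directly under the feed directory.
    """
    return feed_path.rstrip("/") + "/marker.txt"


if __name__ == "__main__":
    feed = "gs://foorestaurant-google-feed-sandbox/"
    print(has_valid_prefix("Google Cloud Storage", feed))
    print(default_marker_path(feed))
```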

Sitemaps for HTTPS hosting

Use the following guidelines when you define sitemaps:

  • Links in your sitemap must point to the files themselves.
  • If your sitemap includes references to a cloud provider instead of your own domain name, ensure that the start of each URL, like https://www.yourcloudprovider.com/your_id, is stable and unique to your batch job.
  • Ensure that the paths to the files referenced in the sitemap don't change. For example, don't have your sitemap reference https://www.yourcloudprovider.com/your_id/10000.json today but then reference https://www.yourcloudprovider.com/your_id/20000.json tomorrow.
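
One way to notice accidental path changes is to compare the file URLs in yesterday's sitemap with those in today's before publishing. The following is a minimal sketch, not part of Order with Google: the function names sitemap_locs and diff_sitemaps are hypothetical, and how you store the previous sitemap is left to you. URLs that disappear while new ones appear may indicate renamed files that need review.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the set of <loc> URLs found in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}


def diff_sitemaps(previous_xml, current_xml):
    """Return (removed, added) URL lists between two sitemap versions."""
    previous = sitemap_locs(previous_xml)
    current = sitemap_locs(current_xml)
    return sorted(previous - current), sorted(current - previous)
```

Added URLs are not necessarily a problem, for example when a new restaurant is onboarded. The combination of a removed URL and an added URL for the same inventory is the pattern that the guideline above warns about.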

Example sitemap

Here are two example sitemap.xml files that list data feed files:

Example 1: Entities grouped by merchants (Recommended).

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
   <loc>https://your_fulfillment_url.com/restaurant_1.ndjson</loc>
   <lastmod>2018-06-11T10:46:43+05:30</lastmod>
 </url>
 <url>
   <loc>https://your_fulfillment_url.com/restaurant_2.ndjson</loc>
   <lastmod>2018-06-11T10:46:43+05:30</lastmod>
 </url>
 <url>
   <loc>https://your_fulfillment_url.com/restaurant_3.ndjson</loc>
   <lastmod>2018-06-11T10:46:43+05:30</lastmod>
 </url>
</urlset>

Example 2: Entities grouped by types.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
   <loc>https://your_fulfillment_url.com/restaurant.json</loc>
   <lastmod>2018-06-11T10:46:43+05:30</lastmod>
 </url>
 <url>
   <loc>https://your_fulfillment_url.com/menu.json</loc>
   <lastmod>2018-06-11T10:46:43+05:30</lastmod>
 </url>
 <url>
   <loc>https://your_fulfillment_url.com/service.json</loc>
   <lastmod>2018-06-11T10:46:43+05:30</lastmod>
 </url>
</urlset>
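
A sitemap in the shape of the examples above can be generated with a standard XML library instead of being written by hand. The following is a minimal sketch using Python's xml.etree.ElementTree. The function name build_sitemap is hypothetical, and the caller is assumed to supply each file URL together with its lastmod value. The sketch also enforces the limit that a sitemap must list fewer than 100,000 file paths.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAP_ENTRIES = 100_000  # The sitemap must list fewer files than this.


def build_sitemap(entries):
    """Build sitemap XML from (loc, lastmod) pairs, one pair per feed file."""
    if len(entries) >= MAX_SITEMAP_ENTRIES:
        raise ValueError("A sitemap must list fewer than 100,000 files.")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://your_fulfillment_url.com/restaurant_1.ndjson",
         "2018-06-11T10:46:43+05:30"),
        ("https://your_fulfillment_url.com/restaurant_2.ndjson",
         "2018-06-11T10:46:43+05:30"),
    ]))
```

Generating the file programmatically also escapes special characters in URLs, such as ampersands, which is easy to miss in a hand-written sitemap.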

Update your data feeds

After your data feeds are connected, Google checks for updates once each hour but ingests the data feeds only when the marker.txt file or the sitemap.xml file has been modified. Update your data feeds once a day to prevent stale inventory.

To specify that the data feeds have been modified and are ready for batch ingestion, update the timestamp in the marker.txt file or the last-modified response header of the sitemap.xml file. Google uses these values to determine how fresh a data feed is.
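
For HTTPS hosting, the last-modified response header uses the standard HTTP date format. The following is a minimal sketch of formatting such a value with Python's standard library. The function name last_modified_header is hypothetical, and how the header is attached to the sitemap.xml response depends on your web server, which this document does not cover.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(generated_at):
    """Format a timezone-aware datetime as an HTTP date, for example
    'Mon, 03 Dec 2018 08:30:42 GMT'."""
    # HTTP dates are expressed in GMT, so convert to UTC first.
    return format_datetime(generated_at.astimezone(timezone.utc), usegmt=True)


if __name__ == "__main__":
    # Use the time at which feed generation finished.
    print("Last-Modified:", last_modified_header(datetime.now(timezone.utc)))
```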