Custom Search

Providing Structured Data

This page shows you how to add the structured data that search operators depend on.

Web pages are often filled with free form text, which is easy for humans to read but more difficult for computers to understand. Some web pages have information with greater structure that is easy to read, such as a page date embedded in the URL or title of the page, or machine-readable fields embedded in the HTML code. Google extracts a variety of structured data from web pages. This page describes the structured data types Google extracts that are available for use in Custom Snippets and Structured Search.

Overview

When you are reading a webpage that sells a DVD, you can quickly figure out what the title is, what reviewers thought of the film, and how they rated it. But a computer cannot do the same things, because it doesn't understand how the information is structured.

For example, if the page has content about the DVD—along with recommendations for other items, ads from other stores, and comments from customers—then the page might have different prices for various things, not just for the DVD that is being sold. You can easily figure out the price for the DVD while dismissing the other prices, but the computer can't. Some sophisticated programs might find the prices in the webpage, but they cannot determine the rules for finding just the price of the DVD.

Structured data formats are rules that standardize the structure and content of a webpage. They consist of markup that you apply to snippets of text so that computers can process their meaning, or semantics. The markup does not change how your website is displayed; it only makes the metadata and the text enclosed within the HTML tags more meaningful to computers.

Custom Search recognizes the following formats:

  • PageMaps: invisible blocks of XML that add metadata to pages.
  • Microformats: tags used to mark up visible page content along predefined types.
  • RDFa: an alternate standard for marking up visible page content along arbitrary types.
  • Microdata: a new HTML5 standard for marking up visible page content.
  • <meta> tags: standard HTML tags, a subset of which are parsed by Google.
  • Page Date: features on a page indicating its date, which Google attempts to parse.

You can use whichever format, or combination of formats, you prefer. Note that unlike Custom Search, Google Search does not use PageMaps or <meta> tags when generating rich snippets. Google Search does consider microformats, microdata, RDFa, and the page date when it generates snippets, but it has its own algorithm and policies for determining what information is shown to users. So while structured data you add to your pages can be presented in Custom Search, it might not be displayed in Google Search results.

The following includes an idealized snippet of plain HTML from a review site:

<div>
    <div>
        <h1>Pizza My Heart</h1>
    </div>
    <span>88%</span> like it
    <a href="#reviews">See all 12 reviews</a>
    <span>Under $10 per entree</span>
</div>

The following snippet shows the previous HTML code extended with a format called microformats:

<div class="hreview-aggregate">
    <div class="vcard item">
        <h1 class="fn">Pizza My Heart</h1>
    </div>
    <span class="rating average">88%</span> like it
    <a href="#reviews">See all <span class="count">12</span> reviews</a>
    <span class="pricerange">Under $10 per entree</span>
</div>

The Rich Snippets Testing Tool shows the information Google Search extracts from this page:

hreview-aggregate
  item hcard
    fn = Pizza My Heart
rating
    average (normalized to 5.0 scale) = 4.5
    average = 88%
pricerange = Under $10 per entree
count = 12

Custom Search uses a subset of the information available for Google Search; this subset is shown at the bottom of the testing tool page:

review (source = MICROFORMAT)
ratingstars = 4.5
ratingcount = 12
pricerange = Under $10 per entree
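To make "rules for finding" a value concrete, here is a minimal Python sketch that pulls the microformat fields out of the review snippet above by their class names, using only the standard library's html.parser. It is an illustration, not Google's extractor: the field list comes from this one example, and the half-star rounding that turns 88% into 4.5 is an assumption chosen only because it reproduces the tool output shown above.

```python
from html.parser import HTMLParser

# Microformat class names used in the hreview-aggregate example above.
FIELDS = {"fn", "average", "count", "pricerange"}

class MicroformatExtractor(HTMLParser):
    """Collects the text of elements whose class list names a known field.

    Assumes well-formed markup (every start tag has a matching end tag).
    """

    def __init__(self):
        super().__init__()
        self.values = {}
        self._stack = []  # the field (or None) named by each open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in FIELDS), None))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit text to the innermost open element that names a field.
        for field in reversed(self._stack):
            if field:
                self.values[field] = self.values.get(field, "") + data
                break

def to_stars(percent):
    """Map '88%' onto a 5-star scale, rounded to the nearest half star (assumed rule)."""
    return round(float(percent.rstrip("%")) / 100 * 5 * 2) / 2

SNIPPET = """
<div class="hreview-aggregate">
    <div class="vcard item">
        <h1 class="fn">Pizza My Heart</h1>
    </div>
    <span class="rating average">88%</span> like it
    <a href="#reviews">See all <span class="count">12</span> reviews</a>
    <span class="pricerange">Under $10 per entree</span>
</div>
"""

parser = MicroformatExtractor()
parser.feed(SNIPPET)
review = {
    "ratingstars": to_stars(parser.values["average"].strip()),
    "ratingcount": int(parser.values["count"]),
    "pricerange": parser.values["pricerange"].strip(),
}
print(review)
# {'ratingstars': 4.5, 'ratingcount': 12, 'pricerange': 'Under $10 per entree'}
```

Text outside a marked-up element, such as "like it" or "See all", is ignored because no open element names a field for it.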

By incorporating standard structured data formats into your webpages, you make the data available not only to Custom Search but also to any service or tool that supports the same standard. Apply structured data to the most important information on the webpage so that it can be presented directly in the results. For example, if you have a website selling Android devices, include structured data about ratings, prices, availability, and so on. When your users search for Android devices, they can see the ratings, prices, and availability at a glance.

So computers can now understand the types of data in the webpage. Now what? Computers can also take over the menial task of finding and combining information from different webpages. This frees users from tedious tasks, such as sifting through multiple pages to find the items they want. Search engines such as Custom Search can process the structured data in your webpages and display it in more useful and meaningful ways, such as custom snippets and structured search.

Providing Data to Custom Search

Google supports several kinds of data that are used primarily by Custom Search: PageMaps, a subset of <meta> tags, and approximate page dates.

Using PageMaps

PageMaps is a structured data format that provides Google with information about the data on a page. It enables website creators to embed data and notes in webpages. Although the structured data is not visible to your users or to Google Web Search, Custom Search recognizes it when indexing your webpages and returns it directly in XML results or in JSON format in the Custom Search element.

You can explicitly add PageMaps to a page, or submit PageMaps using a Sitemap. Google will also use other information on a page, such as rich snippets markup or meta tag data, to create a PageMap.

Unlike the other structured data formats described below, PageMaps does not require you to follow standard properties or terms, or even refer to an existing vocabulary, schema, or template. You can just create custom attribute values that make sense for your website. Unlike the structured data attributes of microformats, microdata and RDFa, which are added around user-visible content in the body of the HTML, PageMaps metadata is included in the head section of the HTML page. This method supports arbitrary data which may be needed by your application but which you might not want to display to users. (If you don't want PageMap information returned in your XML, you can keep it private using an AccessKey.)

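Once you create a PageMap, you can submit it to Google using any of the methods described in the following sections: adding it directly to your HTML page, adding it to a Sitemap, or submitting it with the Custom Search Control API.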

PageMap tag definitions

The following table outlines the requirements for adding PageMap data to a Sitemap.

Tag | Required? | Description
PageMap | Yes | Encloses all PageMap information for the relevant URL.
DataObject | Yes | Encloses all information about a single element (for example, an action).
Attribute | Yes | Each DataObject contains one or more attributes.

Note: PageMaps are XML blocks and therefore must be formatted correctly; in particular, the PageMap, DataObject and Attribute tags in the XML are case sensitive, as are the type, name, and value attributes.
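As a sketch of the nesting the table describes, the Python below assembles a PageMap with the standard library's xml.etree.ElementTree, writing the case-sensitive tag names (PageMap, DataObject, Attribute) and attribute names (type, name, value) exactly as listed. The helper build_pagemap and its input shape are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def build_pagemap(data_objects):
    """Build a <PageMap> element (hypothetical helper).

    data_objects is a list of (type, {attribute name: value}) pairs; each pair
    becomes one <DataObject> holding one <Attribute> per dictionary entry.
    """
    pagemap = ET.Element("PageMap")  # tag and attribute names are case sensitive
    for obj_type, attributes in data_objects:
        obj = ET.SubElement(pagemap, "DataObject", {"type": obj_type})
        for name, value in attributes.items():
            ET.SubElement(obj, "Attribute", {"name": name, "value": str(value)})
    return pagemap

pagemap = build_pagemap([
    ("document", {"title": "The Biomechanics of a Badminton Smash", "rating": 4.5}),
    ("thumbnail", {"src": "http://www.example.com/papers/sic.png",
                   "width": 627, "height": 167}),
])
print(ET.tostring(pagemap, encoding="unicode"))
```

Because ElementTree escapes attribute values for you, generating the block this way also avoids hand-written XML that is not well-formed.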

Add PageMap data directly to your HTML page

Here's an example of PageMap data for a webpage about badminton:

<html>
  <head>
   ...
  <!--
  <PageMap>
     <DataObject type="document">
        <Attribute name="title">The Biomechanics of a Badminton
        Smash</Attribute>
        <Attribute name="author">Avelino T. Lim</Attribute>
        <Attribute name="description">The smash is the most
        explosive and aggressive stroke in Badminton. Elite athletes can
        generate shuttlecock velocities of up to 370 km/h. To perform the
        stroke, one must understand the biomechanics involved, from the body
        positioning to the wrist flexion. </Attribute>
        <Attribute name="page_count">25</Attribute>
        <Attribute name="rating">4.5</Attribute>
        <Attribute name="last_update">05/05/2009</Attribute>
     </DataObject>
     <DataObject type="thumbnail">
        <Attribute name="src" value="http://www.example.com/papers/sic.png" />
        <Attribute name="width" value="627" />
        <Attribute name="height" value="167" />
     </DataObject>
  </PageMap>
  -->
  </head>
   ...
</html>
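Because the PageMap sits inside an HTML comment in the head, browsers ignore it, but a program can still recover it. The following hypothetical Python sketch (not a description of how Google's indexer works) finds comments with html.parser, parses any PageMap it finds with ElementTree, and accepts both attribute forms used in the example above: the value in the element text, or in a value attribute.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class PageMapFinder(HTMLParser):
    """Collects PageMap elements embedded in HTML comments."""

    def __init__(self):
        super().__init__()
        self.pagemaps = []

    def handle_comment(self, data):
        text = data.strip()
        if text.startswith("<PageMap"):
            self.pagemaps.append(ET.fromstring(text))

def attributes(pagemap):
    """Flatten a PageMap into {(type, name): value}."""
    result = {}
    for obj in pagemap.findall("DataObject"):
        for attr in obj.findall("Attribute"):
            value = attr.get("value")
            if value is None:
                # Text form: collapse the line wrapping inside the element.
                value = " ".join((attr.text or "").split())
            result[(obj.get("type"), attr.get("name"))] = value
    return result

PAGE = """<html><head>
<!--
<PageMap>
  <DataObject type="document">
    <Attribute name="title">The Biomechanics of a Badminton
    Smash</Attribute>
    <Attribute name="rating">4.5</Attribute>
  </DataObject>
  <DataObject type="thumbnail">
    <Attribute name="src" value="http://www.example.com/papers/sic.png" />
  </DataObject>
</PageMap>
-->
</head><body></body></html>"""

finder = PageMapFinder()
finder.feed(PAGE)
print(attributes(finder.pagemaps[0]))
```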

Add PageMap data to a Sitemap

If you don't want to include PageMap data in the HTML of your pages, you can add PageMap data to a Sitemap and submit that Sitemap for on-demand indexing using the Custom Search Control Panel.

Here's an example of a Sitemap that includes PageMap information for two URLs: http://www.example.com/foo and http://www.example.com/bar.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
       xmlns:content="http://www.google.com/schemas/sitemap-content/1.0">
 <url>
   <loc>http://www.example.com/foo</loc>
   <PageMap xmlns="http://www.google.com/schemas/sitemap-pagemap/1.0">
     <DataObject type="document" id="hibachi">
       <Attribute name="name">Dragon</Attribute>
       <Attribute name="review">3.5</Attribute>
     </DataObject>
   </PageMap>
 </url>
 <url>
   <loc>http://www.example.com/bar</loc>
   <PageMap xmlns="http://www.google.com/schemas/sitemap-pagemap/1.0">
     <DataObject type="document" id="biggreenegg">
       <Attribute name="name">Ribs</Attribute>
       <Attribute name="review">4.0</Attribute>
     </DataObject>
   </PageMap>
 </url>
</urlset>
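If you generate Sitemaps programmatically, one way to produce the structure above is ElementTree with explicit namespaces. This is a minimal sketch; the helper pagemap_sitemap and its input shape are hypothetical, and the unused content namespace from the example is left out. ElementTree declares the PageMap namespace once on the root with a pagemap: prefix instead of repeating a default xmlns on each PageMap. Namespace-aware XML parsers treat both forms as the same element names, but that is worth confirming against how your Sitemap is actually processed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
PAGEMAP_NS = "http://www.google.com/schemas/sitemap-pagemap/1.0"
ET.register_namespace("", SITEMAP_NS)          # sitemap elements: no prefix
ET.register_namespace("pagemap", PAGEMAP_NS)   # PageMap elements: pagemap: prefix

def pagemap_sitemap(pages):
    """pages maps each URL to a list of (type, id, {name: value}) DataObjects."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, objects in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        pagemap = ET.SubElement(url, f"{{{PAGEMAP_NS}}}PageMap")
        for obj_type, obj_id, attrs in objects:
            obj = ET.SubElement(pagemap, f"{{{PAGEMAP_NS}}}DataObject",
                                {"type": obj_type, "id": obj_id})
            for name, value in attrs.items():
                ET.SubElement(obj, f"{{{PAGEMAP_NS}}}Attribute", {"name": name}).text = value
    return urlset

sitemap = pagemap_sitemap({
    "http://www.example.com/foo": [("document", "hibachi", {"name": "Dragon", "review": "3.5"})],
    "http://www.example.com/bar": [("document", "biggreenegg", {"name": "Ribs", "review": "4.0"})],
})
xml_bytes = ET.tostring(sitemap, encoding="UTF-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```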

Submit PageMap data using the Custom Search Control API

To submit PageMap data using the Custom Search Control API, send an HTTP POST message, using the text/xml content type, to: http://www.google.com/cse/api/default/index/<CSE_ID>. Include PageMap data in the message body, like this:

<OnDemandIndex>
<Pages>
  <Page url="http://www.example.com/monkeys/">
    <pagemap>
      <DataObject type="document">
        <Attribute name="title">Monkeys</Attribute>
        <Attribute name="review">4.0/5.0</Attribute>
      </DataObject>
      <DataObject type="stats">
        <Attribute name="installs">1000</Attribute>
        <Attribute name="comments">100</Attribute>
      </DataObject>
    </pagemap>
  </Page>
  <Page url="http://www.example.com/parrots">
    <pagemap>
      <DataObject type="document">
        <Attribute name="title">Parrots</Attribute>
        <Attribute name="review">4.5/5.0</Attribute>
      </DataObject>
      <DataObject type="stats">
        <Attribute name="installs">2000</Attribute>
        <Attribute name="comments">200</Attribute>
      </DataObject>
    </pagemap>
  </Page>
</Pages>
</OnDemandIndex>
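A minimal Python sketch of that request, using urllib.request from the standard library. It only builds the request. The CSE ID is a placeholder, the helper name is hypothetical, and because this page does not describe how the API authenticates callers, any credentials your account requires are left out and would have to be added before sending.

```python
import urllib.request

API_URL = "http://www.google.com/cse/api/default/index/{cse_id}"

def build_index_request(cse_id, body_xml):
    """Prepare (but do not send) an on-demand indexing POST carrying PageMap data."""
    return urllib.request.Request(
        API_URL.format(cse_id=cse_id),
        data=body_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

BODY = """<OnDemandIndex>
<Pages>
  <Page url="http://www.example.com/parrots">
    <pagemap>
      <DataObject type="document">
        <Attribute name="title">Parrots</Attribute>
      </DataObject>
    </pagemap>
  </Page>
</Pages>
</OnDemandIndex>"""

request = build_index_request("YOUR_CSE_ID", BODY)  # placeholder ID
print(request.get_method(), request.full_url)
# Sending it (after adding whatever authentication your account needs):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```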

Private PageMaps

In some cases, you may not want custom attributes returned in your search engine's query results XML, because those are publicly visible by default. In this case, you can create a private PageMap by adding an AccessKey to the DataObject you want to protect, and sending the PageMap directly to Google using the on-demand indexing API. Only web searches with a matching AccessKey parameter will get that DataObject in results.

Here's an example of an AccessKey in use:

  <Page url="http://www.example.com/monkeys/">
    <pagemap>
      <DataObject type="stats">
        <AccessKey>myprivate12345</AccessKey>
        <Attribute name="installs">1000</Attribute>
        <Attribute name="comments">100</Attribute>
      </DataObject>
    </pagemap>
  </Page>

An AccessKey can consist of no more than 30 alphanumeric characters.

To retrieve protected data, specify the AccessKey in the pgmpk parameter. For example, if your AccessKey is myprivate12345, your query URL might look like this:

https://www.google.com/cse?cx=[CSEID]&q=animal&output=xml&pgmpk=myprivate12345

See a full list of supported parameters.

To restrict results to protected data, add the AccessKey value and a hyphen in front of the TYPE in the more:pagemap:TYPE-NAME:VALUE operator, like this:

https://www.google.com/cse?cx=[CSEID]&output=xml&q=animal+more:pagemap:myprivate12345-document-rating&pgmpk=myprivate12345

To sort by protected data, add the AccessKey value and a hyphen in front of the TYPE in the &sort=TYPE-NAME:DIRECTION parameter, like this:

https://www.google.com/cse?cx=[CSEID]&q=animal&output=xml&sort=myprivate12345-document-rating&pgmpk=myprivate12345
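The following Python sketch assembles query URLs like the ones above. It rejects AccessKeys that break the 30-character alphanumeric limit and places the key in front of TYPE-NAME, as the examples do. It uses urllib.parse.urlencode, which percent-encodes characters such as ":" (as %3A) and spaces (as +); that is standard query-string encoding, but the helper names and the [CSEID] placeholder are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE = "https://www.google.com/cse"

def check_access_key(key):
    """An AccessKey may contain at most 30 alphanumeric characters."""
    if not (key.isascii() and key.isalnum() and len(key) <= 30):
        raise ValueError(f"invalid AccessKey: {key!r}")
    return key

def private_query_url(cx, query, key, restrict=None, sort=None):
    """Build a query URL that can see DataObjects protected by `key` (hypothetical helper).

    restrict and sort are TYPE-NAME strings such as "document-rating"; the
    AccessKey and a hyphen are placed in front of them.
    """
    key = check_access_key(key)
    q = query
    if restrict:
        q += f" more:pagemap:{key}-{restrict}"
    params = {"cx": cx, "q": q, "output": "xml"}
    if sort:
        params["sort"] = f"{key}-{sort}"
    params["pgmpk"] = key  # needed to get the protected DataObject back
    return f"{BASE}?{urlencode(params)}"

print(private_query_url("[CSEID]", "animal", "myprivate12345", restrict="document-rating"))
print(private_query_url("[CSEID]", "animal", "myprivate12345", sort="document-rating"))
```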

Parsing PageMap data

If you are getting results back via XML, then the custom attributes are returned in the results within the PageMap tag, as shown below. You can parse the DataObjects within the PageMap tag and provide customized presentation of the relevant attributes. If you are using the Custom Search element, then the custom attributes are returned in the richSnippet property of each result for use in data templates, as described at Rich Snippet result properties.

<r n="1">
 <u> http://www.xyz.com/business/vending_machine.html </u>
 ...
 <t> In Italy, a Vending Machine Even Makes the <b>Pizza</b> </t>
 ...
 <s>The European vending machine industry has annual sales of about #33
 billion, much of it coming from factories and offices.</s>
 ...
 <PageMap>
  <DataObject type="image">
   <Attribute name="image_src" value="http://www.nytimes.com/images/2009/03/14/business/14vend.751.jpg"/>
  </DataObject>
  <DataObject type="publication">
   <Attribute name="author" value="John Tagliabue"/>
   <Attribute name="date" value="March 14, 2009"/>
   <Attribute name="category" value="Business/World Business"/>
  </DataObject>
 </PageMap>
 ...
</r>
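For XML results, a short Python sketch with ElementTree can turn each result's PageMap into nested dictionaries keyed by DataObject type. The sample below is a trimmed, well-formed version of the result above (the ... placeholders are removed). The function name is hypothetical, and DataObjects that share a type are merged in this sketch.

```python
import xml.etree.ElementTree as ET

def pagemap_of(result):
    """Return {DataObject type: {attribute name: value}} for one <r> result element."""
    objects = {}
    for obj in result.findall("PageMap/DataObject"):
        attrs = objects.setdefault(obj.get("type"), {})  # same-type objects merge
        for attr in obj.findall("Attribute"):
            attrs[attr.get("name")] = attr.get("value")
    return objects

RESULT = """<r n="1">
 <u> http://www.xyz.com/business/vending_machine.html </u>
 <PageMap>
  <DataObject type="image">
   <Attribute name="image_src" value="http://www.nytimes.com/images/2009/03/14/business/14vend.751.jpg"/>
  </DataObject>
  <DataObject type="publication">
   <Attribute name="author" value="John Tagliabue"/>
   <Attribute name="date" value="March 14, 2009"/>
   <Attribute name="category" value="Business/World Business"/>
  </DataObject>
 </PageMap>
</r>"""

result = ET.fromstring(RESULT)
url = result.findtext("u").strip()
pagemap = pagemap_of(result)
print(url, pagemap["publication"]["author"])
```

From here, a template can show, for example, the image thumbnail and author next to each result.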

Using <meta> tags

While PageMaps allow you to specify precisely the data you want for each page, sometimes you have a large amount of content that you do not want to annotate. For such content, Google extracts selected content from <meta> tags of the form <meta name="KEY" content="VALUE">. We do not support variants of the <meta> tag, such as the use of property instead of name.

While we explicitly exclude common tags that are usually inserted programmatically by web authoring tools, such as robots, description, and keywords, rarer tags specific to your site will be extracted and put into a special data object of type metatags, which can be used with all of Custom Search's structured data features. For example, a <meta> tag of the form:

<meta name="pubdate" content="20100101">

creates a PageMap DataObject which is returned in XML results like this:

<r n="1">
 ...
 <PageMap>
  <DataObject type="metatags">
   <Attribute name="pubdate" value="20100101"/>
  </DataObject>
 </PageMap>
 ...
</r>

The data in this automatically created PageMap can be used anywhere you can use data from a PageMap explicitly included in your page's content. For instance, it can be used with structured search operators like Sort by Attribute:

https://www.google.com/cse?cx=12345:example&q=oil+spill&sort=metatags-pubdate

or with the Custom Search element:

...
var options = {};
// Bias results strongly toward newer pubdate values (d = descending, s = strong).
options[google.search.Search.RESTRICT_EXTENDED_ARGS] = {'sort': 'metatags-pubdate:d:s'};
customSearchControl = new google.search.CustomSearchControl('000525776413497593842:aooj-2z_jjm', options);
...

The <meta> tags excluded by Google include:

  • robots
  • description
  • keywords
  • revisit-after
  • generator
  • verify-v1
  • googlebot
  • google-site-verification
  • mssmarttagspreventparsing
  • no-cache

Google attempts to include all other <meta> tags, with the caveat that punctuation, special characters and embedded spaces in the name field of <meta> tags may not be parsed correctly. Custom Search explicitly supports periods and dashes in <meta> tag names. Custom Search does not explicitly support other special characters within <meta> tag names, but some special characters may be accepted correctly if they are URL encoded.

Limitations

Custom Search will convert up to 50 <meta> tags to PageMaps, as long as the total text size of all processed properties does not exceed 1MB, with no individual property exceeding 1024 characters.
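To check which of a page's <meta> tags fall within these rules, a rough Python approximation can apply the exclusion list and the limits above. It is not Google's parser. Several details are assumptions, labeled in the code: "1MB" is read as 1,048,576 bytes of UTF-8, excluded names are matched case-insensitively, and a value over 1024 characters is simply skipped.

```python
from html.parser import HTMLParser

# <meta> names that this page says Google excludes.
EXCLUDED = {"robots", "description", "keywords", "revisit-after", "generator",
            "verify-v1", "googlebot", "google-site-verification",
            "mssmarttagspreventparsing", "no-cache"}
MAX_TAGS = 50
MAX_VALUE_CHARS = 1024
MAX_TOTAL_BYTES = 1024 * 1024  # assumption: "1MB" read as 1,048,576 bytes of UTF-8

class MetaTagCollector(HTMLParser):
    """Approximates the metatags DataObject built from <meta name=... content=...>."""

    def __init__(self):
        super().__init__()
        self.metatags = {}
        self._total = 0

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or len(self.metatags) >= MAX_TAGS:
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")  # property= is ignored
        # Assumption: excluded names are matched case-insensitively.
        if not name or content is None or name.lower() in EXCLUDED:
            return
        if len(content) > MAX_VALUE_CHARS:  # assumption: oversized values are skipped
            return
        size = len(name.encode("utf-8")) + len(content.encode("utf-8"))
        if self._total + size > MAX_TOTAL_BYTES:
            return
        self._total += size
        self.metatags[name] = content

PAGE = """<head>
<meta name="pubdate" content="20100101">
<meta name="description" content="Excluded by Google">
<meta property="og:title" content="property= is not supported">
<meta name="dc.creator" content="Periods are supported">
</head>"""

collector = MetaTagCollector()
collector.feed(PAGE)
print(collector.metatags)
# {'pubdate': '20100101', 'dc.creator': 'Periods are supported'}
```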

Using Page Dates

In addition to metadata that you explicitly specify on a page, Google also estimates a page date based on features of the page such as dates in the title and URL. Custom Search allows you to use this date to sort, bias, and range-restrict results by using a special metadata key of date. This estimated date can be used in all operators that use the &sort= URL parameter, including Sort by Attribute, Bias by Attribute, and Restrict to Range.

Note: The page date is not added to the PageMap, so it is not returned in XML results, cannot be used in the Custom Search element, and cannot be used with the Filter by Attribute feature.

The following examples show the use of the page date with these operators:

If you want to... | Send this URL... | To learn more, see...
Sort results by date in descending order | https://www.google.com/cse?cx=12345:example&q=oil+spill&sort=date | Sort by Attribute
Bias results strongly towards newer dates | https://www.google.com/cse?cx=12345:example&q=oil+spill&sort=date:d:s | Bias by Attribute
Bias results weakly towards older dates | https://www.google.com/cse?cx=12345:example&q=oil+spill&sort=date:a:w | Bias by Attribute
Return results from January 1 to February 1 of 2010 (inclusive) | https://www.google.com/cse?cx=12345:example&q=oil+spill&sort=date:r:20100101:20100201 | Restrict to Range
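The sort values in the table follow a small pattern: a key alone, a key followed by a direction (d or a) and a bias strength (s or w), or a key followed by r and an inclusive YYYYMMDD range. The hypothetical Python helpers below compose those values and the full URL. They cover only the forms shown in the table, and urlencode is told to leave ":" unescaped so the output matches the table exactly.

```python
from datetime import date
from urllib.parse import urlencode

def bias(key, direction, strength):
    """Return KEY:DIRECTION:STRENGTH, e.g. date:d:s.

    direction: 'd' favors higher (newer) values, 'a' lower (older) ones;
    strength: 's' for a strong bias, 'w' for a weak one.
    """
    if direction not in ("a", "d") or strength not in ("s", "w"):
        raise ValueError("direction must be 'a' or 'd'; strength 's' or 'w'")
    return f"{key}:{direction}:{strength}"

def date_range(key, start, end):
    """Return KEY:r:START:END with both dates as YYYYMMDD (the range is inclusive)."""
    return f"{key}:r:{start:%Y%m%d}:{end:%Y%m%d}"

def search_url(cx, query, sort):
    """Build a Custom Search URL; ':' is left unescaped to match the examples."""
    params = urlencode({"cx": cx, "q": query, "sort": sort}, safe=":")
    return f"https://www.google.com/cse?{params}"

print(search_url("12345:example", "oil spill", "date"))
print(search_url("12345:example", "oil spill", bias("date", "a", "w")))
print(search_url("12345:example", "oil spill",
                 date_range("date", date(2010, 1, 1), date(2010, 2, 1))))
# https://www.google.com/cse?cx=12345:example&q=oil+spill&sort=date:r:20100101:20100201
```

The same helpers work for other keys, for example bias("metatags-pubdate", "d", "s").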

Google's estimate of the right date for a page is based on features such as the byline date of news articles or an explicitly specified date in the title of the document. If a page has poorly specified or inconsistent dates, Google's estimate of the page date may not make sense, and your custom search engine may return results ordered in a way you do not expect.

Formatting Dates

A site may provide date information implicitly, relying on Google's estimated page date feature to detect dates embedded in the page URL, title or other features, or explicitly, by supplying a date in a structured data format. In either case, effective use of dates requires formatting the dates correctly.

For Custom Search's Sort by Attribute, Bias by Attribute, and Restrict to Range features, Google attempts to parse dates using both conventional date formatting and formal standards such as ISO 8601 and IETF RFC 850. The following complete date formats are accepted:

Date format | Example date
YYYY-MM-DD | 2009-12-31
YYYY/MM/DD | 2009/12/31
YYYYMMDD | 20091231
Month DD YYYY | December 31 2009
DD Month YYYY | 31 December 2009

Google will attempt to parse variants of these date formats, such as MM/DD/YYYY and DD/MM/YYYY. However, the more ambiguous the date, the less likely that Google will parse it correctly. For example, the date 06/07/08 is extremely ambiguous and it is unlikely Google will assign to it the interpretation you want. For best results, use a complete ISO 8601 date format with a fully specified year.
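The following Python sketch shows why the complete formats in the table are safe while a short form like 06/07/08 is not. It normalizes each accepted format to ISO 8601 with datetime.strptime, then reads 06/07/08 under three common conventions. It illustrates the ambiguity only and is not Google's date parser.

```python
from datetime import datetime

# strptime patterns for the complete formats in the table above.
# %B expects English month names (the default C locale).
ACCEPTED = ["%Y-%m-%d", "%Y/%m/%d", "%Y%m%d", "%B %d %Y", "%d %B %Y"]

def to_iso(text):
    """Normalize a date in one of the accepted formats to ISO 8601 (YYYY-MM-DD)."""
    for pattern in ACCEPTED:
        try:
            return datetime.strptime(text, pattern).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

for sample in ["2009-12-31", "2009/12/31", "20091231", "December 31 2009", "31 December 2009"]:
    print(sample, "->", to_iso(sample))  # each line ends in 2009-12-31

# One ambiguous string, three reasonable readings:
for pattern in ["%m/%d/%y", "%d/%m/%y", "%y/%m/%d"]:
    print(pattern, "->", datetime.strptime("06/07/08", pattern).date().isoformat())
# %m/%d/%y -> 2008-06-07
# %d/%m/%y -> 2008-07-06
# %y/%m/%d -> 2006-07-08
```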

Rich Snippets

Google also extracts a variety of structured data from Microformats, RDFa and Microdata to be used in Rich Snippets, extended presentations of standard Google search results. A subset of this data, typically the same data used in Rich Snippets, is available for use in Custom Search's structured data operators. For example, if you have marked up your pages with the Microformat hrecipe standard, you could sort on the number of rating stars of the recipe with an operator like &sort=recipe-ratingstars. Google is continually extending the data it extracts and how much of this data is available for use in Custom Search; to see what data we currently extract, you can use the Rich Snippets Testing Tool in Webmaster Tools.

Using Microformats

Microformats is a specification for representing commonly published items such as reviews, people, products, and businesses. Generally, microformats consist of <span> and <div> elements with a class attribute whose value is a brief, descriptive property name (such as dtreviewed or rating, which represent the date an item was reviewed and its rating, respectively).

The following includes a snippet of plain HTML code.

<p><strong>Kevin Grendelzilla</strong></p>
<p>Technical writer at Google</p>
<p>555 Search Parkway</p>
<p>Googlelandia, CA 94043</p>

The following snippet shows the previous HTML code extended with microformats:

<div class="vcard">
   <p><strong class="fn">Kevin Grendelzilla</strong></p>
   <p><span class="title">Technical writer</span> at <span class="org">Google</span></p>
   <p><span class="adr">
      <span class="street-address">555 Search Parkway</span>
      <span class="locality">Googlelandia</span>, <span class="region">CA</span>
      <span class="postcode">94043</span>
      </span></p>
</div>

Google extracts a subset of this data, normalized and reorganized to correspond to how it would be displayed in Rich Snippets. This subset would be returned in XML results like this:

<r n="1">
 ...
 <PageMap>
  <DataObject type="person">
   <Attribute name="location" value="Googlelandia"/>
   <Attribute name="role" value="Technical Writer"/>
  </DataObject>
 </PageMap>
 ...
</r>

To see what Google extracts for a page, use the Rich Snippets Testing Tool in Google's Webmaster Tools site. The data Google extracts from pages is continually being extended, so check back periodically to see if the data you want has been made available. In the meantime, if you need custom data that does not correspond to a defined microformat, you can use PageMaps.

To learn more about microformats, see the Webmaster Tools article and microformats.org.

Using Resource Description Framework in Attributes (RDFa)

Resource Description Framework in attributes (RDFa) is more flexible than microformats. Microformats specify both a syntax for including structured data in HTML documents and a set of microformat classes, each with its own specific vocabulary of allowed attributes. RDFa, on the other hand, specifies only a syntax and allows you to use existing vocabularies of attributes or create your own. It even lets you combine multiple vocabularies freely. If the existing vocabularies do not meet your needs, you can define your own standards and vocabularies by creating new fields.

The following includes a snippet of plain HTML code.

<div>
   <h3>5 Centimeters Per Second</h3>
   <h4>Makoto Shinkai</h4>
    ...
</div>

The following snippet shows the previous HTML code extended with RDFa:

<div xmlns:dc="http://purl.org/dc/elements/1.1/">
   <h3 property="dc:title">5 Centimeters Per Second</h3>
   <h4 property="dc:creator">Makoto Shinkai</h4>
   ...
</div>
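Because RDFa lets you mix vocabularies, each short name such as dc:title stands for a full IRI formed by joining the namespace bound to the prefix with the local name. The minimal Python sketch below performs that expansion while reading property attributes. It assumes dc is bound to the Dublin Core elements namespace, http://purl.org/dc/elements/1.1/, and uses dc:creator, the Dublin Core element for the entity that made a resource. The class is hypothetical and deliberately simplified, and an undeclared prefix is reported as an error.

```python
from html.parser import HTMLParser

class RDFaProperties(HTMLParser):
    """Collects (property IRI, text) pairs from elements that carry property=.

    Simplified on purpose: prefixes come only from xmlns:PREFIX attributes,
    prefix scoping is ignored, and a property's value is its element's text.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.properties = []
        self._pending = None  # expanded IRI waiting for its element's text

    def expand(self, curie):
        """Turn a CURIE such as dc:title into a full IRI."""
        prefix, _, local = curie.partition(":")
        if prefix not in self.prefixes:
            raise ValueError(f"undeclared prefix: {prefix!r}")
        return self.prefixes[prefix] + local

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        curie = dict(attrs).get("property")
        if curie:
            self._pending = self.expand(curie)

    def handle_endtag(self, tag):
        self._pending = None

    def handle_data(self, data):
        if self._pending and data.strip():
            self.properties.append((self._pending, data.strip()))

HTML = """<div xmlns:dc="http://purl.org/dc/elements/1.1/">
   <h3 property="dc:title">5 Centimeters Per Second</h3>
   <h4 property="dc:creator">Makoto Shinkai</h4>
</div>"""

parser = RDFaProperties()
parser.feed(HTML)
for iri, value in parser.properties:
    print(iri, "=", value)
```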

To learn more about RDFa, see the Webmaster Tools article. To learn more about defining an RDF schema, see the RDF Primer.

Using Microdata

HTML5, the latest revision of the language web pages are written in, defines a format called microdata that incorporates the ideas of RDFa and Microformats directly into the HTML standard itself. Microdata uses simple attributes in HTML tags (often span or div) to assign brief and descriptive names to items and properties.

Like RDFa and Microformats, Microdata's attributes help you specify that your content describes information of specific types, such as reviews, people, or events. For example, a person can have the properties name, nickname, url, title, and affiliation. The following is an example of a short HTML block showing this basic contact information for Bob Smith:

<div>
  My name is Bob Smith but people call me Smithy. Here is my home page:
  <a href="http://www.example.com">www.example.com</a>
  I live in Albuquerque, NM and work as an engineer at ACME Corp.
</div>

The following is the same HTML marked up with microdata. Note that in this example we use a property 'nickname' that is not yet officially part of schema.org. Custom Search is a good way to explore possible schema.org extensions locally before proposing them to the wider community.

<div itemscope itemtype="http://schema.org/Person">
  My name is <span itemprop="name">Bob Smith</span>
  but people call me <span itemprop="nickname">Smithy</span>.
  Here is my home page:
  <a href="http://www.example.com" itemprop="url">www.example.com</a>
  I live in Albuquerque, NM and work as an <span itemprop="title">engineer</span>
  at <span itemprop="affiliation">ACME Corp</span>.
</div>

The first line of this example includes an HTML div tag with an itemscope attribute, which indicates that the div contains a microdata item. The itemtype="http://schema.org/Person" attribute on the same tag tells us this is a person. Each property of the person item is identified with the itemprop attribute; for example, itemprop="name" on the span tag describes the person's name. Note that you are not limited to span and div; the itemprop="url" attribute is attached to an a (anchor) tag.
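For microdata, a comparable Python sketch can read the Bob Smith example. It records the itemtype of the element carrying itemscope and collects each itemprop. Following the HTML microdata rules, an itemprop on an a element takes its value from the href attribute rather than the link text. The class name is hypothetical, and the sketch handles only a single item without nested itemscope elements.

```python
from html.parser import HTMLParser

LINK_TAGS = {"a", "area", "link"}  # microdata takes these values from href

class MicrodataItem(HTMLParser):
    """Reads a single, flat microdata item (no nested itemscope) into a dict.

    Simplified: text collection for an itemprop stops at the next end tag.
    """

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = {}
        self._open = None  # itemprop currently collecting text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.itemtype = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop and tag in LINK_TAGS:
            self.props[prop] = attrs.get("href")
        elif prop:
            self.props[prop] = ""
            self._open = prop

    def handle_endtag(self, tag):
        self._open = None

    def handle_data(self, data):
        if self._open:
            self.props[self._open] += data

HTML = """<div itemscope itemtype="http://schema.org/Person">
  My name is <span itemprop="name">Bob Smith</span>
  but people call me <span itemprop="nickname">Smithy</span>.
  Here is my home page:
  <a href="http://www.example.com" itemprop="url">www.example.com</a>
  I live in Albuquerque, NM and work as an <span itemprop="title">engineer</span>
  at <span itemprop="affiliation">ACME Corp</span>.
</div>"""

item = MicrodataItem()
item.feed(HTML)
print(item.itemtype, item.props)
```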

To learn more about microdata, see the Webmaster Tools article and the HTML Microdata standard.

Viewing Extracted Structured Data

After you have tagged your webpages with structured data, you can use the Rich Snippets Testing Tool to view the structured data that can be extracted from the webpage. The tool provides two views: the first view shows the structured data that Google Search can extract from the page, while the second view shows what Custom Search can extract from the page.

If you haven't tagged any of your webpages but would like to see what extracted structured data might look like, you can enter the URL of other websites. Popular sites that have review information or lists of contacts are more likely to have structured data. If you see result snippets on Google Search that look similar to Figure 1, you can conclude that the webpage has structured data.

Figure 1: Result snippet with rating, price range, and review.

Once you have found a page with structured data, you can view that page's source to see the structured data that the site has implemented, or view that page in the Rich Snippets Testing Tool to see what data is extracted for Google Search rich snippets and Custom Search structured search. For example, consider the following snippet of HTML with structured data about a person implemented as microformats:

<div class="vcard">
    <h1 class="fn">
      <span class="given-name">Godzilla</span>
      <span class="family-name">Gigantis</span>
    </h1>
    <span class="title">Senior Giant Monster</span>,
    <span class="adr">
      <span class="locality">Tokyo</span>
    </span>
</div>

From a page with this markup, Google extracts the following data for use in rich snippets:

hcard
  fn = Godzilla Gigantis
  n
    family-name = Gigantis
    given-name = Godzilla
  adr
    locality = Tokyo
  title = Senior Giant Monster  

Custom Search extracts the following subset of that data for use in structured search:

person (source = MICROFORMAT)
  location = Tokyo

Thus, this tool allows you to view not only the Rich Snippets markup recognized for Google Search, but also the additional customized markup that we support in Custom Search. You can immediately see how your web page would be processed during indexing, and what metadata attributes would be returned in PageMaps in your Custom Search results. If there are any errors in your markup, you can fix them right away. Remember, you need to add the &view=cse parameter to the URL or click the checkbox to review the additional metadata extracted by Custom Search.

Exploring Other Features

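Structured data can be used in several Custom Search features, including custom snippets and the structured search operators described on this page, such as Sort by Attribute, Bias by Attribute, Restrict to Range, and Filter by Attribute.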

If you want to write client applications that dynamically create custom search engines using HTTP request methods, see Programmatically Creating Custom Search Engines.
