Linked Custom Search Engine

In a Linked CSE, the specification of your search engine is hosted on your website instead of in the Google Custom Search Control Panel.

Note: Linked CSE functionality will stop working in April 2017. We recommend that you migrate your settings into the Control Panel.

Overview

When you create a CSE in the Control Panel, the configuration files defining your engine (the context and annotations XML files) are stored at Google and are accessible via the Setup > Advanced tab. To change any aspect of the CSE, you have to either use the Control Panel or upload a new XML specification. This imposes several limitations:

  • Creating and maintaining a CSE is a manual process.
  • It is difficult to create a very large number of CSEs, say one for each of your users or a slightly different one for each of your pages.
  • It is difficult to use other data sources such as iCal, RSS etc. to programmatically create CSEs.

Linked CSE helps you overcome these limitations. With Linked CSEs, you host the CSE specification on your website and include the URL of this specification in your CSE search request via the cref parameter. Google retrieves the CSE specification from your website when your user searches in the CSE. This has several benefits:

  • You can easily convert your data source to a Custom Search Engine.
  • You can automatically generate any number of CSEs, each possibly tuned to a particular user, a particular page, the time of day, etc. In fact, you can generate CSEs on demand, in response to a user's query or the page on your site that your user is searching from. We provide several tools that you can use, such as one that creates a Linked CSE out of the links on a page.
  • You can easily update your Linked CSE definitions without pushing data to Google.
  • There are no global, per-user annotation limits.

You can now exploit the full power of your ideas to dynamically generate CSEs. Some interesting sources of data you could use to create CSEs are iCal feeds, your referrer logs and your users' bookmarks or browsing history. You could even change the look and feel of your CSE in response to the health or traffic of your website.

Linked CSEs are always free, ad-supported CSEs; the Linked CSE mechanism cannot be used to host CSE specifications for Google Site Search.

Example of a Linked CSE

Here is a simple example of a Linked CSE, which is hosted at cse-labs.appspot.com/cref_cse.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<GoogleCustomizations>
  <CustomSearchEngine>
    <Title>Solar Energy</Title>
    <Description>A Google Custom Search Engine about solar energy</Description>
    <Context>
      <BackgroundLabels>
        <Label name="solar_example" mode="FILTER" />
      </BackgroundLabels>
    </Context>
  </CustomSearchEngine>
  <Annotations>
    <Annotation about="http://www.solarenergy.org/*">
      <Label name="solar_example"/>
    </Annotation>
    <Annotation about="http://www.solarfacts.net/*">
      <Label name="solar_example"/>
    </Annotation>
  </Annotations>
</GoogleCustomizations>

You can access the homepage of this CSE by pointing to the configuration file location using the cref parameter:

http://cse.google.com/home?cref={YOUR_CONFIG_FILE_URL}
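
Because the specification URL is passed as the value of a query parameter, it is safest to URL-escape it first. The following is a minimal sketch, not an official API: the helper name buildCseHomeUrl and the www.example.com address are hypothetical and used only for illustration.

```javascript
// Hypothetical helper (not part of any Google library): builds the
// Linked CSE homepage URL for a given specification file URL.
function buildCseHomeUrl(configFileUrl) {
  // encodeURIComponent escapes characters such as ':', '/', '?', '=' and '&',
  // so the whole specification URL is carried as a single cref value.
  return 'http://cse.google.com/home?cref=' + encodeURIComponent(configFileUrl);
}

// Placeholder specification URL; substitute the location of your own file.
console.log(buildCseHomeUrl('http://www.example.com/cref_cse.xml'));
```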

The Linked CSE definition is wrapped in a GoogleCustomizations tag and consists of two parts:

  • CustomSearchEngine - this is an equivalent of the context file from Control Panel
    • describes the basic features of a search engine, like title or look and feel options
  • Annotations - this is an equivalent of the annotations file from Control Panel
    • lists the webpages or websites you want your search engine to cover

The important part that binds sites in the annotations section to your search engine is the BackgroundLabels tag in the CustomSearchEngine definition. In the above example, there is only one filter label: solar_example. The sites in the annotations section need to have the same label to be included in this particular search engine. You can read more about using labels in the ranking chapter.
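
Since the same label must appear in both parts, a specification generated by a program can take the label from a single variable. The sketch below shows one way to produce a file shaped like the example above from a list of URL patterns. The function names escapeXml and buildLinkedCseXml and the shape of the spec object are assumptions made for this illustration, and only the elements shown in the example are emitted.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical generator: every annotation receives the same label that is
// declared as the engine's background FILTER label, which is what binds the
// annotated sites to the engine.
function buildLinkedCseXml(spec) {
  var label = escapeXml(spec.label);
  var annotations = spec.patterns.map(function (pattern) {
    return '    <Annotation about="' + escapeXml(pattern) + '">\n' +
           '      <Label name="' + label + '"/>\n' +
           '    </Annotation>';
  });
  return [
    '<?xml version="1.0" encoding="UTF-8" ?>',
    '<GoogleCustomizations>',
    '  <CustomSearchEngine>',
    '    <Title>' + escapeXml(spec.title) + '</Title>',
    '    <Description>' + escapeXml(spec.description) + '</Description>',
    '    <Context>',
    '      <BackgroundLabels>',
    '        <Label name="' + label + '" mode="FILTER" />',
    '      </BackgroundLabels>',
    '    </Context>',
    '  </CustomSearchEngine>',
    '  <Annotations>'
  ].concat(annotations, [
    '  </Annotations>',
    '</GoogleCustomizations>'
  ]).join('\n');
}

console.log(buildLinkedCseXml({
  title: 'Solar Energy',
  description: 'A Google Custom Search Engine about solar energy',
  label: 'solar_example',
  patterns: ['http://www.solarenergy.org/*', 'http://www.solarfacts.net/*']
}));
```

Serving the returned string from a URL on your site would give you a specification that can be referenced through cref.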

To learn more about the configuration options available in each section, please refer to the configuration files documentation.

Implementing the Linked CSE searchbox on your site

You can use the standard Custom Search Element to implement the searchbox and/or search results on your page. The only difference between the code for a Google-hosted CSE and for a Linked CSE is that you provide the cref parameter pointing to the configuration URL instead of the standard 'cx' engine ID.

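<script>
  (function() {
    // URL of the Linked CSE specification file hosted on your site.
    var cref = 'http://www.guha.com/cref_cse.xml';
    var gcse = document.createElement('script');
    gcse.type = 'text/javascript';
    gcse.async = true;
    // URL-escape the specification URL before passing it as the cref value.
    gcse.src = 'https://cse.google.com/cse.js?cref=' + encodeURIComponent(cref);
    // Insert the loader before the first script element on the page.
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(gcse, s);
  })();
</script>
<gcse:search></gcse:search>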

Similarly, you can access the results via the JSON API by providing the cref parameter in your request.
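
As a rough sketch of such a request, the snippet below only builds the request URL. It assumes the Custom Search JSON API endpoint https://www.googleapis.com/customsearch/v1 with its key and q query parameters, which are not described on this page. The helper name buildJsonApiUrl and the YOUR_API_KEY placeholder are hypothetical.

```javascript
// Hypothetical helper: builds a JSON API request URL that identifies the
// engine by its specification URL (cref) rather than by an engine ID (cx).
// Assumption: the endpoint and the 'key' and 'q' parameters are those of the
// Custom Search JSON API; check its reference before relying on them.
function buildJsonApiUrl(apiKey, crefUrl, query) {
  // URLSearchParams URL-escapes each value when it is serialized.
  var params = new URLSearchParams({ key: apiKey, cref: crefUrl, q: query });
  return 'https://www.googleapis.com/customsearch/v1?' + params.toString();
}

console.log(buildJsonApiUrl(
  'YOUR_API_KEY',                      // placeholder, not a real key
  'http://www.guha.com/cref_cse.xml',  // specification URL used on this page
  'solar panels'
));
```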

Note that this search box does not have to be on the same site as the CSE specification file.

Available tools

Developer Console: Test your engine

Linked CSE Console

The developer console allows Linked CSE authors to get instant feedback about their XML definition and annotations files. After an XML file's URL is entered into the input field and the "Refresh" button is pressed, the file (and any files it depends on) will be scanned. Errors found in any of the scanned files will be reported.

If an error-free XML search engine definition is submitted to the developer console, it will replace the cached version of that file (if a cached version exists). The next time this Linked CSE is accessed using the URL provided in the developer console, it should reflect this new version.

The makeannotations tool can be accessed at a URL formatted as follows:

https://cse.google.com/tools/makeannotations?{parameters}

The tool scans the webpage specified in the url parameter and makes a list of all anchors, i.e. all <a href=...> elements. The href attributes of the anchors are converted to XML annotations. Alternatively, makeannotations can be used with an RSS, Atom or OPML feed. Either way, the output is a stand-alone XML annotations file that can later be included in the Linked CSE configuration file. The exact behavior is determined by the options below, many of which are shared with the makecse tool.

Parameter Description
url Required. Specifies the page or feed from which the URLs should be extracted. The tool only extracts URLs from anchor tags, RSS feeds, and Atom feeds. It ignores other URLs, for example URLs in JavaScript. The tool extracts links only from http pages, so the protocol can be omitted. The value of this parameter should be URL-escaped, meaning that some characters are replaced with escaped versions of these characters. Some common substitutions are listed below:
  • / → %2F
  • ? → %3F
  • = → %3D
  • & → %26
label Required. Associates the extracted URLs with a given label (e.g. so that they can be included in or excluded from the engine with the same label).
pattern Optional. Controls how the extracted URL is converted to a CSE URL pattern for the annotation's about attribute. The default value is path. Allowed values:
  • exact - the entire URL is used to create an exact URL pattern: about="www.ex.com/some/path/file.html"
  • path - the portion of the URL before the last forward slash (/) is extracted. Then a wildcard (*) is added, so that a prefix pattern is created: about="www.ex.com/some/path/*"
  • host - the portion before the first slash (/) is extracted and a wildcard (*) is added to create a prefix pattern. The hostname is also truncated to the site level and a wildcard is inserted, making the result a host pattern as well: about="*.ex.com/*"
autofilter Optional. If the value is set to 1, then annotation elements with overly general CSE URL patterns will be eliminated. For example, suppose url=google.com is used with pattern=host, and the URL blogsearch.google.com is extracted. This URL is converted to the pattern *.google.com/*. When autofilter=1 is used, no annotation element will be created for this pattern, since it is unlikely that you want all of Google's website in your auto-generated Linked CSE. If autofilter=0 is used, then such an annotation is permitted. The default value is 1.
startbyte, endbyte Optional. When scraping links from the web page specified by the url parameter, the makeannotations tool normally scans the entire page. If startbyte is specified and is a non-negative integer, then scanning will start this many bytes into the page. If endbyte is specified, then scanning will stop at this position. The beginning of the web page has byte position zero.
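
To illustrate how these parameters combine into a request, here is a minimal sketch that assembles a makeannotations URL and URL-escapes each value. The helper name buildMakeAnnotationsUrl and the shape of its options object are hypothetical. Only the url, label, pattern and autofilter parameters from the table above are handled.

```javascript
// Hypothetical helper: assembles a makeannotations request URL from the
// parameters described in the table above.
function buildMakeAnnotationsUrl(options) {
  if (!options.url || !options.label) {
    throw new Error('The url and label parameters are required');
  }
  var params = new URLSearchParams();
  // URLSearchParams escapes '/', '?', '=' and '&' as %2F, %3F, %3D and %26.
  params.set('url', options.url);
  params.set('label', options.label);
  if (options.pattern !== undefined) {
    if (['exact', 'path', 'host'].indexOf(options.pattern) === -1) {
      throw new Error('pattern must be exact, path or host');
    }
    params.set('pattern', options.pattern);
  }
  if (options.autofilter !== undefined) {
    // The table documents the values 1 and 0 for autofilter.
    params.set('autofilter', options.autofilter ? '1' : '0');
  }
  return 'https://cse.google.com/tools/makeannotations?' + params.toString();
}

console.log(buildMakeAnnotationsUrl({
  url: 'www.ex.com/list?page=2&sort=asc', // illustrative page to scan
  label: 'solar_example',
  pattern: 'host'
}));
```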

The makecse tool can be accessed at a URL formatted as follows:

https://cse.google.com/tools/makecse?{parameters}

The makecse tool emits a simple Custom Search Engine definition which includes annotations created by the makeannotations tool (described above). All of the makeannotations parameters are available, along with an additional parameter called boostexact:

Parameter Description
boostexact Optional. Takes the values 0 and 1. If set to 1, then the search engine will extract two sets of URL patterns: exact, and the type requested in the pattern parameter. The search engine will boost exact URL patterns, making them more likely to appear in search results. The default value is 1.

Updating the Linked CSE specification

The first time a user issues a search query, we fetch the CSE specification and use it to process the query. We also cache your CSE specification and periodically refresh it, so that you don't have to worry about serving CSE specification requests every time your user issues a query. If you change the specification of your Linked CSE and need it refreshed right away, use the Linked CSE console.

Transitioning an existing Google-stored CSE to a Linked CSE

To start storing the configuration of your existing CSE on your own web server:

  1. In the Control Panel select your CSE and go to the Advanced tab.
  2. In the CSE Annotations section, click Download (XML). Save the resulting file. We assume you call it myannos.xml.
  3. In the CSE Context section, click Download (XML). Save the resulting file. We assume you call it mycontext.xml.
  4. Put myannos.xml on your web server. How you do this varies with your hosting company and web server configuration, so please see the provider documentation if you are having problems. Let's say your annotations file is now available at http://myserver.com/user/myannos.xml.
  5. Edit mycontext.xml in a text editor:
    • Insert <GoogleCustomizations> before the first <CustomSearchEngine> tag.
    • Add </GoogleCustomizations> as the last line of the file.
    • Before the final </GoogleCustomizations> tag, insert <Include type="Annotations" href="http://myserver.com/user/myannos.xml"/>
  6. Put mycontext.xml on your web server. Let's say that it is now available at http://myserver.com/user/mycontext.xml.
  7. Update the code snippet for your search box.
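
The edits in step 5 can also be scripted. The following is a minimal sketch that works on the file contents as a string. The function name toLinkedCseSpec is hypothetical, and the sketch assumes the downloaded context file contains a <CustomSearchEngine> element as described in the steps above.

```javascript
// Hypothetical helper: applies the three edits from step 5 to the contents
// of a downloaded context file and returns the Linked CSE specification.
function toLinkedCseSpec(contextXml, annotationsUrl) {
  var start = contextXml.indexOf('<CustomSearchEngine');
  if (start === -1) {
    throw new Error('No <CustomSearchEngine> tag found');
  }
  // An ampersand inside an XML attribute value must be written as &amp;.
  var href = annotationsUrl.replace(/&/g, '&amp;');
  var includeTag = '<Include type="Annotations" href="' + href + '"/>';
  return contextXml.slice(0, start) +             // keep any XML declaration
    '<GoogleCustomizations>\n' +                  // opening wrapper tag
    contextXml.slice(start).replace(/\s+$/, '') + '\n' +
    includeTag + '\n' +                           // reference to annotations
    '</GoogleCustomizations>\n';                  // closing tag as last line
}

// Illustrative input; a real mycontext.xml contains more elements.
var downloadedContext =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<CustomSearchEngine>\n' +
  '  <Title>Solar Energy</Title>\n' +
  '</CustomSearchEngine>\n';

console.log(toLinkedCseSpec(downloadedContext,
                            'http://myserver.com/user/myannos.xml'));
```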

File size limits

We require each configuration file to be less than 3MB in size. If you have more annotations than that, you can split them up into multiple files and use Include tags for specifying those files. You can have up to fifty files, but the total size of all the files you have included must be less than 10MB. We expect that this will allow you to include about 25K annotations per CSE.
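
When the files are generated automatically, it may help to check them against these limits before publishing. The sketch below takes a list of file sizes in bytes. The function name checkLinkedCseSizes is hypothetical, and two details are assumptions rather than documented behavior: 1MB is taken to be 1,048,576 bytes, and the fifty-file limit is applied to all files together.

```javascript
var MB = 1024 * 1024;           // Assumption: 1MB = 1,048,576 bytes.
var MAX_FILE_BYTES = 3 * MB;    // Each file must be smaller than this.
var MAX_TOTAL_BYTES = 10 * MB;  // All files together must be smaller than this.
var MAX_FILES = 50;             // Assumption: the limit counts every file.

// Hypothetical helper: returns a list of problems, empty if the sizes
// satisfy the limits described above.
function checkLinkedCseSizes(fileSizesInBytes) {
  var problems = [];
  if (fileSizesInBytes.length > MAX_FILES) {
    problems.push('Too many files: ' + fileSizesInBytes.length);
  }
  var total = 0;
  fileSizesInBytes.forEach(function (size, index) {
    total += size;
    if (size >= MAX_FILE_BYTES) {
      problems.push('File ' + index + ' is not smaller than 3MB');
    }
  });
  if (total >= MAX_TOTAL_BYTES) {
    problems.push('Total size is not smaller than 10MB');
  }
  return problems;
}

// Two 2MB files pass; a single 4MB file should be split into smaller files.
console.log(checkLinkedCseSizes([2 * MB, 2 * MB]));
console.log(checkLinkedCseSizes([4 * MB]));
```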
