Creating a Programmable Search Engine with configuration files

This page introduces the basic concepts behind Programmable Search Engine configuration files.

  1. Overview
  2. What's in a Programmable Search Engine
  3. How the Components Work Together
  4. Creating a Search Engine
  5. Editing the Programmable Search Engine Files
  6. Choosing the Right Format

Overview

If the Control Panel does not give you the level of customization that you need, consider using the Programmable Search XML format, which gives you more control, flexibility, and access to more powerful features.

To use the Programmable Search Engine configuration files, start by creating a basic search engine using the Programmable Search Engine Control Panel. Once you've created your search engine, you can download your annotations and context XML files from the Overview page of the Control Panel.

XML Basics

Extensible Markup Language or XML is a general-purpose markup language. It is text with tags that you can read. For example, the Programmable Search XML format includes the following tags: <Context> </Context> and <LookAndFeel> </LookAndFeel>.

As with any XML file, your Programmable Search Engine specifications must follow XML syntax (<element attribute="value">content</element>) and be well-formed. XML has the following rules:

  • XML files usually begin with an XML declaration (<?xml version="1.0"?>) before the top-level tags, but the Programmable Search Engine configuration file doesn't require it.
  • All your elements must have an opening tag (<tag>) and a closing tag (</tag>).
  • All your tags must be properly nested. You cannot have XML code that looks like: <sandwich><filling> peanut butter</sandwich></filling>. Instead, it should be like: <sandwich><filling> peanut butter</filling></sandwich>.
  • XML is case-sensitive, so carefully follow the capitalization and spelling of the tags in the instructions.
  • All attribute values must be enclosed in double quotation marks (<element attribute="value">).
  • All attributes must be defined in the opening tag (<element attribute="value">), not the closing tag ( </element>).
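
The rules above can be checked before you upload a file. The following is a minimal sketch that uses the parser in Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical. Passing this check only shows that the text is well-formed XML, not that Programmable Search Engine will accept its contents.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The two sandwich snippets from the nesting rule above.
badly_nested = "<sandwich><filling> peanut butter</sandwich></filling>"
properly_nested = "<sandwich><filling> peanut butter</filling></sandwich>"

print(is_well_formed(badly_nested))     # False
print(is_well_formed(properly_nested))  # True
```

The same helper also rejects unquoted attribute values and tags whose capitalization does not match, because the parser treats both as syntax errors.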

You can write notes for yourself using comment tags (<!-- your comment here -->), and Programmable Search Engine will not parse the commented text as XML code. Apart from writing reminders or descriptions, you can use comments to temporarily disable some XML code (perhaps because you want to experiment with certain effects or troubleshoot issues). However, these comments are not preserved in the files that you download from the Control Panel. If you want to keep the comments, keep a copy of your commented XML files even after you upload them to the Control Panel.

You can use a simple text editor to create and edit XML files. Just save the text file with the file extension .xml (for example, cse_badminton.xml).


What's in a Programmable Search Engine

A Programmable Search Engine has two main components, each of which is controlled by an XML file:

  • Context: The context XML file describes the basic features of a search engine. It specifies the global settings of the search engine, such as whether Image Search or promotions are enabled. Each search engine has its own context file. More information about the context XML file. For more information about selecting the most appropriate file format for your search engine, see Choosing the Right Format for Your Search Engine.
  • Annotations: The annotations XML file lists the webpages or websites you want your search engine to cover, and indicates any preferences you have about how these sites should be ranked in your search results. Each site and its associated information is called an annotation. More information about the annotations XML file.

We don't recommend that you create either of these files from scratch. Instead, download them from the Overview page of the Control Panel.

In addition to these main components, a search engine can also have the following auxiliary files:

  • Promotions: The promotions XML file lists a series of custom results that are triggered by a pre-defined set of query terms. When a user types a search that exactly matches one of your query terms, the promotion appears at the top of the page. You can use promotions to directly answer the queries of your users, lead them to important information, or point them to webpages that are not at the top of the results page yet are especially relevant. In the Control Panel, promotions are defined in the Promotions tab. More information about promotions.
  • Synonyms: The synonyms XML file expands the queries of your users to include variants of the search term. For example, if your user searches for "simian," the search engine also searches for "monkey" and "ape." In the Control Panel, synonyms are defined in the Synonyms tab. More information about synonyms.

How the Components Work Together

The context XML file doesn't specify the annotations file to use, and the annotations XML makes no reference to the context file. Instead, Programmable Search Engine uses labels to associate context and annotations. The context XML file includes labels that identify the search engine, and each annotation listed in the annotations XML is tagged with one or more labels identifying the search engine(s) to which it belongs. If you change the name of a label in the context file, you must also update every annotation that has been tagged with that label.

Although you can upload multiple annotations files, when you download them through the Control Panel, Programmable Search Engine merges all your annotations files into a single annotations file. The annotations files provide the flexibility to customize the same site for various search engines. For example, one search engine could restrict its search to some sites, another could eliminate those sites, and yet another could promote those sites.

context.xml

Here's an example of a context.xml file containing labels identifying the search engine to which it applies:

<BackgroundLabels>
  <Label name="_include_" mode="FILTER"/>
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>

annotations.xml

Here's an example of an annotations file showing how each site (annotation) is associated with a label:

<Annotation about="code.google.com/*" score="1">
  <Label name="_include_"/>
</Annotation>
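
The label association described in this section can be sketched as a consistency check: every label used by an annotation should also appear in the context. The following is a minimal sketch, not an official tool. The function name unmatched_labels is hypothetical, the element and attribute names are taken from the two examples above, and the context snippet is written with a closing </BackgroundLabels> tag so that it parses.

```python
import xml.etree.ElementTree as ET

CONTEXT_XML = """
<BackgroundLabels>
  <Label name="_include_" mode="FILTER"/>
  <Label name="_exclude_" mode="ELIMINATE"/>
</BackgroundLabels>
"""

ANNOTATIONS_XML = """
<Annotation about="code.google.com/*" score="1">
  <Label name="_include_"/>
</Annotation>
"""


def unmatched_labels(context_xml, annotations_xml):
    """Return (site, label) pairs whose label is missing from the context."""
    # Collect every label name declared in the context snippet.
    context_labels = {
        label.get("name") for label in ET.fromstring(context_xml).iter("Label")
    }
    missing = []
    # iter() also yields the root element, so a lone <Annotation> works too.
    for annotation in ET.fromstring(annotations_xml).iter("Annotation"):
        for label in annotation.iter("Label"):
            if label.get("name") not in context_labels:
                missing.append((annotation.get("about"), label.get("name")))
    return missing


print(unmatched_labels(CONTEXT_XML, ANNOTATIONS_XML))  # []
```

If the _include_ label were renamed in the context without updating the annotation, the function would return [('code.google.com/*', '_include_')], which is the situation the renaming warning above describes.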


Creating Advanced Programmable Search Engines

Creating advanced engines involves the following steps:

  1. Determine the format that is appropriate for your needs.
  2. Define the specifications for your search engine.
  3. Tell Programmable Search Engine which sites to search.
  4. Tell Programmable Search Engine how to rank the search results.

Editing the Programmable Search Engine Files

To work on an XML file, download it from the Overview page of the Control Panel rather than starting a file from scratch. Do the following:

  1. Download the context file or annotations file from the Overview page of the Control Panel. Click the Download button in the Search Features section.
  2. Use a text editor that can handle UNIX-style line endings (WordPad, Emacs, and TextMate work; Notepad doesn't). It does not matter what you name the file, as long as you save it with the file extension .xml (for example, cx_global.xml).
  3. Make a backup copy of the downloaded file in case your edited version does not work as expected, and you have to revert to the previous version.

    If you do not make a copy and the version that you edited does not work properly, you will need to debug your file or recreate your search engine all over again. Not fun.

  4. Edit the XML file and save it. Make sure that your text editor is saving the file as a Unicode text document and not some other file format.
  5. Upload the file under the Search Features section in the Overview page.
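
The backup and save steps above can be partly scripted. The following is a minimal sketch under stated assumptions: the function names back_up and save_edited, the .bak suffix, and the choice of UTF-8 as the Unicode encoding are all illustrative and do not come from the Control Panel. Downloading and uploading are still done by hand.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def back_up(xml_path):
    """Copy the downloaded file to <name>.xml.bak and return the backup path."""
    source = Path(xml_path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


def save_edited(xml_path, edited_text):
    """Write edited_text as UTF-8, but only if it is well-formed XML."""
    ET.fromstring(edited_text)  # raises ET.ParseError on malformed XML
    # newline="\n" keeps UNIX-style line endings on every platform.
    with open(xml_path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(edited_text)


# Demonstration in a temporary directory; cx_global.xml stands in for a
# file downloaded from the Control Panel.
with tempfile.TemporaryDirectory() as workdir:
    context_file = Path(workdir) / "cx_global.xml"
    context_file.write_text("<BackgroundLabels/>", encoding="utf-8")
    backup_file = back_up(context_file)
    save_edited(
        context_file,
        '<BackgroundLabels><Label name="_include_" mode="FILTER"/></BackgroundLabels>',
    )
    backup_text = backup_file.read_text(encoding="utf-8")
    edited_text = context_file.read_text(encoding="utf-8")
```

Because save_edited parses the text before it opens the file, a malformed edit raises an error and leaves the previous version on disk untouched.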

Choosing the Right Format

Before you start creating your Programmable Search Engine, determine which format best suits your needs. You don't want to select a format that is more powerful and complex than what you need, nor do you want to use one that you will quickly outgrow.

Use the following comparison to pick the appropriate format.

To create: One or few search engines with a small number of sites

  • Use: Control Panel
  • Because: You can quickly create your Programmable Search Engine by filling out text boxes instead of creating files with a text editor and uploading the files.
  • Limitations: The Control Panel is mostly useful for familiarizing yourself with Programmable Search Engine and creating search engines with few sites.
  • More information: Getting Started

To create: Complex search engines that use lots of sites, use feeds

  • Use: Context file and annotations files
  • Because: The Programmable Search Engine files give you a greater level of control over your search engines, and make the tasks of defining and managing sites a lot easier. Even though you plan to create your search engine using context and annotations files, it's still a good idea to familiarize yourself with the Control Panel.
  • Limitations: The more you customize your search engine, the more complex it becomes. You have to learn the Programmable Search Elements and attributes, which are not hard to pick up, but they do require you to invest some time. You will have to read the rest of the developer guide, which is not the most exciting reading material, unfortunately.
  • More information: Context: Defining a Search Engine Specifications and Annotations: Selecting Sites
