This page describes how to tweak the ranking of the search results returned by your search engines.
- Boosting Results with Keywords
- Changing Search Results with Labels
- Tagging Sites with Labels
- Modulating the Effects of Labels
- Tagging TSV Annotations with Labels
Say that you've compiled a list of sites that you want your search engine to cover, but when you test out some queries, the search results do not quite match what you had in mind. The results that you think are most relevant to the query are not at the top of the page. Or perhaps you want to give preference to webpages from your favorite research institution or your own website. You can straighten that out by promoting or demoting results. Programmable Search Engine lets you tune results by three means: keywords, weighted labels, and scores. Keywords and weights are defined in the context file, while scores are defined in the annotations file.
- Keywords are a quick way of boosting certain webpages in your search results and getting more search results about a particular subject.
- Weighted labels tell Programmable Search Engine whether to exclude, promote, or demote a site. How much a site is promoted or demoted depends on the weights that you apply to the labels.
- Scores, which are applied to individual annotations, temper or reverse the influence of the weighted labels. They add another layer of granularity to the fine-tuning of the ranking.
Weights in labels and scores in annotations are the primary knobs and dials
for changing the ranking of search results. Both have values that range from
-1.0 to +1.0. You can promote and demote sites by
turning the dials (increasing or decreasing values) with scores and weights.
You have strong influence over the ranking, but you do not have absolute control over the results. The promotion or demotion of results is a function of many parameters, including the relevancy of the webpage, the choice of keywords, the weight on the labels, the scores in the annotations, the number of collaborators who are also contributing to the search engine, and so on.
Boosting Results with Keywords
Keywords are the quickest way to change results. Programmable Search Engine boosts webpages that include your keywords. It can also retrieve more search results about that subject. So if your search results seem paltry, try adding keywords. While Programmable Search Engine boosts webpages that contain those keywords, it does not demote or filter out webpages that don't contain the keywords.
Keywords are a way for you to apply the intent of your users to the search engine. For example, when users of the yoga search engine search for "mat", they are actually searching for "yoga mat", not "Miller Analogy Test" or "house mats". Think about the main focus of your search engine and the context of your users' search queries. In our search engine example, "yoga" would be an obvious keyword. Don't use keywords that are too broad or straddle too many categories. For example, "exercise" and "eastern practices" would retrieve many webpages that have nothing to do with yoga. The best keywords describe the content of the sites that your search engine covers.
Start with a single word, and see if you can get the results that
you want. If you don't get enough results, try using multiple keywords. You can
also use phrases, which are series of words enclosed within quotation marks
(for example, "yoga pose"), but single-word keywords are better. Programmable Search Engine
treats yoga pose stretch as three keywords: "yoga", "pose", and "stretch".
Keywords are not independent from each other; they work together. So if you have the keywords "yoga" and "pose", webpages that contain "yoga" and webpages that contain "pose" get boosted, but webpages that contain both "yoga" and "pose" get boosted even more.
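The following is a minimal sketch of the three-keyword case in a context file. It is trimmed down to the keywords attribute, and the yoga engine it belongs to is hypothetical. Because the three words are separated by single spaces and are not quoted, each one is a separate keyword.

```xml
<!-- Hypothetical context file for the yoga search engine.
     Three separate keywords: "yoga", "pose", and "stretch".
     Pages containing any one of them get a boost; pages containing
     several of them get a bigger boost. -->
<CustomSearchEngine keywords="yoga pose stretch">
</CustomSearchEngine>
```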
Let's compare search results for "mat" in two versions of a yoga custom search engine.
Figure 1: Results for the search query "mat" from a search engine that does not use keywords.
Figure 2: Results for the search query "mat" from a search engine with the keyword "yoga".
In the version with the "yoga" keyword, webpages that contain the keyword are promoted in the results page.
You can create as many keywords as you want, as long as you don't exceed 100 characters. The easiest way to create keywords is through the Basics tab in the Control Panel. You can use that tab to experiment, trying out different keywords and checking out their effects on the results page. If you don't like the results, you can easily remove a keyword and try another one.
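If you want to create keywords in your context file, use the
keywords attribute of the CustomSearchEngine element
to define the keyword values. Separate keywords from each other using a single
space. Enclose phrases in quotation marks. You can use either the punctuation
mark (") or the character entity (&quot;). However, inside a double-quoted attribute value, as in the following example, use the entity, because a literal quotation mark would end the value early:
<CustomSearchEngine keywords="asana &quot;yoga postures&quot;">
</CustomSearchEngine>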
Changing Search Results with Labels
Another way to change search results is with labels. Labels are the workhorses of search result ranking, because they determine how sites are treated.
You can use two kinds of labels: search engine labels and refinement labels.
Search engine labels determine which sites should be covered by the search
engine. They are invisible to your users and run in the background; hence,
their parent element is called BackgroundLabels. Refinement labels,
on the other hand, are visible to your users and show up as links. Refinements
are discussed in detail in the Refining Searches
page. Most of this page focuses on search engine labels, although
modes, weights, and scores operate in the same way in both search engine
and refinement labels.
The following code shows the two kinds of labels in the context file:
<!--Search engine labels-->
<BackgroundLabels>
  <Label name="_cse_hwbuiarvsbo" mode="FILTER"/>
  <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
</BackgroundLabels>
<!--Refinement label-->
<Facet>
  <FacetItem title="Lectures">
    <Label name="lectures" mode="BOOST" weight="0.8">
      <Rewrite>lecture OR lectures</Rewrite>
    </Label>
  </FacetItem>
</Facet>
When you first create a Programmable Search Engine using the Control Panel, Programmable Search Engine creates two
search engine labels for you. The labels have modes, which determine how the
sites should be treated. One of them is exclusive
(mode="ELIMINATE"), and the other one is inclusive
(mode="FILTER"). (You can change the mode for the inclusive label
from "filter" to "boost" after creating the Programmable Search Engine.) You can
create multiple additional background labels for a single search engine, but you
cannot delete the two automatically created labels.
To use search engine labels, do the following:
- In the context file, create or redefine search engine labels.
- In the annotations file, tag sites with labels.
The annotations file can be in TSV format or XML format, but the XML format gives you the highest level of control.
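As a sketch of the second step, the following hypothetical XML annotations file tags sites with the two generated labels from the context file examples on this page. The site patterns www.example.com and spam.example.net are placeholders.

```xml
<!-- Hypothetical annotations file. The label names match the two
     generated labels in the context file; the sites are placeholders. -->
<Annotations>
  <!-- Tagged with the FILTER label, so this site is included. -->
  <Annotation about="www.example.com/*">
    <Label name="_cse_hwbuiarvsbo"/>
  </Annotation>
  <!-- Tagged with the ELIMINATE label, so this site is excluded. -->
  <Annotation about="spam.example.net/*">
    <Label name="_cse_exclude_hwbuiarvsbo"/>
  </Annotation>
</Annotations>
```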
Example: Context File with Labels
The following is a truncated example of a context file with search engine labels.
<CustomSearchEngine keywords="climate &quot;global warming&quot; &quot;greenhouse gases&quot;">
  <Title>RealClimate</Title>
  <Description>"Climate change"</Description>
  <Context>
    <BackgroundLabels>
      <Label name="_cse_hwbuiarvsbo" mode="FILTER"/>
      <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
    </BackgroundLabels>
  </Context>
</CustomSearchEngine>
Defining the Mode of the Label
Whether a site is promoted, demoted, or excluded depends on the search engine label it is associated with. A search engine label can have the following modes:
Note: Follow the capitalization. Use uppercase letters for the modes.
|Mode||Does the following...||Use this mode if...|
|ELIMINATE||Excludes sites tagged with this label from your search engine.||You want to exclude webpages that rank highly on Google Search but are not that great for your audience. For example, if you are creating a search engine for the scientific study of hamsters, you would use labels with the ELIMINATE mode to exclude such sites.|
|FILTER||Includes only sites tagged with this label, and excludes everything else.||You want the search engine to search only your site, affiliated sites, or sites that focus on a particular subject. For example, if you want to create a search engine just for your website, have a single site tagged with a label that has the FILTER mode. Because the coverage of such search engines is restricted to a handful of sites, you have more precise control over the ranking of the search results. Changing the order of the search results using weights is discussed in the next section.|
|BOOST||Includes all websites in your search engine, but promotes or demotes sites with this label. How much a site is promoted or demoted depends on the weight you assign to it.||You want a broad search engine that emphasizes some sites but does not exclude other sites altogether. For example, if you want to create a search engine with a wide coverage, but you are partial to your own website (the best website ever!), use labels with the BOOST mode.|
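The following is a minimal sketch of the background labels for a broad, boost-type engine. The generated inclusive label is switched from FILTER to BOOST, and one extra label is added for your own site. The label name my_own_site is hypothetical.

```xml
<BackgroundLabels>
  <!-- Generated inclusive label, switched from FILTER to BOOST:
       tagged sites are promoted, and untagged sites still appear. -->
  <Label name="_cse_hwbuiarvsbo" mode="BOOST"/>
  <!-- Generated exclusive label: tagged sites never appear. -->
  <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
  <!-- Hypothetical additional label for your own website. -->
  <Label name="my_own_site" mode="BOOST"/>
</BackgroundLabels>
```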
Creating Weighted Labels
Once you have labels that include, promote, or exclude sites, you can assign
weights to the inclusive labels. Weights let you define how much a label should
promote or demote a tagged site. Weights can range from
-1.0 to +1.0, which gives you fairly
refined control over sites. A positive weight in the label emphasizes sites
tagged with it, and a negative weight de-emphasizes them.
The following code shows a weighted label:
<BackgroundLabels>
  <Label name="_cse_hwbuiarvsbo" mode="FILTER" weight="0.65"/>
  <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
</BackgroundLabels>
Boost and filter labels that do not have defined weights, such as those
generated by Programmable Search Engine, have a default weight of +0.7. So if
you want to strengthen the generated label's ability to promote sites, change
the value to something greater than +0.7. If you change the value
to something lower than the default, you weaken the label's boosting effect on the
ranking of the site. If you go the other way and assign a negative weight to
the label, that label demotes or suppresses a site. As the weight approaches
-1.0, it gets increasingly hard for sites to rank highly
in the results. At -1.0, even a highly ranked site will have a hard
time overcoming the strong demotion.
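The following sketch puts the weight settings described above side by side in one set of background labels. The names strong_boost and demote are hypothetical, and the specific values 0.9 and -0.8 are only illustrations of "above the default" and "negative".

```xml
<BackgroundLabels>
  <!-- No weight attribute: the default weight of +0.7 applies. -->
  <Label name="_cse_hwbuiarvsbo" mode="BOOST"/>
  <!-- Hypothetical label: stronger promotion than the default. -->
  <Label name="strong_boost" mode="BOOST" weight="0.9"/>
  <!-- Hypothetical label: negative weight, so tagged sites are demoted. -->
  <Label name="demote" mode="BOOST" weight="-0.8"/>
  <Label name="_cse_exclude_hwbuiarvsbo" mode="ELIMINATE"/>
</BackgroundLabels>
```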
The following table demonstrates how results are adjusted based on the mode and weight of a label.
|Mode||Weight||Effect|
|BOOST||+1.0||Gives the site a big promotion. However, it does not necessarily mean that the tagged site will be the top result at all times, nor that other sites will be excluded. It is not the same as setting the mode to FILTER. If you feel strongly that the sites you tag with heavily weighted labels should be the top results at the exclusion of all other results, you should use a filter label instead of a boost label. But if you just want a massive boost without excluding other sites, a boost label with this weight is the right choice.|
|BOOST||-1.0||Gives the site a big demotion. This is not the same as setting the mode to ELIMINATE.|
|BOOST||Undefined||If you do not define the weight, the label uses the default weight of +0.7.|
|FILTER||+1.0||Gives the selected site a big promotion. When the mode is set to FILTER, sites that are not tagged with the label are excluded.|
|FILTER||-1.0||Effectively blocks the selected site from the results. It is as though you have tagged the site with an eliminate label.|
|FILTER||Undefined||If you do not define the weight, the label uses the default weight of +0.7.|
|ELIMINATE||No weight||Blocks the site. Sites that match the label will not be shown. If all relevant results happen to have an eliminate label, you could have an empty results page. This is more likely to happen with filter-type search engines than with boost-type search engines.|
You can create multiple labels of varying weights, and apply them to sites as you see fit. For example, you might want to create a label that strongly promotes sites and another that mildly promotes sites. You can create as many weighted labels as you want, but after a certain point, they can become hard to manage. A better way to control the ranking of sites at a more granular level is through scores, which are discussed in Modulating the Effects of Labels.
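As a sketch of the strong/mild approach, the following annotations file tags one placeholder site with a strongly promoting label and another with a mildly promoting one. The labels strong_promote and mild_promote are hypothetical. They are assumed to be defined in the context file as BOOST labels with weights of 1.0 and 0.3.

```xml
<!-- Hypothetical annotations file. Assumes the context file defines
     strong_promote (mode="BOOST" weight="1.0") and
     mild_promote (mode="BOOST" weight="0.3"). Sites are placeholders. -->
<Annotations>
  <Annotation about="www.example.org/*">
    <Label name="strong_promote"/>
  </Annotation>
  <Annotation about="blog.example.org/*">
    <Label name="mild_promote"/>
  </Annotation>
</Annotations>
```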
Boosting Results to the Top
If you want your favorite sites to be ranked very highly, use the
top attribute with your filter or boost label. The top
attribute takes an integer value N (for example, 3). If the site tagged with this label
is directly relevant to the user's query, Programmable Search Engine displays the site as one
of the top N search results. For example, if you want the site
code.google.com to be one of the top three results, you can create the
following label for it:
<Label name="best_resource" mode="FILTER" top="3"/>
If code.google.com is relevant to your users' search queries, it will
automatically be one of the top three results. However, if it has nothing to do
with the search queries, it will not appear in the top three results. But because
a filter label that has an undefined weight implicitly carries a weight of
0.7, the label still gives the code.google.com site some boost.
The top attribute works best with filter search engines. Often,
with the right queries, you will find that sites tagged with such labels appear at
the top of the results page. It gets trickier with boost search engines, because
sites that are both far more relevant to the search query and tagged with a
boost weight of +1.0 could outrank a site tagged with a top label.
In that case, consider adding a weight of +1.0 to the top label
to give it an extra lift. However, if the favored site is competing against far
more relevant results, even that tactic might not place the site at the top.
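For a boost-type search engine, the extra lift described above might look like the following sketch. It reuses the best_resource label, switched to BOOST mode and combining top="3" with the maximum weight. As noted above, this is still not a guarantee that the site appears at the top.

```xml
<!-- Boost-type engine: ask for the tagged site to appear among the top
     three results when it is relevant, and add the maximum weight. -->
<Label name="best_resource" mode="BOOST" top="3" weight="1.0"/>
```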
If you decide that certain webpages are especially relevant and you want to bypass Google's ranking algorithm altogether, you can create promotions, which appear at the top of the results page. You have to define the query terms and the associated results.
Tagging Sites with Labels
Once you have defined labels, you can start tagging sites with them. Each annotation can have multiple labels, which means that the same site can be used in other search engines and be ranked differently. This section concentrates on Programmable Search Engine XML annotations; TSV annotations are discussed at the end of this page. In the following example, a single site is tagged with three labels:
<Annotations>
  <Annotation about="webcast.berkeley.edu/*" score="1">
    <Label name="cse_university_boost_highest"/>
    <Label name="cse_bicycles_exclude"/>
    <Label name="cse_hamsters_filter"/>
  </Annotation>
</Annotations>
Modulating the Effects of Labels
Scores let you modulate the influence of labels. They can dampen or reverse
the effects of the labels on specific sites. The score attribute of the
Annotation element can have a value that ranges from -1.0 to
1.0. A score of 0 removes the influence of the label over the ranking
of the site; a score of 1 applies the full influence; a score of
-1 completely reverses the effects. Intermediate values (for example,
0.55) are for fine-tuning the influence of the labels. If you do not assign
a score to an annotation, Programmable Search Engine applies the full effect
of the label to the site, as though you had assigned it a score of 1.
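The following sketch applies the score values just described to four placeholder sites that share one label. The label name favorites and the a.example.com through d.example.com patterns are hypothetical.

```xml
<!-- Hypothetical annotations file: four placeholder sites tagged with
     the same label at different scores. -->
<Annotations>
  <!-- No score: same as score="1", the label's full effect. -->
  <Annotation about="a.example.com/*">
    <Label name="favorites"/>
  </Annotation>
  <!-- Intermediate score: a partial effect, for fine-tuning. -->
  <Annotation about="b.example.com/*" score="0.55">
    <Label name="favorites"/>
  </Annotation>
  <!-- Score of 0: the label has no influence on this site. -->
  <Annotation about="c.example.com/*" score="0">
    <Label name="favorites"/>
  </Annotation>
  <!-- Score of -1: the label's effect is completely reversed. -->
  <Annotation about="d.example.com/*" score="-1">
    <Label name="favorites"/>
  </Annotation>
</Annotations>
```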
The following table demonstrates how scores can adjust the influence of labels:
|Mode||Weight||Score||Effect|
|Any||Any||None||The same as giving the annotation a score of 1.|
||The same as reversing the
||The same as reversing the
||The same as tagging the site with an
||The same as reversing the
||The same as converting the
Example: Code for Score
In the following example, we have three sites tagged with the same search engine label. However, the effects of the label are not uniform across the three different sites because each annotation has a different score, applying the label with different intensities.
<Annotations>
  <Annotation about="*.edu/*" score="0.0001">
    <Label name="vision_label"/>
  </Annotation>
  <Annotation about="*.ucsd.edu/*" score="0.7">
    <Label name="vision_label"/>
  </Annotation>
  <Annotation about="*.vision.ucsd.edu/*" score="1">
    <Label name="vision_label"/>
  </Annotation>
</Annotations>
Even though all three annotations have the vision_label label,
Programmable Search Engine treats them differently on account of their scores. Results from
vision.ucsd.edu are heavily favored; those from ucsd.edu
are moderately favored; and those from other .edu
domains are slightly favored over other sites.
Tagging TSV Annotations with Labels
If you are using the TSV format instead of XML, you can still tweak
the ranking of the search results. You can tag sites with labels and apply
scores to them. As explained in the previous sections, scores are a way to
modulate the influence of labels. A score value can range from -1.0 to
+1.0. A positive value strengthens the effects of a label;
0 ignores the effects; and a negative value reverses the effects of
the label. As you approach zero, the effect of the label weakens, and as you
approach -1.0, the effect of the label is increasingly reversed, until it is completely reversed at -1.0.
The Annotations: Selecting Sites page
discusses listing sites and labels using the TSV format. To change the ranking,
simply add a Score heading and define its values.
The following is a tab-delimited annotations file that includes sites for some disease-related webpages.
URL	Label	Label	Label	Score	Comment	A=Date
www.cancer.gov/cancertopics/types/liver/*	_cse_Ansi-stoubiq	symptoms			This labels this url as symptoms.	20060504
www.medicinenet.com/liver_cancer/*	_cse_Ansi-stoubiq	symptoms		1.0	This labels this url as symptoms.	20060504
www.webmd.com/hw/cancer/*	_cse_Ansi-stoubiq	symptoms	for_patients	1.0	This is a great site for patients!	20060504
www.oncologychannel.com/*/treatment	_cse_Ansi-stoubiq	treatment				20060504
www.sirweb.org/*Treatments	_cse_Ansi-stoubiq	treatment		0.7		20060504