Topical CSE

A topical search engine focuses on a particular topic. It covers a part of the Web rather than a single website; this is possible because Google Custom Search lets you include multiple websites in the same engine. This article discusses some useful techniques for building and maintaining such engines.

Why build topical search engines?

You can run very precise advanced searches on the standard google.com search engine by choosing the right keywords, using search operators and filtering results. However, some users are not familiar with these advanced techniques. Because you know the context your users are working in and the topic they are searching, you can guide them through the search process and make valuable resources in your chosen domain much easier to discover.

There are a few techniques that are useful for building high-quality topical CSEs:

  • Curating the index of sites to search
  • Rewriting queries
  • Exposing additional data in the search results

Curated index

Sometimes search terms are ambiguous or have a different meaning depending on the context. By including only high-quality, relevant sites in your engine, you narrow down the search domain and make the results more precise and meaningful.

Use URL patterns

Use URL patterns to search only part of a site when required. For example, to search only the tutorials about browser speed on html5rocks.com, add the URL pattern html5rocks.com/en/tutorials/speed/* to your sites to search.
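To get a feel for how a pattern such as html5rocks.com/en/tutorials/speed/* narrows the index, here is a rough sketch that checks whether a URL falls under a pattern. It is an approximation written for illustration only: the helper name matches_pattern is hypothetical, and CSE's own matching rules (for example for subdomains or query strings) may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def matches_pattern(url: str, pattern: str) -> bool:
    """Rough check of whether a URL falls under a 'sites to search' pattern.

    Hypothetical helper: ignores the scheme, treats a leading "www." as
    optional on both sides, and lets "*" match any run of characters,
    including "/". CSE's real matching rules may differ.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if pattern.lower().startswith("www."):
        pattern = pattern[len("www."):]
    # Compare "host/path" against the pattern; query strings are ignored.
    return fnmatchcase(host + parts.path, pattern)


print(matches_pattern("https://www.html5rocks.com/en/tutorials/speed/layers/",
                      "html5rocks.com/en/tutorials/speed/*"))  # True
```

A URL under /en/tutorials/webgl/ would not match the same pattern, which is exactly the narrowing the pattern is meant to achieve.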

If you are not familiar with the URL structure of a site you are considering adding to your engine, run a site: search on google.com to explore it. For example, to see a sample of URLs from html5rocks.com, type site:www.html5rocks.com into the google.com search box.

Rewriting queries

If you know your audience well, you can anticipate their queries and apply power search features on their behalf. You can rewrite the original query to include additional search terms or advanced search operators, or to apply synonyms.

Adding search terms and operators

The most typical use of additional search terms is adding a keyword that describes the domain of the search, e.g. the word solar for a solar power search engine. Depending on the character of your engine, you might want to add additional search terms to every query or only to some of them.
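If you rewrite queries in your own code rather than through the Control Panel, one possible policy for adding the domain keyword to only some queries is to skip queries that already contain it. The function add_domain_term and this policy are hypothetical illustrations, not a CSE feature.

```python
DOMAIN_TERM = "solar"  # hypothetical domain keyword for a solar power engine


def add_domain_term(query: str, term: str = DOMAIN_TERM) -> str:
    """Append the domain keyword unless the query already contains it.

    A case-insensitive whole-word check keeps "Solar panels" from becoming
    "Solar panels solar".
    """
    if term.lower() in query.lower().split():
        return query
    return f"{query} {term}".strip()


print(add_domain_term("panel efficiency"))  # panel efficiency solar
print(add_domain_term("Solar panels"))      # Solar panels
```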

You can define an additional search term to be appended to every query via the Control Panel, in Search features > Advanced > Websearch Settings > Query Addition field.

It is also possible to add different search terms in each refinement tab. In the Search features > Refinements tab, add a new refinement and put the additional search term in the Optional word(s) field. When a user searches for some keyword in the engine and selects the newly created tab, their query is rewritten to include the additional terms from that refinement.

Sometimes it can be useful to add different terms dynamically depending on your user’s context.

You can specify such dynamic extra terms with the webSearchQueryAddition attribute if you are using the Custom Search Element, or with the orTerms parameter if you are using the JSON API.

Example: In a local events search engine, if your application has access to a user's location, you might want to add the name of the city they are in to the search query.
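For the JSON API case, a minimal sketch of building the request URL might look like the following. It assumes the https://www.googleapis.com/customsearch/v1 endpoint with the key, cx and q parameters, and passes the city through orTerms as described above; the function name and the placeholder key and engine ID are hypothetical, and how you obtain the user's location is out of scope.

```python
from typing import Optional
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key: str, engine_id: str, query: str,
                      city: Optional[str] = None) -> str:
    """Build a JSON API request URL, adding the user's city when known."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    if city:
        # Dynamic extra term derived from the user's context.
        params["orTerms"] = city
    return f"{ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" and "YOUR_ENGINE_ID" are placeholders.
print(build_request_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "concerts this weekend", "Liverpool"))
```

When the location is unknown, the city is simply omitted and the user's query is sent unchanged.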

Creating synonyms

You can expand your users' search queries with synonyms, which are variants of a search term. If you create synonyms for a term that is likely to be used in your engine, your users will not need to type multiple variants: the alternative search terms are added to their queries automatically.

You can create synonyms in the Control Panel in Search features > Synonyms. You can also read more about best practices for creating synonyms.
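To illustrate the effect of synonyms on a query, the sketch below expands each term that has variants into an OR group. This is only a client-side illustration of the idea, assuming a hypothetical synonym table; it is not how CSE applies synonyms internally, since configured synonyms are applied by the engine itself.

```python
# Hypothetical synonym table; in CSE you would define these in the Control Panel.
SYNONYMS = {
    "pv": ["photovoltaic", "solar cell"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Replace each term that has synonyms with an OR group of its variants."""
    parts = []
    for word in query.split():
        variants = synonyms.get(word.lower())
        if not variants:
            parts.append(word)
            continue
        # Quote multi-word variants so they are matched as phrases.
        alternatives = [word] + [f'"{v}"' if " " in v else v for v in variants]
        parts.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(parts)


print(expand_query("pv efficiency", SYNONYMS))
# (pv OR photovoltaic OR "solar cell") efficiency
```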

Custom rendering of search results

CSE is highly customizable, and you can change the look and feel of the results using the options in the Control Panel. Beyond that, if you are willing to write custom markup for your search results, you can customize them much more extensively.

For example, depending on your user's needs, you might want to expose additional data in the search results beyond the standard title and text snippet.

Exposing additional data using structured data and custom snippet rendering

Google Custom Search can provide more information about a result than the text snippet contains. If the page a result points to publishes semantic markup, for example using the schema.org vocabulary, this data can be available in the search result as pagemap attributes.

To check which pagemap attributes CSE knows for a given URL, paste the URL into the Structured Data Testing Tool. The pagemap values are listed in the Google Custom Search tab.

If you are using the CSE/GSS JSON API, each entry in the items array of the JSON response carries these values in its pagemap field.
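A minimal sketch of reading pagemap values from a parsed response follows. It assumes pagemap maps each object type (such as metatags) to a list of attribute objects; the sample response is made up and heavily trimmed, the helper name pagemap_values is hypothetical, and it reads only the first object of each type.

```python
import json

# Hypothetical, trimmed JSON response; real responses carry many more fields.
SAMPLE_RESPONSE = """
{
  "items": [
    {
      "title": "Intro to solar panels",
      "link": "https://example.org/solar-intro",
      "pagemap": {
        "metatags": [{"og:type": "article"}],
        "cse_thumbnail": [{"src": "https://example.org/thumb.png"}]
      }
    },
    {
      "title": "Page without structured data",
      "link": "https://example.org/plain"
    }
  ]
}
"""


def pagemap_values(response: dict, object_type: str, attribute: str) -> list:
    """Collect one attribute of one pagemap object type from every result.

    Results without that pagemap entry yield None, so the output stays
    aligned with response["items"].
    """
    values = []
    for item in response.get("items", []):
        objects = item.get("pagemap", {}).get(object_type, [])
        values.append(objects[0].get(attribute) if objects else None)
    return values


response = json.loads(SAMPLE_RESPONSE)
print(pagemap_values(response, "metatags", "og:type"))  # ['article', None]
```

Keeping a None placeholder for results without structured data lets a renderer fall back to the standard title and snippet for those items.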

If you are using the Custom Search Element, you need to provide an HTML snippet that overrides the default rendering of the search results. The pagemap values are accessible through the Vars.richSnippet hook in the override snippet. To learn more about custom rendering of search results with the Custom Search Element, refer to its documentation.

Overlaying the results with third-party data

An interesting technique is to retrieve results programmatically via the CSE/GSS JSON API and then join them with a third-party data source to provide added value for the end user.

Example: In Kritikos, a search engine developed by the Engineering Department at Liverpool University, the search results from CSE are overlaid with additional data coming from the Learning Registry.
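How Kritikos performs its join is not described here, so the following is only a minimal sketch of the general technique under stated assumptions: the third-party data is represented as a dictionary keyed by page URL (in practice it would come from that source's own API, which is not modeled), and results are left-joined on a crudely normalized URL. All names and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrichedResult:
    title: str
    link: str
    extra: Optional[dict]  # matching third-party record, or None


def normalize_url(url: str) -> str:
    """Crude join key: ignore a trailing slash so '/a/' and '/a' match."""
    return url.rstrip("/")


def overlay(items: list[dict], records: dict[str, dict]) -> list[EnrichedResult]:
    """Left-join CSE result items with third-party records keyed by URL."""
    lookup = {normalize_url(url): record for url, record in records.items()}
    return [
        EnrichedResult(item["title"], item["link"], lookup.get(normalize_url(item["link"])))
        for item in items
    ]


# Hypothetical sample data: "items" as returned by the JSON API (trimmed),
# and a third-party source that rates some of the same pages.
SAMPLE_ITEMS = [
    {"title": "Intro to solar panels", "link": "https://example.org/solar-intro/"},
    {"title": "Wind basics", "link": "https://example.org/wind"},
]
SAMPLE_RECORDS = {"https://example.org/solar-intro": {"rating": 4.5}}

results = overlay(SAMPLE_ITEMS, SAMPLE_RECORDS)
for r in results:
    print(r.title, r.extra)
```

A left join keeps every search result in its original order, so the third-party data only adds information and never hides a result.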

Summary

Topical CSEs are a valuable way of spreading knowledge in a particular area and offer tremendous value to users interested in that topic. By curating and maintaining a well-chosen index of sites, helping users form the right query for a given use case and customizing the results, a topical engine can make finding the right information at the right time both pleasant and efficient.
