Wednesday, February 12, 2014
Faceted navigation, such as filtering by color or price range, can be helpful for your visitors, but it's often not search-friendly since it creates many combinations of URLs with duplicative content. With duplicative URLs, search engines may not crawl new or updated unique content as quickly, and/or they may not index a page accurately because indexing signals are diluted between the duplicate versions. To reduce these issues and help faceted navigation sites become as search-friendly as possible, we'd like to:
- Provide background and potential issues with faceted navigation
- Highlight worst practices
Share best practices
In an ideal state, unique content—whether an individual product/article or a category of products/articles— would have only one accessible URL. This URL would have a clear click path, or route to the content from within the site, accessible by clicking from the home page or a category page.
Ideal for searchers and Google Search
Clear path that reaches all individual product/article pages
One representative URL for category page
One representative URL for individual product page
Undesirable duplication caused with faceted navigation
Numerous URLs for the same article/product
The same product page for swedish fish can be available on multiple URLs.
Numerous category pages that provide little or no value to searchers and search engines), as demontrated in the following table:
- No added value to Google searchers given users rarely search for "sour gummy candy price five to ten dollars".
- No added value for search engine crawlers that discover same item ("fruit salad") from parent category pages (either "gummy candies" or "sour gummy candies").
- Negative value to site owner who may have indexing signals diluted between numerous versions of the same category.
- Negative value to site owner with respect to serving bandwidth and losing crawler capacity to duplicative content rather than new or updated pages.
- No value for search engines (should have 404 response code).
- Negative value to searchers.
Worst (search un-friendly) practices for faceted navigation
Worst practice #1: Non-standard URL encoding for parameters, like commas or brackets, instead of
- Key-value pairs marked with
- Multiple parameters appended with
[ ]rather than
- Key-value pairs marked with a
- Multiple parameters appended with
While humans may be able to decode odd URL parameters, such as
,,, crawlers have
difficulty interpreting URL parameters when they're implemented in a non-standard fashion.
Software engineer on Google's Crawling Team, Mehmet Aktuna, says "Using non-standard encoding is
just asking for trouble." Instead, connect key-value pairs with an equal sign (
append multiple parameters with an ampersand (
Worst practice #2: Using directories or file paths rather than parameters to list values that don't change page content.
/c123/ is a category,
/s789/ is a
session ID that doesn't
change page content:
/gummy-candy/, changes the page content in a meaningful way:
URL parameters allow more flexibility for search engines to determine how to crawl efficiently.
It's difficult for automated programs, like search engine crawlers, to differentiate useful values
gummy-candy) from the useless ones (for example,
sessionID) when values are placed directly in the path. On the other hand, URL
parameters provide flexibility for search engines to quickly test and determine when a given value
doesn't require the crawler access all variations.
Common values that don't change page content and should be listed as URL parameters include:
- Session IDs
- Tracking IDs
- Referrer IDs
Worst practice #3: Converting user-generated values into (possibly infinite) URL parameters that are crawlable and indexable, but not useful in search results.
For example, user-generated values like longitude/latitude or "days ago" as crawlable and indexable URLs:
Rather than allow user-generated values to create crawlable URLs—which leads to infinite possibilities with very little value to searchers—perhaps publish category pages for the most popular values, then include additional information so the page provides more value than an ordinary search results page. Alternatively, consider placing user-generated values in a separate directory and then robots.txt disallow crawling of that directory.
User-agent: * Disallow: /filtering/
Worst practice #4: Appending URL parameters without logic.
Extraneous URL parameters only increase duplication, causing less efficient crawling and indexing. Therefore, consider stripping unnecessary URL parameters and performing your site's "internal maintenance" before generating the URL. If many parameters are required for the user session, perhaps hide the information in a cookie rather than continually append values like:
Worst practice #5: Offering further refinement (filtering) when there are zero results.
Allowing users to select filters when zero items exist for the refinement.
Only create links/URLs when it's a valid user-selection (items exist). With zero items, grey out filtering options. To further improve usability, consider adding item counts next to each filter.
Prevent useless URLs and minimize the crawl space by only creating URLs when products exist. This
helps users to stay engaged on your site (fewer clicks on the back button when no products exist),
and helps minimize potential URLs known to crawlers. Furthermore, if a page isn't just temporarily
out-of-stock, but is unlikely to ever contain useful content, consider returning a
404 status code.
404 response, you can include a helpful message to users with more
navigation options or a search box to find related products.
Best practices for new faceted navigation implementations or redesigns
New sites that are considering implementing faceted navigation have several options to optimize the "crawl space" (the totality of URLs on your site known to Googlebot) for unique content pages, reduce crawling of duplicative pages, and consolidate indexing signals.
Determine which URL parameters are required for search engines to crawl every individual content
page (for example, determine what parameters are required to create at least one click-path to
each item). Required parameters may include
page, and others.
Determine which parameters would be valuable to searchers and their queries, and which would likely only cause duplication with unnecessary crawling or indexing. In the candy store example, I may find the URL parameter
tasteto be valuable to searchers for queries like "sour gummy candies" which could show the result
example.com/category.php?category=gummy-candies&taste=sour. However, I may consider the parameter
priceto only cause duplication, such as
category=gummy-candies&taste=sour&price=over-10. Other common examples:
Valuable parameters to searchers:
brand, and others.
price-range, and so on.
- Valuable parameters to searchers:
Consider implementing one of several configuration options for URLs that contain unnecessary parameters. Just make sure that the unnecessary URL parameters are never required in a crawler or user's click path to reach each individual product!
Make all unnecessary URLs links
rel="nofollow". This option minimizes the crawler's discovery of unnecessary URLs and therefore reduces the potentially explosive crawl space (URLs known to the crawler) that can occur with faceted navigation.
rel="nofollow"doesn't prevent the unnecessary URLs from being crawled (only a robots.txt
disallowprevents crawling). By allowing them to be crawled, however, you can consolidate indexing signals from the unnecessary URLs with a searcher-valuable URL by adding
rel="canonical"from the unnecessary URL to a superset URL (for example
example.com/category.php?category=gummy-candies&taste=sour&price=5-10can specify a
rel="canonical"to the superset "sour gummy candies" view-all page at
Option 2: Robots.txt
For URLs with unnecessary parameters, include a
/filtering/directory that will be robots.txt disallowed. This lets all search engines crawl good content, but will prevent crawling of the unwanted URLs. For instance, if my valuable parameters were item, category, and taste, and my unnecessary parameters were session-id and price. I may have the URL:
which could link to another URL valuable parameter such as taste:
But for the unnecessary parameters, such as price, the URL includes a predefined directory,
which is then robots.txt disallowed:
User-agent: * Disallow: /filtering/
Option 3: Separate hosts
If you're not using a CDN (sites using CDNs don't have this flexibility easily available in Webmaster Tools), consider placing any URLs with unnecessary parameters on a separate host—for example, creating main host
www.example.comand secondary host,
www2.example.com. On the secondary host (
www2), set the Crawl rate in Webmaster Tools to "low" while keeping the main host's crawl rate as high as possible. This would allow for more full crawling of the main host URLs and reduces Googlebot's focus on your unnecessary URLs.
- Be sure there remains at least one click path to all items on the main host.
If you'd like to consolidate indexing signals, consider adding
rel="canonical"from the secondary host to a superset URL on the main host (for example
www2.example.com/category.php?category=gummy-candies&taste=sour&price=5-10may specify a
rel="canonical"to the superset "sour gummy candies" view-all page,
- Prevent clickable links when no products exist for the category or filter.
Add logic to the display of URL parameters.
Remove unnecessary parameters rather than continuously append values. Avoid:
Help the searcher experience by keeping a consistent parameter order based on searcher-valuable parameters listed first (as the URL may be visible in search results) and searcher-irrelevant parameters last (for example, session ID). Avoid:
Improve indexing of individual content pages with
rel="canonical"to the preferred version of a page.
rel="canonical"can be used across hostnames or domains.
Improve indexing of paginated content (such as
page=2of the category "gummy candies") by either:
rel="canonical"from individual component pages in the series to the category's "view-all" page (for example,
page=3of "gummy candies" with
category=gummy-candies&page=all) while making sure that it's still a good searcher experience (for example, the page loads quickly).
pagination markup with
rel="prev"to consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole.
- Include only canonical URLs in sitemaps.
Best practices for existing sites with faceted navigation
First, know that the best practices listed above (for example,
unnecessary URLs) still apply if/when you're able to implement a larger redesign. Otherwise, with
existing faceted navigation, it's likely that a large crawl space was already discovered by search
engines. Therefore, focus on reducing further growth of unnecessary pages crawled by Googlebot and
consolidating indexing signals.
- Use parameters (when possible) with standard encoding and key-value pairs.
- Verify that values that don't change page content, such as session IDs, are implemented as standard key-value pairs, not directories.
- Prevent clickable anchors when products exist for the category/filter. Don't allow clicks or URLs to be created when no items exist for the filter.
Add logic to the display of URL parameters. Remove unnecessary parameters rather than continuously append values. For example, avoid:
Help the searcher experience by keeping a consistent parameter order based on searcher-valuable parameters listed first (as the URL may be visible in search results) and searcher-irrelevant parameters last. For example, avoid: