What is canonicalization
Canonicalization is the process of selecting the representative (canonical) URL of a piece of content. Consequently, a canonical URL is the URL of the page that Google chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Google show only one version of otherwise duplicate content in its search results.
There are many reasons why a site may have duplicate content:
- Region variants: for example, a piece of content for the USA and the UK, accessible from different URLs, but essentially the same content in the same language
- Device variants: for example, a page with both a mobile and a desktop version
- Protocol variants: for example, the HTTP and HTTPS versions of a site
- Site functions: for example, the results of sorting and filtering functions of a category page
- Accidental variants: for example, the demo version of the site is accidentally left accessible to crawlers
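To make the protocol and site-function cases above concrete, here is a minimal Python sketch that groups URLs likely to serve the same content. It is only an illustration for auditing your own URL lists, not a description of how Google detects duplicates. The parameter names in `IGNORED_PARAMS` and the `example.com` URLs are hypothetical; adjust them to whatever your site actually uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Hypothetical query parameters that only re-order or filter the same content.
IGNORED_PARAMS = {"sort", "order", "filter"}


def duplicate_key(url):
    """Build a comparison key that ignores the protocol and listed parameters."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    # The scheme (http vs. https) is deliberately left out of the key.
    return (parts.netloc.lower(), path, tuple(kept))


def group_duplicates(urls):
    """Return lists of URLs that share a key, in first-seen order."""
    groups = defaultdict(list)
    for url in urls:
        groups[duplicate_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]
```

For example, `group_duplicates(["http://example.com/shoes", "https://example.com/shoes?sort=price", "https://example.com/hats"])` puts the two `/shoes` URLs in one group and leaves `/hats` out. Region and device variants usually live on different hosts or paths, so this sketch would not catch them.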
Some duplicate content on a site is normal and is not a violation of Google's spam policies. However, having the same content accessible through many different URLs can be a bad user experience, and it may make it harder for you to track how your content performs in search results.
How Google indexes and chooses the canonical URL
When Google indexes a page, it determines the primary content (or centerpiece) of that page. If Google finds multiple pages that seem to be the same, or whose primary content is very similar, it chooses the page that, based on the factors (or signals) the indexing process collected, is objectively the most complete and useful for search users, and marks it as canonical. The canonical page is crawled most regularly; duplicates are crawled less frequently to reduce the crawling load on sites.
There are a handful of factors that play a role in canonicalization: whether the page is served via HTTP or HTTPS, redirects, presence of the URL in a sitemap, and rel="canonical" link annotations. You can indicate your preference to Google using these techniques, but Google may choose a different page as canonical than you do, for various reasons. That is, indicating a canonical preference is a hint, not a rule.
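One common way to express a canonical preference is a link element with rel="canonical" in the page's head. The following Python sketch, using only the standard library, shows one way you might check which canonical URL a page declares. The class and function names are illustrative assumptions, and the result is only the preference the page states; as noted above, Google may still pick a different URL.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page declares none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical
```

Running `find_canonical` over the HTML of each duplicate in a set is a quick way to confirm that they all declare the same preferred URL.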
Different language versions of a single page are considered duplicates only if the primary content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates). To learn more about setting up localized sites, see our documentation about managing multi-lingual and multi-regional sites.
Google uses the canonical page as the main source to evaluate content and quality. A Google Search result usually points to the canonical page, unless one of the duplicates is explicitly better suited for a search user. For example, the search result will probably point to the mobile page if the user is on a mobile device, even if the desktop page is marked as canonical.