Raising awareness of cross-domain URL selections
Monday, October 31, 2011
A piece of content can often be reached via several URLs, not all of which may be on the same
domain. A common example we've talked about over the years is having the same content available
on more than one URL, an issue known as
duplicate content.
When we discover a group of pages with duplicate content, Google uses algorithms to select one
representative URL for that content. A group of pages may contain URLs from the same site or
from different sites. When the representative URL is selected from a group that spans different sites,
the selection is called a cross-domain URL selection. To take a simple example, if the group of
URLs contains one URL from a.com and one URL from b.com and our algorithms select the URL from
b.com, the a.com URL may no longer be shown in our search results and may see a drop in
search-referred traffic.
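To make the terminology concrete, here is a toy Python sketch that groups URLs whose pages are exact duplicates and flags a group as cross-domain when its URLs span more than one host. It only illustrates the definition. Exact matching on whitespace-normalized HTML is an assumption made for brevity, and this post does not describe how Google actually detects duplicates or chooses the representative URL. The URLs and page bodies are made up.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def content_fingerprint(body: str) -> str:
    """Hash whitespace-normalized page text; identical text gives identical hashes."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies are exact duplicates after normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        groups[content_fingerprint(body)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def is_cross_domain(group: list[str]) -> bool:
    """A group is cross-domain when its URLs span more than one host."""
    return len({urlsplit(url).hostname for url in group}) > 1


# Hypothetical pages: the same content on a.com and b.com.
pages = {
    "https://a.com/widgets": "<p>Blue widgets</p>",
    "https://b.com/widgets": "<p>Blue   widgets</p>",
    "https://b.com/widgets?ref=nav": "<p>Blue widgets</p>",
    "https://a.com/about": "<p>About us</p>",
}
for group in duplicate_groups(pages):
    print(group, "cross-domain" if is_cross_domain(group) else "same-site")
# → ['https://a.com/widgets', 'https://b.com/widgets', 'https://b.com/widgets?ref=nav'] cross-domain
```

If only the b.com URL from such a group were shown in search results, the a.com URL would be the one losing search-referred traffic.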
Webmasters can greatly influence our algorithms' selections using one of the currently supported
mechanisms to indicate the preferred URL, for example using
rel="canonical" elements
or
301 redirects.
In most cases, the decisions our algorithms make in this regard correctly reflect the webmaster's
intent. However, in some rare cases the selection is unexpected, and we've found that many webmasters
are confused about why it happened and what they can do if they believe the selection is incorrect.
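As a minimal sketch of the first mechanism, the Python function below builds a rel="canonical" link element for a page's HTML <head>, pointing at the same path on one preferred host. The host www.example.com and the name canonical_link are hypothetical, and dropping the query string assumes your query parameters (for example, session IDs) never change the page content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred host: the domain you want selected as canonical.
PREFERRED_HOST = "www.example.com"


def canonical_link(requested_url: str) -> str:
    """Build a rel="canonical" link element that points at the same path on
    PREFERRED_HOST, dropping the query string and fragment."""
    parts = urlsplit(requested_url)
    canonical = urlunsplit(("https", PREFERRED_HOST, parts.path or "/", "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_link("http://example.net/shoes/red?sessionid=42"))
# → <link rel="canonical" href="https://www.example.com/shoes/red">
```

Every duplicate, whether served on example.net or on the preferred host itself, then carries the same canonical URL, so the signal is consistent across the whole group.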
To be transparent about cross-domain URL selection decisions, we're launching new Webmaster Tools
messages that will attempt to notify webmasters when our algorithms select an external URL instead
of one from their website. The details about how these messages work are in our
Help Center article about the topic,
and in this blog post we'll discuss the different scenarios in which you may see a cross-domain
URL selection and what you can do to fix any selections you believe are incorrect.
Common causes of cross-domain URL selection
There are many scenarios that can lead our algorithms to select URLs across domains.
In most cases, our algorithms select a URL based on signals that the webmaster implemented to
influence the decision. For example, a webmaster following
our guidelines
and
best practices
for moving websites is effectively signaling that the URLs on their new website are the ones
they want Google to select. If you're moving your website and see these new messages in
Webmaster Tools, you can take that as confirmation that our algorithms have noticed the move.
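For a site move, the usual signal is a permanent (301) redirect from every old URL to its equivalent on the new domain. Below is a minimal sketch, assuming the old domain is served by a Python WSGI application and using the hypothetical destination host new-example.com. It preserves the path and query string so that each old URL maps to its counterpart rather than to the new home page. Follow the guidelines and best practices mentioned above for the rest of the move procedure.

```python
from urllib.parse import quote
from wsgiref.util import setup_testing_defaults

# Hypothetical destination of the site move.
NEW_HOST = "new-example.com"


def app(environ, start_response):
    """WSGI app for the old domain: permanently redirect each request to the
    same path and query string on NEW_HOST."""
    # WSGI exposes PATH_INFO as latin-1-decoded text; re-encode it to the raw
    # bytes before percent-encoding so non-ASCII paths survive intact.
    raw_path = environ.get("PATH_INFO", "") or "/"
    path = quote(raw_path.encode("latin-1"))
    query = environ.get("QUERY_STRING", "")
    location = f"https://{NEW_HOST}{path}" + (f"?{query}" if query else "")
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]


# Local check without a real server: build a test environ and capture the response.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {"PATH_INFO": "/blog/post-1", "QUERY_STRING": "page=2"}
setup_testing_defaults(environ)
app(environ, start_response)
print(captured["status"], captured["headers"]["Location"])
# → 301 Moved Permanently https://new-example.com/blog/post-1?page=2
```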
However, we regularly see webmasters ask questions when our algorithms select a URL they did not
want selected. When your website is involved in a cross-domain selection, and you believe the
selection is incorrect (that is, not your intention), there are several strategies to improve the
situation. Here are some of the common causes of unexpected cross-domain URL selections that we've
seen, and how to fix them:
Duplicate content, including multi-regional websites: We regularly see
webmasters use substantially the same content in the same language on multiple domains,
sometimes inadvertently and sometimes to geotarget the content. For example, it's common to see
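a webmaster set up the same English-language website on both example.com and example.net, or a
German-language website hosted on a.de, a.at, and a.ch. Depending on your website and your users,
you can use one of the currently supported canonicalization techniques to signal to our
algorithms which URLs you wish selected. For more on this topic, see our articles on
canonicalization (specifically rel="canonical" elements and 301 redirects), on multi-regional and
multilingual sites and working with multi-regional websites, and about
rel="alternate" hreflang="x".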
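For the multi-regional case, such as the German-language site on a.de, a.at, and a.ch, Google documents rel="alternate" hreflang annotations, which mark pages as regional alternates of one another. Here is a minimal Python sketch that generates those link elements. The home-page URLs and the function name hreflang_links are hypothetical, and the pattern shown assumes each regional page lists every version, including itself.

```python
from html import escape

# Hypothetical regional versions of the same German-language page.
REGIONAL_VERSIONS = {
    "de-DE": "https://a.de/",
    "de-AT": "https://a.at/",
    "de-CH": "https://a.ch/",
}


def hreflang_links(versions: dict[str, str]) -> str:
    """Return rel="alternate" hreflang link elements for the HTML <head>.
    The same full set goes on every regional version, including itself."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in versions.items()
    )


print(hreflang_links(REGIONAL_VERSIONS))
```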
Configuration mistakes: Certain types of misconfigurations can lead our
algorithms to make an incorrect decision. Examples of misconfiguration scenarios include:
Incorrect canonicalization: Incorrect usage of
canonicalization techniques
pointing to URLs on an external website can lead our algorithms to select the external URLs to
show in our search results. We've seen this happen with misconfigured content management
systems (CMS) or CMS plugins installed by the webmaster. To fix this kind of situation, find out
how your website is incorrectly indicating the canonical URL preference (for example, through
incorrect usage of a rel="canonical" element or a 301 redirect) and
fix that.
Misconfigured servers: Sometimes we see hosting misconfigurations where
content from site a.com is returned for URLs on b.com. A similar case occurs when two
unrelated web servers return identical
soft 404 pages
that we may fail to detect as error pages. In both situations we may assume the same content
is being returned from two different sites and our algorithms may incorrectly select the
a.com URL as the canonical of the b.com URL. You will need to investigate which part of your
website's serving infrastructure is misconfigured. For example, your server may be returning
HTTP 200 (success) status codes for error pages, or your server might be
confusing requests across different domains hosted on it. Once you find the root cause of
the issue, work with your server admins to correct the configuration.
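One quick way to spot error pages served with HTTP 200 is to request a URL that cannot exist on each of your domains and check the status code. Here is a minimal sketch using only Python's standard library. The function names and the random path suffix are hypothetical, and a 2xx result only suggests a soft 404 that you should confirm by looking at the page. It does not detect a server mixing up requests for different domains; for that, compare what each hostname returns for the same path.

```python
import urllib.error
import urllib.request
import uuid


def probe_nonexistent(base_url: str) -> int:
    """Request a random path that should not exist on base_url and return the
    final HTTP status code. urllib follows redirects, so a redirect to a page
    that answers 200 is reported as 200."""
    url = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}-should-not-exist"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx and 5xx responses.
        return err.code


def looks_like_soft_404(status: int) -> bool:
    """A made-up URL answering with 2xx success suggests a soft 404."""
    return 200 <= status < 300


# Example (makes a real network request, so it is left commented out):
# status = probe_nonexistent("https://a.com")
# print(status, "possible soft 404" if looks_like_soft_404(status) else "ok")
```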
Malicious website attacks: Some attacks on websites introduce code that can
cause undesired canonicalization. For example, the malicious code might cause the website to
return an
HTTP 301 redirect
or insert a cross-domain
rel="canonical" link element
into the HTML <head> or HTTP header, usually pointing to an external URL
hosting malicious content. In these cases our algorithms may select the malicious or spammy URL
instead of the URL on the compromised website. In this situation, please follow our
guidance on cleaning your site
and submit a reconsideration request when done. To identify
cloaked
attacks, you can use the
Fetch as Googlebot
function in Webmaster Tools to see your page's content as Googlebot sees it.
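Both incorrect canonicalization and injected canonical elements leave the same trace: a rel="canonical" link pointing at another host, either in the HTML <head> or in an HTTP Link header. Here is a minimal Python sketch that flags such targets in a page you have already fetched. The class and function names are hypothetical. The Link header parsing is deliberately simplified: it expects canonical to be the first rel value and URLs without commas. Hostnames are compared exactly, so www.a.com and a.com count as different hosts. It cannot see redirects or content cloaked for Googlebot, so check those separately.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


# Simplified match for Link header entries such as: <https://x.example/>; rel="canonical"
LINK_HEADER_CANONICAL = re.compile(r'<([^>]*)>\s*;[^,]*\brel="?canonical"?', re.IGNORECASE)


def cross_domain_canonicals(page_url: str, page_html: str, link_header: str = "") -> list[str]:
    """Return canonical targets (from the HTML or the Link header) on another host."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    targets = finder.hrefs + LINK_HEADER_CANONICAL.findall(link_header)
    page_host = urlsplit(page_url).hostname
    # Resolve relative hrefs against the page URL before comparing hosts.
    absolute = [urljoin(page_url, href) for href in targets]
    return [url for url in absolute if urlsplit(url).hostname != page_host]


page = "https://a.com/products/widget"
page_html = '<html><head><link rel="canonical" href="https://spam.example/pills"></head></html>'
header = '<https://a.com/products/widget>; rel="canonical"'
print(cross_domain_canonicals(page, page_html, header))
# → ['https://spam.example/pills']
```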
In rare situations, our algorithms may select a URL from an external site that is hosting your
content without your permission. If you believe that another site is duplicating your content in
violation of copyright law, you may contact the site's host to request removal. In addition, you
can request that Google remove the infringing page from our search results by
filing a request under the Digital Millennium Copyright Act.
And as always, if you need help in identifying the cause of an incorrect decision or how to fix
it, you can see our
Help Center article
about this topic and ask in our
Webmaster Help Forum.
Posted by Pierre Far, Webmaster Trends Analyst