Reunifying duplicate content on your website
Tuesday, October 06, 2009
Handling duplicate content within your own website can be a big challenge. Websites grow; features
get added, changed and removed; content comes—content goes. Over time, many websites collect
systematic cruft in the form of multiple URLs that return the same contents. Having duplicate
content on your website is generally not problematic, though it can make it harder for search
engines to crawl and index the content. Also, PageRank and similar information found via incoming
links can get diffused across pages we aren't currently recognizing as duplicates, potentially
making your preferred version of the page rank lower in Google.
Steps for dealing with duplicate content within your website
Recognize duplicate content on your website. The first and most important step is to find the content that appears under more than one URL on your site. A simple way to do this is to take a unique text snippet from a page and to search for it, limiting the results to pages from your own website
by using a
site: query
in Google. Multiple results for the same content show duplication you can investigate.
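For example, assuming your site lives at example.com (a placeholder domain), a query along these lines returns every indexed page on your site that contains the snippet:

```
site:example.com "a unique text snippet copied from one of your pages"
```

If the same snippet shows up under several different URLs, those URLs are candidates for the steps below.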
Determine your preferred URLs. Before fixing duplicate content issues, you'll have to
determine your preferred URL structure. Which URL would you prefer to use for that piece of
content?
Be consistent within your website. Once you've chosen your preferred URLs, make sure to
use them in all possible locations within your website (including in your
Sitemap file).
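As a minimal sketch, a Sitemap file that lists only your preferred URLs could look like this (www.example.com and /product are placeholder values):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only the preferred version of each URL, never its duplicates. -->
  <url>
    <loc>http://www.example.com/product</loc>
  </url>
</urlset>
```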
Apply 301 permanent redirects where necessary and possible. If you can,
redirect duplicate URLs to your preferred URLs using a 301 response code. This
helps users and search engines find your preferred URLs should they visit the duplicate URLs.
If your site is available on several domain names, pick one and use the 301
redirect appropriately from the others, making sure to forward to the right specific page, not
just the root of the domain. If you support both www and non-www host names, pick one, use the
preferred domain setting in Webmaster Tools,
and redirect appropriately.
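As one possible sketch, assuming an Apache server with mod_rewrite enabled and www.example.com chosen as the preferred host name, an .htaccess rule like the following sends requests for the bare domain to the www host while preserving the requested path:

```
RewriteEngine On
# Permanently (301) redirect example.com/anything to www.example.com/anything.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Other servers offer equivalent mechanisms; the important parts are the 301 status code and the page-to-page forwarding.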
Implement
the rel="canonical" link element
on your pages where you can.
Where 301 redirects are not possible, the rel="canonical" link
element can give us a better understanding of your site and of your preferred URLs. The use of
this link element is also supported by major search engines such as
Ask.com,
Bing
and
Yahoo!.
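For illustration, if a hypothetical duplicate URL such as http://www.example.com/product?sessionid=123 returns the same page as your preferred URL, the duplicate page's <head> could include:

```html
<!-- Placed in the <head> of each duplicate page, pointing at the preferred URL. -->
<link rel="canonical" href="http://www.example.com/product">
```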
Use the
URL parameter handling tool
in Google Webmaster Tools where possible.
If some or all of your website's duplicate content comes from URLs with query parameters,
this tool can help you tell us which parameters within your URLs matter and which are irrelevant.
More information about this tool can be found in our
announcement blog post.
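As a hypothetical illustration, all of the following URLs might return the same product page; the parameter handling tool lets you flag sessionid and ref as parameters we can ignore:

```
http://www.example.com/product?id=42
http://www.example.com/product?id=42&sessionid=8a7b2c
http://www.example.com/product?id=42&ref=newsletter
```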
What about the robots.txt file?
One item which is missing from this list is disallowing crawling of duplicate content with your
robots.txt file. We now recommend not blocking access to duplicate content on your website,
whether with a robots.txt file or other methods. Instead, use the
rel="canonical" link element,
the
URL parameter handling tool,
or 301 redirects. If access to duplicate content is entirely blocked, search engines
effectively have to treat those URLs as separate, unique pages since they cannot know that they're
actually just different URLs for the same content. A better solution is to allow them to be
crawled, but clearly mark them as duplicate using one of our recommended methods. If you allow us
to crawl these URLs, Googlebot will learn rules to identify duplicates just by looking at the URL
and should largely avoid unnecessary recrawls in any case. In cases where duplicate content still
leads to us crawling too much of your website, you can also
adjust the crawl rate setting in Webmaster Tools.
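To make the contrast concrete, here is a hypothetical robots.txt rule of the kind we recommend against for duplicate content, in this case hiding printer-friendly copies under /print/ from crawlers entirely:

```
# Discouraged for duplicates: blocking /print/ means search engines can't
# tell that these pages duplicate the main versions.
User-agent: *
Disallow: /print/
```

Leaving such URLs crawlable and marking them with rel="canonical" or a 301 redirect lets us consolidate them instead.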
We hope these methods will help you to manage the duplicate content on your website! Information
about duplicate content in general can also be found in our
Help Center. Should you have any
questions, you can join the discussion in our
Webmaster Help Forum.
Posted by
John Mueller, Webmaster Trends Analyst, Google Zürich
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],[],[[["\u003cp\u003eDuplicate content within a website can hinder search engine crawling and indexing, potentially affecting page ranking.\u003c/p\u003e\n"],["\u003cp\u003ePreferred URLs for content should be determined and used consistently across the site, including within the sitemap.\u003c/p\u003e\n"],["\u003cp\u003e301 redirects should be used to direct traffic from duplicate URLs to preferred URLs where feasible.\u003c/p\u003e\n"],["\u003cp\u003eWhen redirects aren't an option, using the \u003ccode\u003erel="canonical"\u003c/code\u003e link element can help search engines understand preferred URLs.\u003c/p\u003e\n"],["\u003cp\u003eGoogle recommends against blocking duplicate content in robots.txt; instead, utilize methods like canonicalization or redirects to consolidate it.\u003c/p\u003e\n"]]],["Duplicate content on websites can hinder search engine crawling and dilute page ranking. To address this, first, identify duplicate content using `site:` queries. Next, determine preferred URLs and use them consistently across your site, including sitemaps. Employ `301` redirects to direct duplicate URLs to preferred ones. Where redirects aren't feasible, implement `rel=\"canonical\"` links. Utilize the URL parameter handling tool in Google Webmaster Tools for parameter-based duplicates. Avoid blocking duplicate content via robots.txt.\n"],null,["# Reunifying duplicate content on your website\n\nTuesday, October 06, 2009\n\n\nHandling duplicate content within your own website can be a big challenge. Websites grow; features\nget added, changed and removed; content comes---content goes. Over time, many websites collect\nsystematic cruft in the form of multiple URLs that return the same contents. Having duplicate\ncontent on your website is generally not problematic, though it can make it harder for search\nengines to crawl and index the content. Also, PageRank and similar information found via incoming\nlinks can get diffused across pages we aren't currently recognizing as duplicates, potentially\nmaking your preferred version of the page rank lower in Google.\n\nSteps for dealing with duplicate content within your website\n------------------------------------------------------------\n\n1. **Recognize duplicate content on your website.** The first and most important step is to recognize duplicate content on your website. A simple way to do this is to take a unique text snippet from a page and to search for it, limiting the results to pages from your own website by using a [`site:` query](https://www.google.com/support/websearch/bin/answer.py?answer=136861) in Google. Multiple results for the same content show duplication you can investigate.\n2. **Determine your preferred URLs.** Before fixing duplicate content issues, you'll have to determine your preferred URL structure. Which URL would you prefer to use for that piece of content?\n3. **Be consistent within your website.** Once you've chosen your preferred URLs, make sure to use them in all possible locations within your website (including in your [Sitemap file](https://www.sitemaps.org/index.html)).\n4. 
**Apply `301` permanent redirects where necessary and possible.** If you can, redirect duplicate URLs to your preferred URLs using a `301` response code. This helps users and search engines find your preferred URLs should they visit the duplicate URLs. If your site is available on several domain names, pick one and use the `301` redirect appropriately from the others, making sure to forward to the right specific page, not just the root of the domain. If you support both www and non-www host names, pick one, use the [preferred domain setting in Webmaster Tools](https://www.google.com/support/webmasters/bin/answer.py?answer=44231), and redirect appropriately.\n5. **Implement\n [the `rel=\"canonical\"` link element](/search/docs/crawling-indexing/consolidate-duplicate-urls)\n on your pages where you can.** Where `301` redirects are not possible, the `rel=\"canonical\"` link element can give us a better understanding of your site and of your preferred URLs. The use of this link element is also supported by major search engines such as [Ask.com](https://blog.ask.com/2009/02/ask-is-going-canonical), [Bing](https://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx) and [Yahoo!](https://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/).\n6. **Use the\n [URL parameter handling tool](https://www.google.com/support/webmasters/bin/answer.py?answer=147959)\n in Google Webmaster Tools where possible.** If some or all of your website's duplicate content comes from URLs with query parameters, this tool can help you to notify us of important and irrelevant parameters within your URLs. More information about this tool can be found in our [announcement blog post](/search/blog/2009/10/new-parameter-handling-tool-helps-with).\n\nWhat about the robots.txt file?\n-------------------------------\n\n\nOne item which is missing from this list is disallowing crawling of duplicate content with your\nrobots.txt file. **We now recommend not blocking access to duplicate content on your website,\nwhether with a robots.txt file or other methods** . Instead, use the\n[`rel=\"canonical\"` link element](/search/docs/crawling-indexing/consolidate-duplicate-urls),\nthe\n[URL parameter handling tool](https://www.google.com/support/webmasters/bin/answer.py?answer=147959),\nor `301` redirects. If access to duplicate content is entirely blocked, search engines\neffectively have to treat those URLs as separate, unique pages since they cannot know that they're\nactually just different URLs for the same content. A better solution is to allow them to be\ncrawled, but clearly mark them as duplicate using one of our recommended methods. If you allow us\nto crawl these URLs, Googlebot will learn rules to identify duplicates just by looking at the URL\nand should largely avoid unnecessary recrawls in any case. In cases where duplicate content still\nleads to us crawling too much of your website, you can also\n[adjust the crawl rate setting in Webmaster Tools](https://www.google.com/support/webmasters/bin/answer.py?answer=48620).\n\n\nWe hope these methods will help you to manage the duplicate content on your website! Information\nabout duplicate content in general can also be found in our\n[Help Center](/search/docs/advanced/guidelines/duplicate-content). 
Should you have any\nquestions, you can join the discussion in our\n[Webmaster Help Forum](https://support.google.com/webmasters/community/thread?tid=7ffaad3ce68b0a59).\n\n\nPosted by\n[John Mueller](https://johnmu.com/), Webmaster Trends Analyst, Google Zürich"]]