Tips for hosting providers and webmasters
Tuesday, December 06, 2011
Some webmasters on our forums ask about hosting-related issues affecting their sites. To help
both hosting providers and webmasters recognize, diagnose, and fix such problems, we'd like to
share with you some of the common problems we've seen and suggest how you can fix them.
Blocking of Googlebot crawling. This is a very common issue usually due to a
misconfiguration in a firewall or DoS protection system and sometimes due to the content
management system the site runs. Protection systems are an important part of good hosting and
are often configured to block unusually high levels of server requests, sometimes
automatically.
However, because Googlebot often performs more requests than a human user, these protection
systems may block Googlebot and prevent it from crawling your website. To check for
this kind of problem, use the
Fetch as Googlebot function
in Webmaster Tools, and check for other
crawl errors
shown in Webmaster Tools.
We offer several tools to webmasters and hosting providers who want more control over
Googlebot's crawling and to improve crawling efficiency:
- We have detailed help about how you can control Googlebot's crawling using the robots
  exclusion protocol and by configuring URL parameters.
- If you're worried about rogue bots using the Googlebot user-agent, we offer a way to verify
  whether a crawler is actually Googlebot (a sketch of that check follows this list).
- If you would like to change how hard Googlebot crawls your site, you can verify your website
  in Webmaster Tools and change Googlebot's crawl rate. Hosting providers can verify ownership
  of their IP addresses too.
- We have more information in our crawling and indexing FAQ.
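One way to do that check is a reverse DNS lookup on the requesting IP address, followed by a
forward lookup to confirm the name maps back to the same IP. Below is a minimal sketch in
Python using only the standard socket module; the IP address at the end is just an example you
would replace with one taken from your own access logs.

```python
import socket

def is_googlebot(ip_address):
    """Reverse-DNS check: the host name must belong to googlebot.com or
    google.com, and a forward lookup of that name must map back to the IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip_address)        # reverse lookup
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]       # forward lookup
    except socket.gaierror:
        return False
    return ip_address in forward_ips

# Replace with an IP address from your server's access logs.
print(is_googlebot("66.249.66.1"))
```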
Availability issues. A related type of problem we see is websites being
unavailable when Googlebot (and users) attempt to access the site. This includes DNS issues,
overloaded servers leading to timeouts and refused connections, misconfigured content
distribution networks (CDNs), and many other kinds of errors. When Googlebot encounters such
issues, we report them in Webmaster Tools as either
URL unreachable errors
or
crawl errors.
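Many of these availability problems can be spotted from outside your network with a simple
probe. Here is a rough sketch using only Python's standard library; example.com stands in for
your own hostname, and a real monitoring setup would of course be more thorough.

```python
import socket
import urllib.request

def check_availability(url, hostname, timeout=10):
    """Rough availability probe: DNS resolution first, then an HTTP fetch
    with a timeout, reporting the kind of failure Googlebot might run into."""
    try:
        socket.gethostbyname(hostname)
    except socket.gaierror as e:
        return f"DNS lookup failed: {e}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"Reachable, HTTP {response.status}"
    except Exception as e:  # timeouts, refused connections, HTTP errors, ...
        return f"Fetch failed: {e}"

print(check_availability("https://example.com/", "example.com"))
```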
Invalid SSL certificates. For SSL certificates to be valid for your website,
they need to match the name of the site. Common problems include expired SSL certificates and
servers misconfigured such that all websites on that server use the same certificate. Most web
browsers will try to warn users in these situations, and Google tries to alert webmasters of this
issue by sending a message via Webmaster Tools. The fix for these problems is to make sure you
use SSL certificates that are valid for all of your website's domains and subdomains that your
users will interact with.
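You can check for an expired or mismatched certificate without a browser. The sketch below uses
Python's standard ssl module with its default verification settings, so a certificate that is
expired or doesn't match the hostname makes the handshake fail; example.com is a placeholder.

```python
import socket
import ssl

def check_certificate(hostname, port=443):
    """Connect with the default SSL context, which verifies both the
    certificate chain and the hostname, then print the validity dates."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
            print("Subject:   ", cert.get("subject"))
            print("Not before:", cert.get("notBefore"))
            print("Not after: ", cert.get("notAfter"))

# Raises ssl.SSLCertVerificationError if the certificate is expired or
# doesn't match the hostname.
check_certificate("example.com")
```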
Wildcard DNS. Websites can be configured to respond to all subdomain
requests. For example, the website at example.com can be configured to respond to requests to
foo.example.com, made-up-name.example.com and all other subdomains.
In some cases this is desirable; for example, a user-generated content website may
choose to give each account its own subdomain. However, in some cases, the webmaster may not
wish to have this behavior as it may cause content to be duplicated unnecessarily across
different hostnames and it may also affect Googlebot's crawling.
To minimize problems in wildcard DNS setups, either configure your website to not use them, or
configure your server to not respond successfully to non-existent hostnames, either by
refusing the connection or by returning an HTTP 404 response.
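A quick way to tell whether a wildcard DNS record is in place is to resolve a subdomain that
certainly doesn't exist. A minimal sketch, assuming Python and the standard socket module;
example.com is a placeholder for your own domain.

```python
import socket
import uuid

def has_wildcard_dns(domain):
    """If a clearly made-up subdomain resolves, the zone most likely has a
    wildcard DNS record."""
    random_host = f"{uuid.uuid4().hex}.{domain}"
    try:
        socket.gethostbyname(random_host)
        return True   # a non-existent name resolved: wildcard is in place
    except socket.gaierror:
        return False  # NXDOMAIN, as expected without a wildcard

print(has_wildcard_dns("example.com"))
```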
Misconfigured virtual hosting. The symptom of this problem is that multiple
hosts and/or domain names hosted on the same server always return the contents of only one site.
To rephrase, although the server hosts multiple sites, it returns only one site regardless of
what is being requested. To diagnose the issue, you need to check that the server responds
correctly to the Host HTTP header.
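One way to test this is to send the same server two requests that differ only in the Host
header and compare the responses. A minimal sketch using Python's urllib; the IP address and
hostnames are placeholders for your own setup.

```python
import urllib.request

def fetch_with_host(server, host_header):
    """Request the server directly but set the Host header to the site we
    actually want; a correctly configured virtual host returns different
    content for different Host values."""
    req = urllib.request.Request(f"http://{server}/",
                                 headers={"Host": host_header})
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read()

# Both requests hit the same server; the bodies should differ if virtual
# hosting is configured correctly.
page_a = fetch_with_host("203.0.113.10", "site-a.example.com")
page_b = fetch_with_host("203.0.113.10", "site-b.example.com")
print("Same content for both hosts?", page_a == page_b)
```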
Content duplication through hosting-specific URLs. Many hosts helpfully offer
URLs for your website for testing/development purposes. For example, if you're hosting the
website https://a.com/ on the hosting provider example.com, the host may offer access to your
site through a URL like https://a.example.com/ or https://example.com/~a/. Our recommendation is
to make these hosting-specific URLs not publicly accessible (for example, by password-protecting
them); even if these URLs are accessible, our algorithms usually pick the URL webmasters intend. If
our algorithms
select the hosting-specific URLs,
you can influence our algorithms to pick your preferred URLs by implementing
canonicalization techniques
correctly.
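One simple check is whether the hosting-specific URL declares your preferred URL as its
canonical. The sketch below, using only Python's standard library, looks for a link
rel="canonical" element in the served HTML; the URLs follow the example above and are
placeholders.

```python
from html.parser import HTMLParser
import urllib.request

class CanonicalFinder(HTMLParser):
    """Collects the href of any <link rel="canonical"> element."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonical = attrs.get("href")

def find_canonical(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        parser = CanonicalFinder()
        parser.feed(response.read().decode("utf-8", errors="replace"))
        return parser.canonical

# The hosting-specific URL should point at the preferred URL.
print(find_canonical("https://a.example.com/"))  # expect: https://a.com/
```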
Soft error pages. Some hosting providers show error pages using an HTTP
200 status code (meaning "Success") instead of an HTTP error status code. For
example, a "Page not found" error page could return HTTP 200 instead of
404, making it a
soft 404 page;
or a "Website temporarily unavailable" message might return a 200 instead of
correctly returning a 503 HTTP status code. We try hard to detect soft error
pages, but when our algorithms fail to detect a web host's soft error pages, these pages may
get indexed with the error content. This may cause ranking or
cross-domain URL selection
issues.
It's easy to check the status code returned: simply check the HTTP headers the server returns
using any one of a number of tools, such as
Fetch as Googlebot.
If an error page is returning HTTP 200, change the configuration to return the
correct HTTP error status code. Also, keep an eye out for soft 404 reports in Webmaster Tools,
on the Crawl errors page in the Diagnostics section.
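You can also script this check by requesting a URL that certainly doesn't exist and looking at
the status code that comes back. A minimal sketch using Python's standard library; example.com
is a placeholder for your own site.

```python
import urllib.error
import urllib.request
import uuid

def status_for(url):
    """Return the HTTP status code, treating 4xx/5xx responses as answers
    rather than exceptions."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as e:
        return e.code

# A clearly non-existent page should return 404; a 200 here means the
# server is producing soft error pages.
print(status_for(f"https://example.com/{uuid.uuid4().hex}"))
```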
Content modification and frames. Webmasters may be surprised to see their
page contents modified by hosting providers, typically by injecting scripts or images into the
page. Web hosts may also serve your content by embedding it in other pages using
frame or iframe HTML elements. To check whether a web host is
changing your content in unexpected ways, simply check the source code of the page as served
by the host and compare it to the code you uploaded.
Note that some server-side code modifications may be very useful. For example, a server using
Google's
mod_pagespeed Apache module
or other tools may be returning your code minified for page speed optimization.
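Comparing the served page with the file you uploaded is easy to automate. A rough sketch using
Python's standard library; the URL and local path are placeholders, and legitimate optimizations
such as minification will of course also show up in the diff.

```python
import difflib
import urllib.request

def diff_served_vs_uploaded(url, local_path):
    """Show a unified diff between the page as served by the host and the
    file you uploaded, so injected scripts, images, or frames stand out."""
    with urllib.request.urlopen(url, timeout=10) as response:
        served = response.read().decode("utf-8", errors="replace").splitlines()
    with open(local_path, encoding="utf-8") as f:
        uploaded = f.read().splitlines()
    return "\n".join(difflib.unified_diff(uploaded, served,
                                          fromfile=local_path, tofile=url,
                                          lineterm=""))

# URL and path are placeholders for your own site and uploaded file.
print(diff_served_vs_uploaded("https://example.com/index.html", "index.html"))
```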
Spam and malware. We've seen some web hosts and bulk subdomain services become
major sources of malware and spam. We try hard to be granular in our actions when protecting our
users and search quality, but if we see a very large fraction of sites on a specific web host
that are spammy or are distributing malware, we may be forced to take action on the web host as
a whole. To help you keep on top of malware, we offer:
- Safe Browsing Alerts for Network Administrators, useful for hosting providers
- Malware notifications in Webmaster Tools for individual websites
- A Safe Browsing API for developers
We hope this list helps both hosting providers and webmasters diagnose and fix these issues.
Beyond this list, also think about the qualitative aspects of hosting like quality of service and
the helpfulness of support. As always, if you have questions or need more help, please ask in our
Webmaster Help Forum.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],[],[[["\u003cp\u003eCommon hosting-related issues can negatively impact a website's visibility on Google Search, such as blocking Googlebot due to misconfigured firewalls or server settings.\u003c/p\u003e\n"],["\u003cp\u003eIssues like server unavailability, invalid SSL certificates, and wildcard DNS configurations can also hinder Google's ability to crawl and index a website.\u003c/p\u003e\n"],["\u003cp\u003eWebmasters should ensure hosting-specific URLs are not publicly accessible and use canonicalization to specify preferred URLs for Google to index.\u003c/p\u003e\n"],["\u003cp\u003eHosting providers should avoid using HTTP 200 status codes for error pages (soft error pages) and ensure content is not unexpectedly modified through injections or frames.\u003c/p\u003e\n"],["\u003cp\u003eBoth webmasters and hosting providers need to be vigilant about spam and malware, utilizing Google's resources to protect users and maintain search quality.\u003c/p\u003e\n"]]],["Web hosts and webmasters should address common site issues: blocking Googlebot via firewalls, website unavailability due to server errors, invalid SSL certificates, wildcard DNS misconfigurations, and virtual hosting errors. They should also avoid content duplication via hosting-specific URLs, soft error pages, and content modification. Hosts should ensure they are not sources of spam and malware, using tools such as Safe Browsing alerts.\n"],null,["# Tips for hosting providers and webmasters\n\nTuesday, December 06, 2011\n| It's been a while since we published this blog post. Some of the information may be outdated (for example, some images may be missing, and some links may not work anymore).\n\n\nSome webmasters on our forums ask about hosting-related issues affecting their sites. To help\nboth hosting providers and webmasters recognize, diagnose, and fix such problems, we'd like to\nshare with you some of the common problems we've seen and suggest how you can fix them.\n\n-\n **Blocking of Googlebot crawling** . This is a very common issue usually due to a\n misconfiguration in a firewall or DoS protection system and sometimes due to the content\n management system the site runs. Protection systems are an important part of good hosting and\n are often configured to block unusually high levels of server requests, sometimes\n automatically.\n Because, however, Googlebot often performs more requests than a human user, these protection\n systems may decide to block Googlebot and prevent it from crawling your website. 
To check for\n this kind of problem, use the\n [Fetch as Googlebot function](https://www.google.com/support/webmasters/bin/answer.py?answer=158587)\n in Webmaster Tools, and check for other\n [crawl errors](https://support.google.com/webmasters/answer/9679690)\n shown in Webmaster Tools.\n\n\n We offer several tools to webmasters and hosting providers who want more control over\n Googlebot's crawling, and to improve crawling efficiency:\n - We have detailed help about how you control Googlebot's crawling using the [robots exclusion protocol](/search/docs/crawling-indexing/robots/intro) and [configuring URL parameters](https://www.google.com/support/webmasters/bin/answer.py?answer=1235687).\n - If you're worried about rogue bots using the Googlebot user-agent, we offer a way to [verify whether a crawler is actually Googlebot](/search/docs/crawling-indexing/verifying-googlebot).\n - If you would like to change how hard Googlebot crawls your site, you can verify your website in Webmaster Tools and [change Googlebot's crawl rate](https://www.google.com/support/webmasters/bin/answer.py?answer=48620). Hosting providers can verify ownership of their IP addresses too.\n - We have more information in our [crawling and indexing FAQ](/search/help/crawling-index-faq).\n- **Availability issues** . A related type of problem we see is websites being unavailable when Googlebot (and users) attempt to access the site. This includes DNS issues, overloaded servers leading to timeouts and refused connections, misconfigured content distribution networks (CDNs), and many other kinds of errors. When Googlebot encounters such issues, we report them in Webmaster Tools as either [URL unreachable errors](https://www.google.com/support/webmasters/bin/answer.py?answer=35147) or [crawl errors](https://support.google.com/webmasters/answer/9679690).\n- **Invalid SSL certificates**. For SSL certificates to be valid for your website, they need to match the name of the site. Common problems include expired SSL certificates and servers misconfigured such that all websites on that server use the same certificate. Most web browsers will try warn users in these situations, and Google tries to alert webmasters of this issue by sending a message via Webmaster Tools. The fix for these problems is to make sure to use SSL certificates that are valid for all your website's domains and subdomains your users will interact with.\n-\n **Wildcard DNS**. Websites can be configured to respond to all subdomain\n requests. For example, the website at example.com can be configured to respond to requests to\n foo.example.com, made-up-name.example.com and all other subdomains.\n\n\n In some cases this is desirable to have; for example, a user-generated content website may\n choose to give each account its own subdomain. However, in some cases, the webmaster may not\n wish to have this behavior as it may cause content to be duplicated unnecessarily across\n different hostnames and it may also affect Googlebot's crawling.\n\n\n To minimize problems in wildcard DNS setups, either configure your website to not use them, or\n configure your server to not respond successfully to non-existent hostnames, either by\n refusing the connection or by returning an HTTP `404` header.\n- **Misconfigured virtual hosting** . The symptom of this problem is that multiple hosts and/or domain names hosted on the same server always return the contents of only one site. 
To rephrase, although the server hosts multiple sites, it returns only one site regardless of what is being requested. To diagnose the issue, you need to check that the server responds correctly to the `Host` HTTP header.\n- **Content duplication through hosting-specific URLs** . Many hosts helpfully offer URLs for your website for testing/development purposes. For example, if you're hosting the website https://a.com/ on the hosting provider example.com, the host may offer access to your site through a URL like https://a.example.com/ or https://example.com/\\~a/. Our recommendation is to have these hosting-specific URLs not publicly accessible (by password protecting them); and even if these URLs are accessible, our algorithms usually pick the URL webmasters intend. If our algorithms [select the hosting-specific URLs](https://www.google.com/support/webmasters/bin/answer.py?answer=1716747&topic=20985), you can influence our algorithms to pick your preferred URLs by implementing [canonicalization techniques](/search/docs/crawling-indexing/consolidate-duplicate-urls) correctly.\n-\n **Soft error pages** . Some hosting providers show error pages using an HTTP\n `200` status code (meaning \"Success\") instead of an HTTP error status code. For\n example, a \"Page not found\" error page could return HTTP `200` instead of\n `404`, making it a\n [`soft 404` page](/search/docs/crawling-indexing/http-network-errors#soft-404-errors);\n or a \"Website temporarily unavailable\" message might return a `200` instead of\n correctly returning a `503` HTTP status code. We try hard to detect soft error\n pages, but when our algorithms fail to detect a web host's soft error pages, these pages may\n get indexed with the error content. This may cause ranking or\n [cross-domain URL selection](https://www.google.com/support/webmasters/bin/answer.py?answer=1716747&topic=20985)\n issues.\n\n\n It's easy to check the status code returned: simply check the HTTP headers the server returns\n using any one of a number of tools, such as\n [Fetch as Googlebot](https://www.google.com/support/webmasters/bin/answer.py?answer=158587).\n If an error page is returning HTTP `200`, change the configuration to return the\n correct HTTP error status code. Also, keep an eye out for `soft 404` reports in Webmaster Tools,\n on the Crawl errors page in the Diagnostics section.\n-\n **Content modification and frames** . Webmasters may be surprised to see their\n page contents modified by hosting providers, typically by injecting scripts or images into the\n page. Web hosts may also serve your content by embedding it in other pages using\n `frame` or `iframe` HTML elements. To check whether a web host is\n changing your content in unexpected ways, simply check the source code of the page as served\n by the host and compare it to the code you uploaded.\n\n\n Note that some server-side code modifications may be very useful. For example, a server using\n Google's\n [`mod_pagespeed` Apache module](https://code.google.com/speed/page-speed/docs/module)\n or other tools may be returning your code minified for page speed optimization.\n- **Spam and malware**. We've seen some web hosts and bulk subdomain services become major sources of malware and spam. We try hard to be granular in our actions when protecting our users and search quality, but if we see a very large fraction of sites on a specific web host that are spammy or are distributing malware, we may be forced to take action on the web host as a whole. 
To help you keep on top of malware, we offer:\n - [Safe Browsing Alerts for Network Administrators](https://googleonlinesecurity.blogspot.com/2010/09/safe-browsing-alerts-for-network), useful for hosting providers\n - [Malware notifications](/search/docs/monitor-debug/security/malware) in Webmaster Tools for individual websites\n - A [Safe Browsing API](/safe-browsing) for developers\n\n\nWe hope this list helps both hosting providers and webmasters diagnose and fix these issues.\nBeyond this list, also think about the qualitative aspects of hosting like quality of service and\nthe helpfulness of support. As always, if you have questions or need more help, please ask in our\n[Webmaster Help Forum](https://support.google.com/webmasters/community).\n\n\nWritten by\n[Pierre Far](/search/blog/authors/pierre-far),\nWebmaster Trends Analyst"]]