Tuesday, February 12, 2008
Our search quality and Webmaster Central teams love helping webmasters solve problems. But since we can't be in all places at all times answering all questions, we also try hard to show you how to help yourself. We put a lot of work into providing documentation and blog posts to answer your questions and guide you through the data and tools we provide, and we're constantly looking for ways to improve the visibility of that information.
While I always encourage people to search our Help Center and blog for answers, there are a few articles in particular to which I'm constantly referring people. Some are recent and some are buried in years' worth of archives, but each is worth a read:
- Googlebot can't access my website: Web hosts seem to be getting stricter about blocking spam bots and aggressive crawlers from their servers, which is generally a good thing; however, sometimes they also block Googlebot without knowing it. If you or your host "allow" Googlebot through by whitelisting Googlebot IP addresses, you may still be blocking some of our IPs without realizing it (our full IP list isn't public, for reasons explained in the post). To be sure you're allowing Googlebot access to your site, use the method in this blog post to verify whether a crawler is really Googlebot.
- URL blocked by robots.txt: Sometimes the web crawl section of Webmaster Tools reports a URL as "blocked by robots.txt", but your robots.txt file doesn't seem to block crawling of that URL. Check out this list of troubleshooting tips, especially the part about redirects. This thread from our Help Group also explains why you may see discrepancies between our web crawl error reports and our robots.txt analysis tool.
- Why was my URL removal request denied? (Okay, I'm cheating a little: this one is a Help Center article and not a blog post.) To remove a URL from Google search results, you first need to put something in place that prevents Googlebot from simply picking that URL up again the next time it crawls your site. This may be a 404 (or 410) status code, a noindex meta tag, or a robots.txt file, depending on what type of removal request you're submitting. Follow the directions in this article and you should be good to go.
- Flash best practices: Flash continues to be a hot topic for webmasters interested in making visually complex content accessible to search engines. In this post Bergy, our resident Flash expert, outlines best practices for working with Flash.
- The supplemental index: The "supplemental index" was a big topic of conversation in 2007, and it seems some webmasters are still worried about it. Instead of worrying, point your browser to this post on how we now search our entire index for every query.
- Duplicate content: Another perennial concern of webmasters. This post talks in detail about duplicate content caused by URL parameters, and also references Adam's previous post on deftly dealing with duplicate content, which gives lots of good suggestions on how to avoid or mitigate the problems it causes.
- Sitemaps FAQs: This post answers the most frequent questions we get about Sitemaps. And I'm not just saying it's great because I posted it. :-)
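For the first item in the list, the verification method comes down to a reverse DNS lookup on the visiting IP, a check that the resulting host name is in the googlebot.com domain, and a forward DNS lookup to confirm that the name maps back to the same IP. Here is a minimal Python sketch of that idea. The function name `is_googlebot`, the injectable lookup parameters, and the suffix tuple are assumptions made for illustration, not part of any Google tool, and the list of accepted domains may need extending.

```python
import socket

# Assumption: host names of genuine Googlebot crawlers end in this domain.
# Extend the tuple if the official guidance lists further domains.
GOOGLEBOT_SUFFIXES = (".googlebot.com",)


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    `reverse_lookup` and `forward_lookup` are hypothetical hooks so the
    check can be exercised without network access; by default they use
    the standard library resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS lookup (IP address -> host name).
    try:
        host = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return False

    # Step 2: the host name must belong to an accepted domain.
    if not host.lower().rstrip(".").endswith(GOOGLEBOT_SUFFIXES):
        return False

    # Step 3: forward DNS lookup (host name -> IP addresses) must
    # include the original IP, otherwise the reverse record is spoofed.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Calling `is_googlebot(ip)` with only the IP from your access log performs real DNS queries, so results depend on your resolver. Checking the host name rather than a fixed IP list is the point of the method: it keeps working when the crawler's addresses change.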
Sometimes, knowing how to find existing information is the biggest barrier to getting a question answered. So try searching our blog, Help Center and Help Group next time you have a question, and please let us know if you can't find a piece of information that you think should be there!