Tuesday, February 13, 2007
If you're planning to be at Search Engine Strategies London February 13-15, stop by and say hi to one of the many Googlers who will be there. I'll be speaking on Wednesday at the Successful Site Architecture panel and thought I'd offer up some tips for building crawlable sites for those who can't attend.
Make sure visitors and search engines can access the content
- Check the Crawl errors section of Webmaster Tools for any pages Googlebot couldn't access due to server or other errors. If Googlebot can't access the pages, they won't be indexed and visitors likely can't access them either.
- Make sure your robots.txt file doesn't accidentally block search engines from content you want indexed. You can see a list of the files Googlebot was blocked from crawling in Webmaster Tools. You can also use our robots.txt analysis tool to make sure you're blocking and allowing the files you intend.
- Check the Googlebot activity reports to see how long it takes Googlebot to download a page of your site, so you can spot any network slowness issues.
- If pages of your site require a login and you want the content from those pages indexed, ensure you include a substantial amount of indexable content on pages that aren't behind the login. For instance, you can put several content-rich paragraphs of an article outside the login area, with a login link that leads to the rest of the article.
- How accessible is your site? How does it look in mobile browsers and screen readers? It's well worth testing your site under these conditions and ensuring that visitors can access the content of the site using any of these mechanisms.
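If you'd like a quick local sanity check of your robots.txt rules before you upload them, something like the following sketch can help. It uses Python's standard urllib.robotparser module, not the Webmaster Tools analysis tool, and its matching rules are simpler than Googlebot's (for instance, it doesn't handle wildcards), so treat the result as a rough first pass. The robots.txt content and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /calendar/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]
```

Pass in the list of pages you want indexed. Anything that comes back in the result is blocked by the rules as this parser reads them, and is worth confirming with the robots.txt analysis tool.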
Make sure your content is viewable
- Check out your site in a text-only browser or view it in a browser with images and JavaScript turned off. Can you still see all of the text and navigation?
- Ensure the important text and navigation in your site is in HTML, not in images, and make sure all images have alt text that describes them.
- If you use Flash, use it only when needed. Particularly, don't put all of the text from your site in Flash. An ideal Flash-based site has pages with HTML text and Flash accents. If you use Flash for your home page, make sure that the navigation into the site is in HTML.
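To find images that are missing alt text, you could scan your pages with a small script. Here's a minimal sketch using Python's standard html.parser module. The class and function names are hypothetical, and it treats an empty alt attribute the same as a missing one, which you may want to relax for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag with no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        # Flag the image when alt is absent, empty, or only whitespace.
        if not (alt and alt.strip()):
            self.missing.append(attr_map.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images in html that lack alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```

This only tells you whether alt text exists, not whether it describes the image well, so a manual read-through is still worthwhile.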
Be descriptive
- Make sure each page has a unique title tag and meta description tag that aptly describe the page.
- Make sure the important elements of your pages (for instance, your company name and the main topic of the page) are in HTML text.
- Make sure the words that searchers will use to look for you are on the page.
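One way to spot pages that share a title or meta description is to extract both from each page and group the pages by value. The sketch below assumes you already have the HTML of each page in a dictionary keyed by URL. The names HeadExtractor and find_duplicates are made up for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Pulls the <title> text and meta description out of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "description":
                self.description = (attr_map.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """pages maps URL -> HTML. Returns (duplicate titles, duplicate descriptions)."""
    titles, descriptions = defaultdict(list), defaultdict(list)
    for url, html in pages.items():
        extractor = HeadExtractor()
        extractor.feed(html)
        extractor.close()
        titles[extractor.title.strip()].append(url)
        descriptions[extractor.description].append(url)

    def repeated(groups):
        # Keep only values that appear on more than one page.
        return {text: urls for text, urls in groups.items() if len(urls) > 1}

    return repeated(titles), repeated(descriptions)
```

Pages with no title or description at all are grouped under an empty string, so two or more of them will also show up as duplicates.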
Keep the site crawlable
- If possible, avoid frames. Frame-based sites don't allow for unique URLs for each page, which makes indexing each page separately problematic.
- Ensure the server returns a 404 status code for pages that aren't found. Some servers are configured to return a 200 status code, particularly with custom error messages, and this can result in search engines spending time crawling and indexing non-existent pages rather than the valid pages of the site.
- Avoid infinite crawls. For instance, if your site has an infinite calendar, add a nofollow attribute to links to dynamically-created future calendar pages. Each search engine may interpret the nofollow attribute differently, so check with the help documentation for each. Alternatively, you could use the nofollow meta tag to ensure that search engine spiders don't crawl any outgoing links on a page, or use robots.txt to prevent search engines from crawling URLs that can lead to infinite loops.
- If your site uses session IDs or cookies, ensure those are not required for crawling.
- If your site is dynamic, avoid using excessive parameters and use friendly URLs when you can. Some content management systems enable you to rewrite URLs to friendly versions.
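To check whether your server really returns a 404 for missing pages, you can request a URL that almost certainly doesn't exist and look at the status code. The following is a rough sketch using Python's standard urllib. The function names are hypothetical, and the fetch_status parameter exists only so the check can be tried without a live server.

```python
import urllib.error
import urllib.request
import uuid


def status_for(url):
    """Return the HTTP status code the server sends for url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            # Redirects are followed, so this is the final status.
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx and 5xx responses; the code is on the error.
        return err.code


def returns_real_404(base_url, fetch_status=status_for):
    """Probe a random path that should not exist and expect a 404."""
    probe = base_url.rstrip("/") + "/" + uuid.uuid4().hex + ".html"
    return fetch_status(probe) == 404
```

Calling returns_real_404 with your site's base URL gives False when the server answers the made-up page with a 200, or redirects it to a page that does, which is the configuration described above.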
See our tips for creating a Google-friendly site and webmaster guidelines for more information on designing your site for maximum crawlability and usability.
If you will be at SES London, I'd love for you to come by and hear more. And check out the other Googlers' sessions too:
Tuesday, February 13th
Auditing Paid Listings and Clickfraud Issues; 10:45 - 12:00; Shuman Ghosemajumder, Business Product Manager for Trust and Safety
Wednesday, February 14th
A Keynote Conversation; 9:00 - 9:45; Matt Cutts, Software Engineer
Successful Site Architecture; 10:30 - 11:45; Vanessa Fox, Product Manager, Webmaster Central
Google University; 12:45 - 1:45
Converting Visitors into Buyers; 2:45 - 4:00; Brian Clifton, Head of Web Analytics, Google Europe
Search Advertising Forum; 4:30 - 5:45; David Thacker, Senior Product Manager
Thursday, February 15th
Meet the Crawlers; 9:00 - 10:15; Dan Crow, Product Manager
Web Analytics and Measuring Success Overview; 1:15 - 2:30; Brian Clifton, Head of Web Analytics, Google Europe
Search Advertising Clinic; 1:15 - 2:30; Will Ashton, Retail Account Strategist
Site Clinic; 3:00 - 4:15; Sandeepan Banerjee, Sr. Product Manager, Indexing