Come see us at SES London and hear tips on successful site architecture
Tuesday, February 13, 2007
If you're planning to be at
Search Engine Strategies London
February 13-15, stop by and say hi to one of the many Googlers who will be there. I'll be speaking
on Wednesday at the
Successful Site Architecture
panel and thought I'd offer up some tips for building crawlable sites for those who can't attend.
Make sure visitors and search engines can access the content
Check the
Crawl errors section
of Webmaster Tools for any pages Googlebot couldn't access due to server or other errors.
If Googlebot can't access the pages, they won't be indexed and visitors likely can't access them
either.
Make sure your
robots.txt file
doesn't accidentally block search engines from content you want indexed. You can see a list of
the files Googlebot was
blocked from crawling
in Webmaster Tools. You can also use our
robots.txt analysis tool
to make sure you're blocking and allowing the files you intend.
Check the
Googlebot activity reports
to see how long it takes to download a page of your site, so you can confirm you don't have any
network slowness issues.
If pages of your site require a login and you want the content from those pages indexed, ensure
you include a
substantial amount of indexable content
on pages that aren't behind the login. For instance, you can put several content-rich paragraphs
of an article outside the login area, with a login link that leads to the rest of the article.
How accessible is your site? How does it look in mobile browsers and screen readers? It's well
worth testing your site under these conditions and ensuring that visitors can access the content
of the site using any of these mechanisms.
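As a rough local complement to the robots.txt check described above, the sketch below uses Python's standard `urllib.robotparser` to list which URLs a robots.txt file would block. The robots.txt content, the example.com URLs, and the `blocked_urls` helper are all hypothetical, and Python's parser may not match Googlebot's rule handling in every case, so treat the result as a first pass rather than a substitute for the Webmaster Tools reports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of fetching a live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /calendar/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical pages that you want indexed.
urls_to_index = [
    "https://www.example.com/",
    "https://www.example.com/articles/site-architecture",
    "https://www.example.com/private/report",
]

# Any URL printed here is one you want indexed but are currently blocking.
print(blocked_urls(ROBOTS_TXT, urls_to_index))
```

Running a check like this against the list of pages you care about makes an accidental `Disallow` rule easy to spot before a crawler does.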
Make sure your content is viewable
Check out your site in a text-only browser or view it in a browser with images and JavaScript
turned off. Can you still see all of the text and navigation?
Ensure the important text and navigation in your site are in HTML, not
in images, and make sure all images
have alt text that describes them.
If you use Flash, use it only when needed. In particular, don't put all of the text from your
site in Flash. An ideal Flash-based site has pages with HTML text and Flash accents. If you use
Flash for your home page, make sure that the navigation into the site is in HTML.
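One way to spot images without alt text is to scan a page's HTML for them. The following is a minimal sketch using Python's standard `html.parser`; the `MissingAltFinder` class, the `images_missing_alt` helper, and the sample markup are illustrative assumptions, not part of any Google tool.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> whose alt attribute is absent or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images that have no descriptive alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Hypothetical page fragment.
SAMPLE = """
<html><body>
<img src="logo.png" alt="Example Company logo">
<img src="nav-home.gif">
<img src="banner.jpg" alt="">
</body></html>
"""

print(images_missing_alt(SAMPLE))
```

An empty alt attribute can be a deliberate choice for purely decorative images, so the output is a list to review by hand rather than a list of definite errors.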
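Be descriptive
Make sure each page has a unique title tag and meta description tag that aptly describe the page.
Make sure the important elements of your pages (for instance, your company name and the main
topic of the page) are in HTML text.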
Make sure the words that searchers will use to look for you are on the page.
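To check whether the words searchers might use actually appear as HTML text, you could extract a page's text and look for each term. The sketch below assumes a hypothetical page and term list; `VisibleText` and `missing_terms` are illustrative names. It ignores script and style content and does not read alt attributes, so a phrase that exists only inside an image or a script is reported as missing.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gather text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def missing_terms(html, terms):
    """Return the terms that do not appear in the page's HTML text."""
    parser = VisibleText()
    parser.feed(html)
    # Collapse whitespace and lowercase for a simple case-insensitive match.
    text = " ".join(" ".join(parser.parts).split()).lower()
    return [term for term in terms if term.lower() not in text]


# Hypothetical page: one phrase appears only in a script and an image's alt text.
PAGE = """
<html><head><title>Example Bikes</title>
<script>var hidden = "mountain bike repair";</script></head>
<body><h1>Example Bikes</h1>
<img src="services.png" alt="Mountain bike repair and rentals">
<p>We sell road bikes in London.</p></body></html>
"""

print(missing_terms(PAGE, ["road bikes", "mountain bike repair", "London"]))
```

This is a plain substring match, so it will not catch plurals or word-order variations; it is meant only to flag obvious gaps.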
Keep the site crawlable
If possible,
avoid frames.
Frame-based sites don't allow for unique URLs for each page, which makes indexing each page
separately problematic.
Ensure the server returns a 404 status code for pages that aren't found.
Some servers are
configured to return a 200 status code,
particularly with custom error messages, and this can result in search engines spending time
crawling and indexing non-existent pages rather than the valid pages of the site.
Avoid infinite crawls. For instance, if your site has an infinite calendar, add a
nofollow attribute
to links to dynamically created future calendar pages. Each search engine may interpret the
nofollow attribute differently, so check the help documentation for each.
Alternatively, you could use the
nofollow meta tag
to ensure that search engine spiders don't crawl any outgoing links on a page, or use
robots.txt to prevent search engines from crawling URLs that can lead to infinite loops.
If your site uses session IDs or cookies, ensure those are not required for crawling.
If your site is dynamic, avoid using excessive parameters and use friendly URLs when you can.
Some content management systems enable you to rewrite URLs to friendly versions.
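The infinite calendar case can be sketched in a few lines. The example below assumes a hypothetical `/calendar/YYYY/MM` URL scheme and a hypothetical `month_link` helper in the code that renders the calendar; it adds `rel="nofollow"` to links that point at months after the current one.

```python
from datetime import date
from html import escape


def month_link(year, month, today, label=None):
    """Return an <a> tag for a calendar month, with rel="nofollow" on future months."""
    # Hypothetical URL scheme; adapt to however your site addresses calendar pages.
    href = f"/calendar/{year:04d}/{month:02d}"
    text = escape(label or f"{year:04d}-{month:02d}")
    is_future = (year, month) > (today.year, today.month)
    rel = ' rel="nofollow"' if is_future else ""
    return f'<a href="{href}"{rel}>{text}</a>'


today = date(2007, 2, 13)
print(month_link(2007, 2, today))  # current month: ordinary link
print(month_link(2007, 3, today))  # future month: link carries rel="nofollow"
```

Because search engines may treat the attribute differently, a robots.txt rule covering the same calendar paths is a reasonable second safeguard.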
See our tips for creating a Google-friendly site and webmaster guidelines for more information on
designing your site for maximum crawlability and usability.
If you will be at SES London, I'd love for you to come by and hear more. And check out the other
Googlers' sessions too:
Tuesday, February 13th
Auditing Paid Listings and Clickfraud Issues; 10:45 - 12:00; Shuman Ghosemajumder, Business Product Manager for Trust and Safety
Wednesday, February 14th
A Keynote Conversation; 9:00 - 9:45; Matt Cutts, Software Engineer
Successful Site Architecture; 10:30 - 11:45; Vanessa Fox, Product Manager, Webmaster Central
Google University; 12:45 - 1:45
Converting Visitors into Buyers; 2:45 - 4:00; Brian Clifton, Head of Web Analytics, Google Europe
Search Advertising Forum; 4:30 - 5:45; David Thacker, Senior Product Manager
Thursday, February 15th
Meet the Crawlers; 9:00 - 10:15; Dan Crow, Product Manager
Web Analytics and Measuring Successful Overview; 1:15 - 2:30; Brian Clifton, Head of Web Analytics, Google Europe
Search Advertising Clinic; 1:15 - 2:30; Will Ashton, Retail Account Strategist
Site Clinic; 3:00 - 4:15; Sandeepan Banerjee, Sr. Product Manager, Indexing