Over the last few years, various people have studied the popularity of authoring techniques: for example, which HTML ids and classes are most common, and how many sites validate (and yes, we know that we're not leading the way in terms of validation).
John Allsopp's study is the most recent one we're aware of, where he
looked at class and id attribute values on
1315 sites. Before that, Marko Karppinen
did a study in 2002, looking at which of the then 141 W3C members
had sites that validated; in 2003 Evan Goer did
a study into 119 Alpha Geeks' use of XHTML; and of course in 2004
François Briatte did
a study covering trends of Web site design on 10 high-profile
blogs. In addition, in the last year, microformats.org contributors
have done a lot of research into the use of
rel attributes, amongst other things, in their pursuit
of bite-sized reusable semantics. We are also aware of some studies
being done for the Mozilla
project, covering thousands of pages.
We can now add to this data. In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata. The results we found are available below. We hope this is of use!
The parser looked only at documents whose HTTP headers included a
Content-Type header with a value that started with the
The parser we used was really just a tokeniser. It extracted only
tags and ignored comments; it didn't look at character data between
tags, nor at DOCTYPEs, PIs, or character entities, but it did support
skipping the contents of
CDATA blocks (though it did not attempt to execute the script
blocks). The tokeniser was actually a prototype we made to test the
parsing rules described in the WHATWG's HTML5 spec.
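To make the description above concrete, here is a minimal sketch of a tag-only tokeniser in Python. It is not the prototype used for the survey and it does not implement the HTML5 parsing rules; the names `iter_tags` and `survey` are hypothetical, and the regular expressions are a deliberate simplification (start tags only, with no special handling of script contents).

```python
import re
from collections import Counter

# One pattern, tried left to right at each position:
#   1. a comment (matched only so that it can be discarded)
#   2. a CDATA section (also discarded)
#   3. a start tag, capturing the element name and the raw attribute text
# DOCTYPEs, PIs, end tags and character data match nothing and are ignored.
_TOKEN = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"|<([A-Za-z][A-Za-z0-9]*)([^>]*)>",
    re.DOTALL,
)

# An attribute name, optionally followed by a double-quoted, single-quoted
# or unquoted value.
_ATTR = re.compile(
    r"""([^\s=/"'<>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?"""
)


def iter_tags(html):
    """Yield (element_name, attributes) for each start tag in html."""
    for match in _TOKEN.finditer(html):
        if match.group(1) is None:
            continue  # comment or CDATA section: skip its contents
        attrs = {}
        for attr in _ATTR.finditer(match.group(2)):
            value = attr.group(2) or attr.group(3) or attr.group(4) or ""
            # Keep the first occurrence if an attribute is repeated.
            attrs.setdefault(attr.group(1).lower(), value)
        yield match.group(1).lower(), attrs


def survey(documents):
    """Count element names, attribute names and class names."""
    elements, attributes, classes = Counter(), Counter(), Counter()
    for html in documents:
        for name, attrs in iter_tags(html):
            elements[name] += 1
            attributes.update(attrs.keys())
            classes.update(attrs.get("class", "").split())
    return elements, attributes, classes
```

Calling `survey` on an iterable of document strings returns three `Counter` objects, so `classes.most_common(20)` would give the twenty most frequent class names in that sample.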
Note: You will need a browser with SVG and CSS support to view the result graphs correctly. We recommend Firefox 1.5.
Our analysis in December 2005 covered various topics:
- Pages and elements
- Elements and attributes
- Classes
- HTTP headers
- Page headers
- Metadata
- Text elements
- Table elements
- Link relationships
- Scripting
- Editors and their custom markup
We will probably perform other surveys in the future, to look for data that we didn't capture this time. For example, while we collected data to compare to John Allsopp's class frequencies, we did not collect data on popular ID attribute values. If you can think of anything worth examining in particular, please let us know.