Web Authoring Statistics


Over the last few years, various people have studied the popularity of authoring techniques, looking for example at which HTML ids and classes are most common, and at how many sites validate (and yes, we know that we're not leading the way in terms of validation).

John Allsopp's study is the most recent one we're aware of, where he looked at class and id attribute values on 1315 sites. Before that, Marko Karppinen did a study in 2002, looking at which of the then 141 W3C members had sites that validated; in 2003 Evan Goer did a study into 119 Alpha Geeks' use of XHTML; and of course in 2004 François Briatte did a study covering trends of Web site design on 10 high-profile blogs. In addition, in the last year, microformats.org contributors have done a lot of research into the use of class and rel attributes, amongst other things, in their pursuit of bite-sized reusable semantics. We are also aware of some studies being done for the Mozilla project, covering thousands of pages.

We can now add to this data. In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata. The results we found are available below. We hope this is of use!


The parser looked only at documents whose HTTP headers included a Content-Type header with a value that started with the nine characters text/html.
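As a rough illustration of that filter (not the code we actually ran), the check can be sketched in Python as below. The function name is_html_response and the idea of passing the headers in as a dictionary are assumptions made for this example; the comparison itself is the literal nine-character prefix test described above.

```python
def is_html_response(headers):
    """Return True if a Content-Type header value starts with 'text/html'.

    `headers` is assumed to be a mapping of header names to values.
    Header names are compared case-insensitively; the value is compared
    literally against the nine-character prefix.
    """
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value.startswith("text/html")
    return False  # no Content-Type header at all: skip the document
```

A value such as "text/html; charset=utf-8" passes this check, while "application/xhtml+xml" does not.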

The parser we used was really just a tokeniser: it extracted tags and ignored comments. It didn't look at character data between tags, nor at DOCTYPEs, PIs, or character entities, but it did support skipping the contents of script and style CDATA blocks (though it did not attempt to execute the script blocks). The tokeniser was actually a prototype we made to test the HTML parsing rules described in the WHATWG's HTML5 spec.
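To give a feel for what a tag-only tokeniser of this kind does, here is a much-simplified Python sketch. It is not the prototype we used and it does not follow the HTML5 parsing rules: it relies on regular expressions, assumes attribute values never contain a ">" character, and all of its names (tokenise, count_features, and so on) are made up for this example.

```python
import re
from collections import Counter

# Simplification: a tag runs from "<" to the next ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][^\s/>]*)([^>]*)>")
ATTR_RE = re.compile(r"""([^\s=/]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?""")
# End-tag finders for the two elements whose contents are skipped.
CLOSERS = {n: re.compile(r"</" + n + r"\b", re.IGNORECASE)
           for n in ("script", "style")}


def tokenise(html):
    """Yield (element_name, attributes) for every start tag in `html`."""
    pos = 0
    while True:
        lt = html.find("<", pos)
        if lt == -1:
            return
        if html.startswith("<!--", lt):          # ignore comments
            end = html.find("-->", lt + 4)
            if end == -1:
                return
            pos = end + 3
            continue
        match = TAG_RE.match(html, lt)
        if match is None:                        # DOCTYPE, PI, or stray "<"
            pos = lt + 1
            continue
        pos = match.end()
        if match.group(1):                       # end tag: nothing to record
            continue
        name = match.group(2).lower()
        attrs = {}
        for attr in ATTR_RE.finditer(match.group(3)):
            value = attr.group(2) or ""
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]              # strip matching quotes
            attrs.setdefault(attr.group(1).lower(), value)
        yield name, attrs
        if name in CLOSERS:                      # skip script/style contents
            close = CLOSERS[name].search(html, pos)
            if close is None:
                return
            pos = close.start()


def count_features(html):
    """Count element names and class names seen in one document."""
    elements, classes = Counter(), Counter()
    for name, attrs in tokenise(html):
        elements[name] += 1
        for cls in attrs.get("class", "").split():
            classes[cls] += 1
    return elements, classes
```

Summing the two counters over every document in a sample would give element and class frequencies of the general kind reported here, although a real tokeniser has to cope with many more malformed inputs than this sketch does.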


Note: You will need a browser with SVG and CSS support to view the result graphs correctly. We recommend Firefox 1.5.

Our analysis in December 2005 covered various topics:

Future study

We will probably perform other surveys in the future, to look for data that we didn't capture this time. For example, while we collected data to compare to John Allsopp's class frequencies, we did not collect data on popular ID attribute values. If you can think of anything worth examining in particular, please let us know.