Web Authoring Statistics: Classes

How many different class names do pages use? Well, most pages apparently don't use the class attribute at all, and it's downhill from there:

Most pages have no class attributes at all; pages with exactly one class attribute are the next most common case, and the numbers fall off from there.
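The article doesn't say how these counts were gathered. As a rough illustration only, here is a minimal Python sketch that computes both kinds of figure used in this section: how many pages use a given number of distinct class names, and which class names appear on the most pages (the basis of the top 20 list). It assumes each page is available as an HTML string and uses the standard library's html.parser. The names ClassCollector and class_stats are hypothetical, and folding class names to lowercase is an assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects the distinct class names used in one HTML page."""

    def __init__(self):
        super().__init__()
        self.class_names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases attribute names; a bare `class`
        # attribute with no value arrives as None and is skipped.
        for name, value in attrs:
            if name == "class" and value:
                # class holds a whitespace-separated list of names.
                # Assumption: names are case-folded so that, e.g.,
                # "MsoNormal" and "msonormal" are counted together.
                self.class_names.update(v.lower() for v in value.split())


def class_stats(pages, top_n=20):
    """Return (pages per number of distinct classes, top classes by page count)."""
    pages_by_class_count = Counter()  # distinct classes on a page -> number of pages
    pages_using_class = Counter()     # class name -> number of pages using it
    for html in pages:
        collector = ClassCollector()
        collector.feed(html)
        collector.close()
        pages_by_class_count[len(collector.class_names)] += 1
        # Each class counts at most once per page, however often it repeats.
        pages_using_class.update(collector.class_names)
    return pages_by_class_count, pages_using_class.most_common(top_n)


if __name__ == "__main__":
    sample_pages = [
        "<p>No classes here</p>",
        '<div class="footer">(c) 2005</div><ul class="nav menu"></ul>',
        '<p class="MsoNormal">Hi</p><div class="footer"></div>',
    ]
    histogram, top = class_stats(sample_pages, top_n=3)
    print(sorted(histogram.items()))  # [(0, 1), (2, 1), (3, 1)]
    print(top[0])                     # ('footer', 2)
```

Counting each class once per page, rather than counting every occurrence, is what a "used on the most pages" ranking needs: a document full of msonormal paragraphs still adds only one to that class's total.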

Which class names are used on the most pages? Here are the top 20:

footer, menu, title, small, text, content, header, nav, copyright, button, main, search, msonormal, date, smalltext, body, style1, top, white, link.

This actually maps very well to the elements that are being proposed in HTML5:

Popular class(es)            HTML5 element
footer                       footer
menu                         menu
title, header, top (?)       header
small, smalltext             small
text, content, main, body    article
nav                          nav
copyright                    (none yet)
button                       (works around an IE6 limitation)
search                       (none yet)
date                         date
link                         ?

The remaining top 20 classes are either presentational or otherwise meaningless (msonormal, for example, is one of the classes that Microsoft Office uses in its "HTML" output). Of the top 20, the two most widely used classes not yet covered by HTML5 are copyright and search.

The button class is apparently used to target input elements in CSS, because IE6 doesn't support attribute selectors.

The link class baffles us. We can't really tell what it is used for. Why would authors label something with that class? Perhaps it serves a purpose similar to the button class?

Beyond the top 20, many of the classes are presentational in nature (clear, style2, bold, ...). Most of the values that don't fall into that bucket are synonyms for the top 20, like h1 and pageheading (presumably both used for multi-level headers, which are handled by <header> in HTML5), or class="post" (handled by <article> in HTML5).

There are some interesting values that are widely used, though. We found class="breadcrumb" in 34th position; should HTML be extended to support breadcrumb-style navigation in some way? Similarly, in 40th place we found class="price"; should a <price> element, with attributes for unambiguously specifying the currency, for example, be considered for future versions of HTML? These probably deserve a little more study.