The data we collected for HTTP headers was mostly an afterthought and as such isn't very reliable. Here are some things we noticed, though:
text/html documents without a charset parameter in the Content-Type header outnumber those with such a parameter by almost a factor of two (despite the HTML4 spec saying that UAs must not assume any default value for the "charset" parameter).
Documents with the text/xml MIME type outnumber documents with the application/xml MIME type by at least three to one (despite the fact that the former is discouraged by the XML standards community because of the rules for how to handle character sets with those MIME types).
There are only twice as many text/plain documents out there as application/msword documents (and that doesn't take into account the fact that text/plain is the default MIME type of some servers while many application/msword documents will end up labelled as something else).
A pretty significant number of pages include an X-Pingback header (more than the number of pages with the Set-Cookie2 header). In fact, X-Pingback was the 30th most-seen header in our data sample. The X-Pingback header is part of Pingback, a blogging technology for tracking responses similar to Trackback.
There are pages that use the Window-Target header, and even some that use the Link header (though we haven't yet checked what for!). There are even some pages that include the