A Brief History of HTML

It's fashionable, of course, to add certain letters before words and acronyms now. The last ten years of computing seem like they wouldn't have been possible without the letters i-, e-, and x-something. But in the case of XHTML, the X comes from "extensible": eXtensible HyperText Markup Language—it's functional and descriptive (and it does sound kind of sexy, too).

But before unpacking the X and the concept of "extensible," it's important to know what happened to HTML.

So What Happened to HTML?

The history of HTML is nothing if not a lesson in the destruction greed can inflict on a system of human communication. Tim Berners-Lee, inventor of the World Wide Web, had intended a system that allowed for easy referencing (through URLs) and easy sharing (through an openly accessible Web and an easy-to-write open language, HTML). Berners-Lee wrote the very first Web browser in the fall of 1990, along with the first version of HTML. And for a few years, the Web developed quietly on a few college campuses.

But then a company called Netscape began distributing a free Web browser around 1994. The idea of giving away a program for free was quite revolutionary at the time. (Netscape actually made its money from server software.) But it was not long, approximately a year, before software giant Microsoft released its Internet Explorer (IE) browser, which would later be bundled with the Windows operating system. Microsoft also pursued aggressive licensing for IE with Internet service providers like America Online (AOL).

By the latter part of the 1990s, what is now referred to as "the Browser Wars" was in full swing. To try to keep dominance over the market, Netscape and Microsoft were in a race to include as many features as possible in their respective browsers (the security problems of which Web developers are still confronting and patching today). But neither company, apparently, believed a features-based competition was the only road to dominance.

So they turned to the HyperText Markup Language, HTML, in hopes of better ensuring market share. Each browser developed a different markup style; for example, Netscape tags were written in all capital letters. And both companies began adding proprietary tags to the language, tags that only their own browser could read (though Microsoft was much better at keeping up with the Netscape tags). Microsoft also released its feature-laden and relatively cheap Web editor, FrontPage, which was (and is, as of this writing) non-standard by design (more about what that means later).

By the year 2002, the Browser Wars were over, and Microsoft Internet Explorer had emerged as the clear victor. But during that seven-year period, HTML had become a mangled mess; taglines like "This Site Best Viewed in Internet Explorer 5.0" were commonplace on Web sites everywhere.

The World Wide Web Consortium

Back in 1994, Tim Berners-Lee had helped to found the World Wide Web Consortium, or W3C (he continues to serve as the Consortium's Director). The mission of the W3C is stated in one sentence: "To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web" (http://www.w3.org/Consortium/).

One of the central activities of the W3C, which is staffed by people from corporations and educational institutions, is the publication of "Recommendations," also known as standards. It's a solid lesson in the importance of rhetoric to an organization to see how those "recommendations" were largely ignored by Netscape and Microsoft during the Browser Wars. In a cut-throat battle for dominance, a "recommendation" isn't going to carry much weight.

The consequence of ignoring open standards, though, is the stunting of the Web's fabled promise of information sharing, regardless of equipment or software. That consequence falls both on the people who create texts for the Web and on the people and companies that create the browsers and devices ("user agents") that read them.

HTML: Presentation, Structure, Both?

In a certain sense, HTML was doomed from the beginning, as it combined structural tags (tags that inform the user agent that "this is a paragraph," or "this is a link") with presentation properties ("make the background purple," "make this text really big"). The Browser Wars only served to bloat HTML even further, so that it was (and is) not unheard of for someone accessing information on the Web to spend more time downloading the markup for the page than the content on the page itself (which is all the poor user wanted in the first place).

The problem with this mixture is that display information is only helpful to human beings (and sighted human beings, at that), whereas structural or semantic information is useful mostly to the actual devices that have to read or "parse" the code.
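
To make the markup-versus-content imbalance concrete, here is a minimal Python sketch that compares how many characters of a page are readable text and how many are tags. The two sample snippets, the TextOnly class, and the markup_overhead function are illustrative assumptions, not material from any real site or specification.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text a reader would see, ignoring every tag."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


def markup_overhead(page):
    """Return (characters of content, characters of markup) for a page."""
    parser = TextOnly()
    parser.feed(page)
    parser.close()
    content = "".join(parser.text).strip()
    return len(content), len(page) - len(content)


# Hypothetical Browser Wars-era markup: presentation mixed into the structure.
PRESENTATIONAL = (
    '<p><font face="Arial" size="5" color="purple">'
    "<b>Hi there</b></font></p>"
)

# The same content with structural markup only.
STRUCTURAL = "<p>Hi there</p>"

for label, page in (("presentational", PRESENTATIONAL), ("structural", STRUCTURAL)):
    content, markup = markup_overhead(page)
    print(f"{label}: {content} characters of content, {markup} of markup")
```

Both snippets carry the same eight characters of content; only the amount of markup wrapped around them differs.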

Saved by XML (well, sort of...)

In early 1998, the W3C released the first recommendation for XML: Extensible Markup Language. The revolution of XML was that it promised to allow information sharing between different pieces of software; human beings' visual needs were not written into the story of XML the way they were with display-property-happy HTML.

In order to accomplish this, though, the standard had to be strict and followed quite closely, both by the authors of XML documents and the software programs that handled XML; otherwise, important banking data in a database in New York would not transfer properly to the desktop computer of an accountant in Wichita.

Web browsers are, after all, very forgiving pieces of software. If they weren't, most of the Web today wouldn't be viewable. You can shoot the most awful, broken HTML code you'd like at a Web browser, and it will try to do something with it. The same is not true for XML parsers: break a rule in XML, and the software program reading it will generate an error, and that error message is all you'll get.
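
The contrast can be sketched with Python's standard library, which happens to ship both a lenient HTML parser and a strict XML parser. This is only an illustration: the broken snippet and the helper names (TextCollector, parse_as_html, parse_as_xml) are made up for the example, and a real browser's error recovery is far more elaborate than html.parser's.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Deliberately broken markup: the <b> element is never closed.
BROKEN = "<p>Hello <b>world</p>"


class TextCollector(HTMLParser):
    """Lenient HTML parser that gathers whatever text it can find."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Forgiving path: no complaint about the missing </b>."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def parse_as_xml(markup):
    """Strict path: any well-formedness error stops the parse."""
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as err:
        return f"error: {err}"


print("HTML parser:", parse_as_html(BROKEN))
print("XML parser: ", parse_as_xml(BROKEN))
```

The HTML path still hands back the text, while the XML path returns nothing but an error message. Closing the b element properly is enough to make the XML parser accept the same snippet.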

Enter XHTML

So in January of 2000, the W3C released the first specification of XHTML, otherwise known as XHTML 1.0 (the last specification for HTML was 4.01, released in December of 1999; at the time, no further version was planned).