Friday, September 10, 2004

The Economics of XHTML

Does XHTML make sense for your business? In this article, I try to find out if XHTML reduces losses, increases profits, and gets more business for websites.
We all know that XHTML is way cooler than HTML. I personally prefer anything that has an X in the name. Three X’s are even better, but one X will do just fine. It’s really cool.

But how good is “cool” for businesses? Can XHTML increase turnover? Can it reduce costs? Can it get you more business? Will your customers want you to switch? How much will you spend on maintenance or upgrades in the future? In a world where time is money, can XHTML-based development save you time? In this article, I take a look at the business aspects of XHTML, and try to evaluate if it is just “cool” or if it really makes your pocket heavier. Also, hard as it is, I’ll try not to be biased towards XHTML along the way.

I do not talk about CSS in this article, except where it is absolutely necessary. I’ll cover the cost benefits of CSS in detail in a later post. This post is a comparison of the costs of using different markup technologies and doesn’t delve into styling or other such aspects of web designing.

In this article, XHTML would be treated similar to semantic HTML. So, you should be able to get identical results using semantic HTML.

Cost of hosting

XHTML can reduce the code length of your web pages by a substantial amount. For example, the Adaptive Path website redesign done by Doug Bowman using XHTML and CSS achieved a file-size reduction of 56% (26 kb) in the home page. In a test setup, he also achieved a 62% reduction on a redesign for the Microsoft homepage. In the recent ESPN redesign to embrace XHTML, an estimated saving of 50 kb per page was achieved. Wired News had a saving of around 64% on their site’s redesign.

When hosting a website, the costs are split up into two components – cost for the server space and cost for the bandwidth. Since XHTML reduces file sizes, the costs for both space and bandwidth come down. Doug has shown that with the Microsoft redesign a saving of 924 GB of bandwidth per day, or 329 Terabytes per year, is achievable. Eric Meyer points out that the bandwidth savings for ESPN are around 730 Terabytes per year. When every byte saved is money saved, these numbers should definitely turn heads.
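To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes binary units (1 TB = 1024 GB) and a 365-day year. The function names and the small-site traffic figure are made up for illustration; they do not come from any of the redesigns mentioned.

```python
# Back-of-the-envelope bandwidth arithmetic.
# Assumption: binary units, i.e. 1 TB = 1024 GB and 1 GB = 1024 * 1024 KB.

GB_PER_TB = 1024
KB_PER_GB = 1024 * 1024


def yearly_savings_tb(gb_per_day: float, days: int = 365) -> float:
    """Convert a daily bandwidth saving in GB into a yearly saving in TB."""
    return gb_per_day * days / GB_PER_TB


def daily_savings_gb(kb_saved_per_page: float, page_views_per_day: int) -> float:
    """Daily saving in GB, given KB saved per page and page views per day."""
    return kb_saved_per_page * page_views_per_day / KB_PER_GB


# The Microsoft figure quoted above: 924 GB saved per day.
print(round(yearly_savings_tb(924)))  # 329

# Hypothetical small site: 26 KB saved per page, 10,000 page views a day.
small_site = daily_savings_gb(26, 10_000)
print(f"{small_site:.2f} GB per day, {yearly_savings_tb(small_site):.2f} TB per year")
```

Plugging in your own per-page saving and traffic gives a rough idea of whether the bandwidth saving alone is worth talking about for your site.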

Admittedly, Microsoft and ESPN are both huge websites, with a very high number of hits per day. Smaller sites will not save nearly as much. For them, the cost of implementing XHTML might not be justified by the bandwidth savings alone.

Conclusion: XHTML will give a definite reduction in amount of space used on the server, which translates to money saved. Large sites will also benefit from the bandwidth savings, though this might be insignificant for smaller sites.

Cost of development from scratch

The consideration here is about which technology to start with. Considering that HTML is dead as we know it, there’s really no reason to think otherwise. However, I’d like to put in some numbers here too.

Developing using traditional HTML would mean that you’d have to have multiple versions of the same page for different browsers. The extent of differences would vary depending on the nature and design of the site. This is very important for cross-browser compatibility. Also, if additional browsers have to be supported, additional versions of the site have to be prepared. Want to support a PDA or other handhelds? Make another version of the site. Need cell phone-browser support? Make another version.

The cost of the markup increases linearly with the number of versions. So, if you want to make two versions, the cost of the HTML development will be double (or close to it). Three versions will cost thrice as much.

However, with XHTML, you need to make only one version of your site. Instantly, your site is accessible to a host of devices and browsers. The web-site can be viewed with any browser with exactly the same page, irrespective of the browser, manufacturer or platform. Depending on how many browsers you want to support, this translates into huge cost and time savings.

To put in some numbers, let’s say that making a webpage work on Internet Explorer 5 on Windows costs you an amount X. Making it also work on IE 4 on Windows brings the total to roughly 2X (maybe slightly less). Say you want to support IE 4, 5, and 6 on both Windows and Mac: that’s already 6X. And that covers only 70% of the users on the Net. The remaining 30% of users, who are just as likely to bring you business as the first 70%, use less common browsers like Mozilla Firefox, Netscape, Opera, Safari, or even unheard-of browsers. Supporting them all will only keep driving costs up. With XHTML, on the other hand, you’d only spend the original amount X. (OK, the 6X is probably an overestimate for a lot of projects. But you’d have to agree that it is still a large multiple of the cost of development in XHTML.)
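The multiplication above can be written down as a tiny cost model. This is only a sketch of the linear assumption made in this section, not a real estimating tool. The `extra_version_factor` parameter is a hypothetical knob for the "maybe slightly less" case, where each additional version costs a bit less than the first.

```python
def html_cost(base_cost: float, versions: int,
              extra_version_factor: float = 1.0) -> float:
    """Linear model: the first version costs base_cost, and every further
    version costs base_cost * extra_version_factor (hypothetical discount)."""
    if versions < 1:
        raise ValueError("at least one version is needed")
    return base_cost * (1 + (versions - 1) * extra_version_factor)


def xhtml_cost(base_cost: float) -> float:
    """Single-version model: one build, however many browsers are targeted."""
    return base_cost


X = 1.0  # cost of one version, in arbitrary units

# IE 4, 5 and 6 on Windows and Mac: 3 browsers x 2 platforms = 6 versions.
versions = 3 * 2
print(html_cost(X, versions))                            # 6.0
print(html_cost(X, versions, extra_version_factor=0.8))  # 5.0
print(xhtml_cost(X))                                     # 1.0
```

Even with a generous discount on the extra versions, the multi-version total stays a large multiple of the single-version cost, which is the point of the paragraph above.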

Conclusion: If using HTML would mean maintaining multiple versions of the site, XHTML saves you both time and money, and the savings grow linearly with the number of versions you would otherwise have to support.

Cost of upgrading an existing site

Upgrading to XHTML is usually difficult to justify in terms of ROI and pay-back periods. Most HTML site owners are generally happy with their sites, and don’t see enough returns in moving to XHTML. They are either unaware of or comfortable with the drawbacks of their site, and can’t justify the additional expense. However, there’s one consideration that cannot be overlooked.

XHTML is both backwards (for all practical purposes) and forwards compatible. HTML is only backwards compatible. Forward compatibility means that new technologies and browsers will still work with your site. Backward compatibility ensures that older technologies are supported by your site. Up until now, web development has tried to maintain good backward compatibility with older browsers and devices. Besides, backward compatibility in HTML was implemented by maintaining different versions of the site. With XHTML, your site will be available to virtually any device that can read XML. Such devices might not even be envisioned yet. Some alternative browsers that understand XML sound like they are right out of a science fiction flick. These browsers might read your web-page aloud to you, or provide a Braille interface for the visually challenged. XHTML is ready for these browsers already. HTML is not.

This saving in money is more abstract, and one cannot predict a timeline for the ROI. However, this is definitely a saving in long-term efforts.

Conclusion: XHTML will definitely save you money in the long run. However, putting down a number for ROI or payback period will be difficult. The real reason to shift to XHTML becomes a technical reason more than an economic one, for the short term.

Cost of maintenance

Periodically, changes will have to be made to the site. The cost of these changes cannot be overlooked.

HTML files are rather complex. They have so much irrelevant information, it becomes difficult to manage them. XHTML on the other hand, if well planned, will be much easier to manage in the long run due to sheer simplicity of the files. Making changes to the site in the future will hence be very fast and consequently much less expensive.

However, very few sites will need markup changes frequently. Updates to a site are generally handled using some server-side code. Even frequent updates usually do not translate into modifying markup.

That said, no site is ever perfect. Any good site will have a way to listen to its users, learn from them, and change accordingly to serve them better. This process might be automated, manual, or a combination of both. Generally speaking, the process of getting feedback is best automated and real-time, and the process of altering the web-page is best handled manually. Obviously, the bottleneck here is the manual modifications, and easing this becomes very important. So, if you want to make a good site as opposed to one that just provides an online presence, XHTML would definitely help in the long run. Again, as above, putting numbers down to justify this will be difficult.

Conclusion: Most sites need frequent updates or modifications. The ease of handling XHTML as opposed to HTML in these scenarios cannot be overstated. However, again, since this depends on the Internet Business Plan of the company, quantifying this in terms of amount saved will be difficult.

Attracting more business

HTML, as said before, works only in some browsers the way it does. XHTML works on all kinds of browsers, on all kinds of platforms, with all kinds of devices. The market reach of HTML is around 70% and dwindling fast. The market reach of XHTML is around 99.5%, and getting stronger. HTML can support only a limited brand of desktop browsers running on certain operating systems. XHTML can work on devices we haven’t begun to imagine so far, while working just as fine with some of the most ancient setups.

Conclusion: XHTML will increase your market reach by about 30%. Additionally, it will continue to support the largest variety of users coming to the site, now and in the future. HTML has already failed to do this. XHTML will get you more business. Period.

Using best practices

Using XHTML will ensure that your site is compliant with some of the most stringent laws and practices so that your site remains accessible, future-proof, and contributes to the general well-being and goodness of the Internet as a whole. HTML has a lot of trash in it, and chaos attracts more chaos. XHTML is simple, compliant with complex laws and requirements, and helps in making the Internet more cozy and comfortable for everyone.

Though the business implications of this are not immediately visible, one only needs to scratch the surface to see how it helps. A site that is compliant with stringent laws and uses best practices is very easy to use and is highly accessible. Such sites are a pleasure to surf, since the user is always in control. Users will only come back to your site if they had a good experience the first time they were there. Their every doubt has to be answered and every whim and fancy considered. As much as a markup language can help, XHTML brings clarity and ease of use and puts users in control, irrespective of their technical limitations. HTML does a poor job of this. A user is more likely to come back to an XHTML site than an HTML site, just because XHTML is more comfortable to use.

Conclusion: XHTML will ensure repeat customers and sustained business in a way that HTML will not be able to match.

Overall conclusion

For completeness, here’s the list of conclusions.
  • Use of XHTML translates to savings when hosting your site. Mileage might vary.
  • XHTML will give you substantial savings if you were considering investing in HTML.
  • XHTML will give you some amount of savings if you upgrade your current site from HTML. Mileage might vary.
  • XHTML will expose your company to a larger market than is possible with HTML, now and in the future.
  • XHTML makes your user happy with your service. Your user leaves happy.
To sum it up, XHTML is good for your business. Enough said.

This is in no way a complete list of the business gains from using XHTML. And this is definitely not a list of the technical benefits of using XHTML. These are only some of the more important factors that businesses need to consider when they are spending on their website.

I am sure I have left out many points. Please leave me comments to fill in the gaps. Also, please let me know if I have gone wrong with my assessment. I hope this article will help developers convince businesses to invest – something that most developers find difficult. I also hope this article will make for a good reference to give to businesses to understand why going XHTML is a smart move.



Rakesh Pai said...

You are right, Anne. However, I didn't find compelling reasons to convince people to use XHTML over HTML strict in a way that would make sense for their business. But you are the markup expert. How will semantics or non-presentational markup translate to business gains? Have I overlooked anything? What do you think?

Jonathan Snook said...

Your article makes sense if talking about traditional development methods for HTML (table layouts, font tags, etc) but you negate that by saying "XHTML would be treated similar to semantic HTML". When you said this I heard, "I will talk about how XHTML can save you money by simply using the XHTML syntax instead of the HTML syntax." Did I misunderstand?

Therefore, if syntax were the only difference, most of your arguments don't apply.

Cost of hosting: XHTML would actually incur a slightly higher cost due to the requirement to close all tags.

Cost of development: Assuming you were an expert at browser issues in both XHTML and HTML, this should technically be the same. Supporting multiple browsers shouldn't mean creating multiple versions for either HTML or XHTML.

Cost of Upgrading: If you come from an HTML background, you'll likely find yourself running after browser issues that you haven't had to deal with before (eg: image spacing inside a table [we'll assume it's a data table :)]). However, the bigger question here is backwards compatibility of the browsers vs the forward compatibility of the languages. Any browser that continues to support XHTML in its future versions will undoubtedly support HTML. Where I think the advantage comes in for XHTML is content re-use in an XML environment. Then HTML becomes dead in the water. To add to this, though, there would still be a need to transform the XHTML syntax into one that makes sense for this new environment. This development time would be considerably less than converting the HTML content over entirely.

Attracting more business: The argument here is the same as the cost of upgrading.

Using best practices: As Anne said, you may spend more time trying to maintain XHTML's strictness but in the end, it does help future-proof things.

I hope I didn't come across too harsh... I fully support XHTML and believe in it. My recommendation to you would be to reformulate your argument. Possibly expand on the content re-use advantages of XHTML. And include examples to help demonstrate your points.

Best of luck!

Rakesh Pai said...

Firstly, thank you Anne and Jonathan for having taken the time to post your views. It goes a long way in furthering the discussions and building on more knowledge in the process.

Anne, it’s an interesting point you bring up. I never thought of it that way. I guess I’ll have to plan something in my content-negotiation scripts to handle it, just in case things go wrong with the markup.

Jonathan, I agree totally with what you say. However you got me wrong. I guess I need to clarify.

I wrote this article so that we can compare the cost involved in having web-pages in XHTML or semantic HTML as against traditional table-based design. Tables don’t really lend to semantics. I’d thought that I had put that point across, albeit subtly. I thought I had further strengthened that by mentioning that CSS needs treatment too, but that I’ll handle it separately.

I wrote this post so we can have some material to show to businesses to convince them to spend on XHTML (or semantic HTML) markup. I did not want to get into technical details or jargon-talk as I thought I’d try to talk about numbers – something that businesses like to hear.

Everything you said is correct if you use table based design in XHTML. And for those very reasons and more, that is not a good idea at all. Your comments shed light beautifully on some of the pitfalls of programming in XHTML without understanding document semantics properly and by using traditional table based design and spacer GIFs. But it is exactly sentences like that last one that I wanted to avoid in the post.

As for content reuse, I have not been able to build a convincing case for businesses to move from HTML to XHTML. I couldn’t find any argument justifying itself in terms of ROI over a short term. Maybe someone can help with this area of the calculations?

Anonymous said...

Actually, this has little to do with XHTML or HTML.
What we can talk about is the separation of content from presentation. Having less extra markup for content, plus cached presentational files: that gives bandwidth savings and better maintenance options.
Nobody forbids you from having bloated and perfectly valid HTML or XHTML (even delivered with the right MIME type) which weighs a few times more than it should.


Anonymous said...

You may also wish to point out that by separating the content into its own layer, you can access it from other places. For example, the content can be pulled to be used in a web page, or a Flash document can grab it. After that, an ASP/PHP/WhatEver script can parse it for relevant bits. There are other applications, I am sure.

The important part is that all this data comes from *one* data source - no need to maintain sources for each separate publishing environment.

- Jeff
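As a rough illustration of the re-use described in the comment above: because well-formed XHTML is XML, a generic XML parser can pull content straight out of a page. The sketch below uses Python's standard `xml.etree.ElementTree`. The sample page, the `products` id, and the `extract_items` helper are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by XHTML documents; "x" is just a local prefix for queries.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page; any well-formed XHTML document would do.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Product list</title></head>
  <body>
    <h1>Products</h1>
    <ul id="products">
      <li>Widget</li>
      <li>Gadget</li>
    </ul>
  </body>
</html>"""


def extract_items(xhtml: str, list_id: str) -> list[str]:
    """Return the text of each <li> inside the <ul> with the given id."""
    root = ET.fromstring(xhtml)
    items = root.findall(f".//x:ul[@id='{list_id}']/x:li", XHTML_NS)
    return [li.text.strip() for li in items if li.text]


print(extract_items(PAGE, "products"))  # ['Widget', 'Gadget']
```

The same call fails with a parse error on markup that is not well-formed, which is the trade-off: the content is easy to re-use only as long as the page really is valid XML.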

Anonymous said...

This might have been a good article if you hadn't used the term "HTML" to mean "Presentation and non-semantic markup" and "XHTML" to mean "Semantic and structural markup".

All the benefits you claim "XHTML" gives are achievable with HTML, and all the problems with "HTML" are just as possible with XHTML.

Your one line "In this article, XHTML would be treated similar ..." is unclear and not helpful.

The net result is that your article is misleading.

Anonymous said...

I don't get it: How can you talk about the benefits of semantic and structural markup without talking about CSS? The savings of bandwidth and disk space are not achieved just by using XHTML but by separating structure and design. I haven't seen one single valid XHTML website without CSS.

- It just doesn't make any sense to use pure XHTML. Imagine a site with perfect structure and no design (CSS) at all: Where is the benefit?

Another example: Imagine someone who knows nothing about CSS. He/she would think XHTML is the key to success. But it's just one aspect of the whole show...

Good article - no doubt - but you are missing one essential part: CSS. The hint for another upcoming article isn't helpful.

Rakesh Pai said...

Spot on, Jeff.

However, considering data storage mechanisms currently, there's no real reason to put data in an XHTML file instead of a database. Databases are simply easier to query from ASP/PHP or any other scripting/programming languages, if nothing else. Besides, most businesses already have their data in a database.

Don't get me wrong - structure of data is a very significant feature of XHTML. I just didn't see avenues to justify spending on these features yet.

Can someone prove me wrong? We'll have even more convincing business reasons to shift to XHTML, that way.

Rakesh Pai said...

I guess I am going to get a lot more of this, so let me clarify again.

This post is about semantic markup vis-a-vis markup for table based design. That's what I meant when I said that XHTML will be treated as being similar to semantic HTML.

This discussion is obviously not complete without mentioning CSS. I'll talk about it at length soon.

Anonymous said...

Hmm, nice article about semantics.
But I don't see how this could convince someone to use semantic XHTML instead of semantic HTML?

Christoph Wagner

markku said...

Nice and simple article, without being too technical about everything. A nice link to send to clients who are clueless about XHTML versus HTML.

Good work!

Rakesh Pai said...

It doesn't, Christoph. On the contrary, Anne's comments above give you enough reason to stop sending content as XHTML, at the moment at least. However, semantics are still important, XHTML or otherwise.

I think my strategy would be to design as XHTML, and send to the browser as text/html. For the moment, at least. I was already doing this for IE browsers, but I'll just do it for everyone now.
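A minimal sketch of what such a content-negotiation decision could look like, in Python. This is not my actual script. The function names and the `negotiate` flag are made up for illustration, and the `Accept` header parsing is deliberately simplified (it ignores wildcards and does not compare q-values between types).

```python
def accepts_xhtml(accept_header: str) -> bool:
    """Return True if the Accept header lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a media type without an explicit q-value defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def choose_content_type(accept_header: str, negotiate: bool = False) -> str:
    """Pick the MIME type for a page authored as XHTML.

    negotiate=False is the "text/html for everyone" strategy;
    negotiate=True sends application/xhtml+xml only to browsers that ask for it.
    """
    if negotiate and accepts_xhtml(accept_header):
        return "application/xhtml+xml"
    return "text/html"


# Example headers: one from a browser that advertises XHTML support,
# and one in the style sent by older versions of IE.
modern = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
legacy = "image/gif, image/jpeg, */*"

print(choose_content_type(modern, negotiate=True))  # application/xhtml+xml
print(choose_content_type(legacy, negotiate=True))  # text/html
print(choose_content_type(modern))                  # text/html
```

Keeping the decision in one function means switching between the two strategies is a one-line change.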

Anonymous said...

Well, there's one problem with your numbers. ESPN originally stated that they saved a ton of bandwidth, but it turns out they didn't see the dramatic savings they predicted. The reason why actually made them happier than if they had saved bandwidth/money. The pages were loading so much faster that readers were viewing more pages than they used to. This is known as A Very Good Thing.

John said...

I appreciate the article but I'm a little baffled about leaving CSS out of it. Table layouts are ostensibly replaced by CSS, as are font tags (insert "content layer vs. presentation layer" dogma here).

The reason you save so much in bandwidth is by caching the stylesheet and stripping all the presentational markup, i.e. the tables and font tags. To me this is a major benefit and only supports the business case for using XHTML/CSS. I doubt anyone wants a site with perfectly-coded XHTML and no CSS, at least none of my clients do, not even on web-based applications where there is no need for a lot of flashy design techniques.

Other than that, good article. I agree that you have to talk numbers to businesses.

P.S. I love the "Anonymous" quote about ESPN, above.

Anonymous said...

1. XHTML doesn't reduce file sizes, if anything it increases them. The author is obviously getting confused between XHTML, tableless design, CSS and the separation of content from presentation.

2. HTML isn't dead. It's a completely valid standard and will actually produce lower file sizes than the equivalent XHTML.

3. Why on earth would you need to make multiple versions of your site if you used HTML? Again the author is getting really confused here between valid HTML and proprietary code.

4. Upgrading from HTML to XHTML is usually as simple as adding in some closing tags and changing the doctype. Of course, actually serving it as XML is a different matter. Many people would argue that it's better to use box standard HTML than XHTML served as Text/HTML.

5. HTML is forward compatible because it's a valid standard and all future browsers *should* honour valid standards.

6. HTML files are no more complex than XHTML files. If anything they are slightly less complex.

7. The market reach of HTML is around 70%? XHTML will increase your market reach by about 30%? What a load of rubbish.

8. XHTML has nothing to do with accessibility or 508.

Overall conclusion

You really don't know what you're talking about.

Anonymous said...

"Many people would argue that it's better to use box standard HTML than XHTML served as Text/HTML."
Yes, and I personally do.

I didn't read the entire original article, but did for the whole comments. So here's what I have to say:
Once I learned that IE7 wouldn't support application/xhtml+xml, I dropped XHTML. Content sniffing and all that pain isn't good.

I for now make webpages/sites with the latest HTML doctype which is "4.01 Strict". And for the info, the HTML 4.01 upgrade from 4.0 was accomplished in a way so that the XHTML 1.0 would be based on the latest HTML and therefore be largely reduced; it's said somewhere in the XHTML 1.0 spec.

What does this mean?
It means that serving XHTML as text/html is very bad (and I agree with Hixie here), since you have to stick with those silly "HTML browsers compatibility guidelines" by adding a space like this: <img /> instead of <img/>, and a lot more. Like Hixie says, this is dangerous, because it is only a hack, and a truly conformant HTML user agent parsing that will just show ">" or "/" characters or w/e all over the page.
And this without talking about the DOM, CSS and JS differences between HTML and xHTML, even if they're small.

HTML is the future. Because browsers do well on HTML. Sending XHTML as text/html is stupid and useless.

XHTML does not only permit reuse of the data it contains in an XML environment.
XHTML also allows mixing XML namespaces, like adding the <svg> or <math> tags, etc. (modules, Ruby for instance in XHTML 1.1).
This is the real advantage.
And since websites do not do this YET, and in any case CANNOT because their files are served as text/html (and so won't benefit from XML parsing and its error handling either), one should stick with HTML 4.01 Strict.

And then it's each one's responsibility to make semantic data or not. After all it will benefit him first, so why should we care for other person's stuff?

And yeah, since IE doesn't include support for application/xhtml+xml, then this is another plus for throwing out XHTML.

So HTML or XHTML, it's the same thing when referring to markup. XHTML 1.0 doctypes are the same as HTML 4.01's.
It is XHTML 2.0 that will make all the difference (because if you go check the draft you will see that the semantics are far more refined, and that is an excellent thing). So at that time, we can talk about real-world XHTML benefits -- and by the time that spec comes out, I guess MS will have done better with their browser.

So, using valid, well formed, semantic, all that stuff with HTML 4.01 is the best thing to do IMHO. Browsers so far are good at HTML.

Yahia Chlyeh

Anonymous said...

Since further development of XHTML has been dropped (in favor of HTML5), it puzzles me why I continue to see postings, articles and books push XHTML as "The Future" and claim that XHTML is supposed to create leaner Web sites.

Currently HTML5 appears to be a mess, but IT is the future. We need to promote the idea of dropping table based invalid web sites in favor of using VALID HTML 4.01 Strict or HTML5 (once it's finished) and CSS.

If businesses want to save money on their Web sites - that's the way to go.

Rakesh Pai said...

This post is from 2004. In any case, the title is wrong, as I've clarified in the post. This is about semantic markup, not about XHTML itself.

Besides, XHTML is not dead at all. XHTML 2.0 is dead. XHTML continues to live on, and will continue to live on even with HTML5's semantics. It's a common misconception that XHTML is dead.