July 02, 2003

XML and content management systems

I've released my latest KM Column article, this one exploring a very hot topic at the moment: XML and content management systems. To quote:

July KM Column: This article explores the role of XML in the context of content management systems, focusing specifically on the business issues.

Feedback appreciated...

Posted by jamesr on July 02, 2003 11:04 AM
Categories: Content management, James' articles, XML

Comments

Excellent article. I agree with most of what you say, but I'm not so sure about your section on authoring. Personally I believe that XML's single most valuable use within a CMS is for storing the actual content (not necessarily an XML database, but XML should be used to "mark up" documents stored within the system). Semantically structured XML documents are an incredibly valuable asset, as they can be easily converted to any other format and (more importantly) can be exported from the system and re-purposed relatively simply using existing XML tools. This is the critical point, as it helps avoid lock-in to a single CMS.
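
To make the re-purposing point concrete, here is a minimal sketch using only Python's standard library. The element names (`article`, `title`, `para`) and the helper names `to_plain_text` and `to_html` are assumptions made up for illustration, not the format of any particular CMS. It shows one semantically marked-up document being rendered two ways without touching the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup: the element names are illustrative only.
SOURCE = """<article>
  <title>XML and content management systems</title>
  <para>XML keeps content independent of any one CMS.</para>
  <para>It can be re-purposed with standard tools.</para>
</article>"""


def to_plain_text(xml_source):
    """Render the document as plain text with an underlined title."""
    root = ET.fromstring(xml_source)
    title = root.findtext("title", default="")
    lines = [title, "=" * len(title), ""]
    for para in root.iter("para"):
        lines.append("".join(para.itertext()).strip())
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def to_html(xml_source):
    """Render the same document as an HTML body fragment."""
    root = ET.fromstring(xml_source)
    body = ET.Element("body")
    heading = ET.SubElement(body, "h1")
    heading.text = root.findtext("title", default="")
    for para in root.iter("para"):
        p = ET.SubElement(body, "p")
        p.text = "".join(para.itertext()).strip()
    return ET.tostring(body, encoding="unicode", method="html")


if __name__ == "__main__":
    print(to_plain_text(SOURCE))
    print(to_html(SOURCE))
```

Because the source carries no presentational markup, each output format is just another small function over the same tree. An XSLT stylesheet would do the same job in a production setting.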

You state that XHTML has no benefits over HTML, and that HTML is a very poor choice for an archive format. Here I must disagree: while HTML is definitely a bad idea due to the temptation to add useless presentational information (and the relative difficulty in parsing it later), XHTML is, in my opinion, an excellent choice for a document format. If we have decided we want to use an XML format for documents, we need to find (or create) an XML format with support for common document parts, such as paragraphs, headers, lists, titles and so forth. We want to avoid any presentational information as that reduces the value of our content. XHTML Strict fulfills these requirements, saving us the trouble of creating a new XML standard and potentially allowing us to use existing authoring tools (although in practice I have yet to find an XHTML authoring tool that I would trust to produce the strict structural markup required for a truly excellent CMS). Best of all, thanks to XML namespaces any "custom" tags required by the CMS can be embedded straight into the XHTML documents.
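
As a rough illustration of the namespace point, the sketch below embeds two custom elements in an XHTML document and reads them back with Python's standard library. The `cms` namespace URI, the `cms:person` and `cms:link` elements, and the helper names are all hypothetical, chosen only for this example. The XHTML namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
CMS_NS = "http://example.com/ns/cms"  # hypothetical namespace for CMS tags

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cms="http://example.com/ns/cms">
  <head><title>Press release</title></head>
  <body>
    <h1>Press release</h1>
    <p>Contact <cms:person id="42">the editor</cms:person> for details.</p>
    <p>See also <cms:link target="article-17">the earlier article</cms:link>.</p>
  </body>
</html>"""


def custom_elements(xhtml):
    """Return (local name, attributes) for every element in the CMS namespace."""
    root = ET.fromstring(xhtml)
    prefix = "{" + CMS_NS + "}"
    found = []
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            found.append((el.tag[len(prefix):], dict(el.attrib)))
    return found


def paragraphs(xhtml):
    """Return the text of each XHTML paragraph, ignoring the custom wrappers."""
    root = ET.fromstring(xhtml)
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % XHTML_NS)]


if __name__ == "__main__":
    print(custom_elements(DOC))
    print(paragraphs(DOC))
```

The CMS can find its own elements by namespace, while any tool that only understands XHTML can still read the paragraphs as ordinary text.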

Obviously there are some situations where XHTML would not be suitable (in which case something larger such as DocBook might be required) but for many CMS situations I think it provides a solid, ready made standard for storing documents.

Posted by: Simon Willison on July 2, 2003 11:37 AM


I agree that authoring is the key to XML's success in a CMS, and it's something that I've done a lot more thinking about than I could find space for in the article.

HTML is fine, as far as it goes. Unfortunately, what I see in practice is people using the MS RichEdit component to enter all sorts of horrible stuff, or pasting directly from Word documents.

While XHTML does offer some benefits, at the moment it is unfortunately being used by some vendors to claim "XML-compliance" without adding any real benefits over HTML.

While I am generally optimistic that this will all get sorted out in time, I've been working on this specific problem for 6 years (!), and have seen little movement in terms of tools or technologies...

Cheers, James

Posted by: James Robertson on July 2, 2003 11:43 AM


I agree about the MSHTML component - it may be an easy way of adding rich text editing to a web application but the price in terms of unusable markup is incredibly high. The company I work for have experimented with using HTML-Tidy to clean it up and with a whole bunch of regular expressions to try and extract some meaning from the markup, and have now moved on to using a custom built Flash component for editing which gives us more control over the markup that is generated. It's still not ideal by a long way though.
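
For a sense of what this kind of clean-up involves, here is a minimal sketch of a whitelist filter built on Python's standard `html.parser`. It is not HTML-Tidy and not the approach described above. The list of allowed tags, the `StructuralFilter` class and the `clean` helper are assumptions for illustration. It drops presentational tags and attributes but makes no attempt to repair badly nested markup.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of structural tags; everything else is discarded.
ALLOWED = {"p", "h1", "h2", "h3", "ul", "ol", "li", "em", "strong", "a"}
# Attributes worth keeping, per tag. Styling and scripting attributes are dropped.
KEEP_ATTRS = {"a": {"href"}}


class StructuralFilter(HTMLParser):
    """Keep whitelisted tags and text; drop all other tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return
        wanted = KEEP_ATTRS.get(tag, ())
        kept = [(k, v) for k, v in attrs if k in wanted and v is not None]
        attr_text = "".join(
            ' %s="%s"' % (k, escape(v, quote=True)) for k, v in kept
        )
        self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def clean(markup):
    """Return markup reduced to the whitelisted structural tags."""
    parser = StructuralFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    pasted = ('<P class="MsoNormal" style="margin:0"><FONT face="Arial">'
              '<SPAN style="color:red">Hello</SPAN> <B>world</B></FONT></P>')
    print(clean(pasted))
```

A filter like this removes the noise but cannot recover meaning that was never marked up, such as a heading typed as bold text, which is why it only goes part of the way.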

My ideal solution would be a desktop application (or a Java applet at a pinch) custom written for the creation of structural XHTML documents. I think we've taken the browser-based model about as far as it can go.

Posted by: Simon Willison on July 2, 2003 09:22 PM
