JULY 2003

XML and content management systems

Published July 2nd, 2003

Categorised under: articles, content management

With the rise in popularity of both XML and content management systems (CMS), an increasing number of tenders specify that the “CMS must be built using XML”.

What does this mean in practice? This article explores the role of XML in the context of content management systems, focusing specifically on the business issues.

This is a complex area, and one that is evolving rapidly. This article does not aim to provide a complete answer to all questions. Rather, the goal is to ‘demystify’ the area, and provide organisations with more information on which to base their CMS selection processes and criteria.

What is XML?

The eXtensible Markup Language (XML) is an industry standard for capturing and communicating information in a structured way. In essence, it is a “language for creating languages”, where each language is designed to represent a specific information set in an effective and powerful way.

In the context of this article, it is important to recognise that, while XML provides many benefits, it is not a “silver bullet”. While two products may use XML to represent the content in their CMS, if they aren’t using the same dialect, there is no interoperability.
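
To make the dialect problem concrete, here is a minimal sketch in Python. The two XML documents, their element names, and the function `read_title_dialect_a` are all invented for illustration and do not come from any real CMS product. Both documents hold the same page, yet code written for one dialect finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# Two hypothetical CMS products storing the same page in different XML dialects.
PRODUCT_A = """<page>
  <title>Annual leave policy</title>
  <body>Staff accrue four weeks of leave per year.</body>
</page>"""

PRODUCT_B = """<document>
  <meta name="heading" value="Annual leave policy"/>
  <section>Staff accrue four weeks of leave per year.</section>
</document>"""


def read_title_dialect_a(xml_text):
    """Return the page title, assuming product A's (hypothetical) dialect."""
    root = ET.fromstring(xml_text)
    # findtext returns None when the root has no <title> child.
    return root.findtext("title")


print(read_title_dialect_a(PRODUCT_A))  # Annual leave policy
print(read_title_dialect_a(PRODUCT_B))  # None
```

Both inputs are well-formed XML, so the parser accepts each of them. The failure only appears at the level of meaning, which is why "uses XML" says little about whether two products can exchange content.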

This issue will be revisited a number of times throughout the article, and it represents the key challenge facing XML in the context of content management systems.

For more on XML, see the supplement published by Standards Australia titled An Introduction to XML for Knowledge Managers.

The use of XML in content management systems is still rapidly evolving


There are a number of key areas in a content management system where XML has a role to play, including:

  • authoring
  • communication
  • interoperability
  • storage
  • publishing

These will be explored in the following sections, starting with the simplest aspects, and working through to the more complex issues.

Communication


There are a number of widely-used ‘syndication’ formats, designed to support the flow of information between organisations. These include Rich Site Summary (RSS) and NewsML, and all are based on XML.

With the growth of these formats, many content management systems provide support as part of their core product. Overall, there should be a decreasing need to customise CMS solutions to handle these syndication formats.
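
As a rough illustration of what consuming a syndication feed involves, the sketch below reads headlines from a small RSS 2.0-style document using Python's standard library. The feed content and the function name `read_headlines` are assumptions made up for this example; a production CMS would fetch the feed over the network and handle many more optional elements.

```python
import xml.etree.ElementTree as ET

# A hypothetical feed in the RSS 2.0 layout: <rss> / <channel> / <item>.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example news</title>
    <link>http://www.example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/news/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/news/2</link>
    </item>
  </channel>
</rss>"""


def read_headlines(feed_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.findall("./channel/item")
    ]


for title, link in read_headlines(SAMPLE_FEED):
    print(title, "->", link)
```

Because the element names are fixed by the syndication format rather than by any one vendor, the same reader works for any feed that follows it. This is the kind of shared dialect that is missing elsewhere in the CMS space.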


If your organisation has a business requirement to either accept outside ‘news feeds’, or to publish information for the use of others, specify this in your CMS tender.

It is also reasonable to specify which specific syndication formats the CMS should support.

Single-source publishing is one of the greatest potential benefits

Publishing


The use of the Extensible Stylesheet Language (XSL) has grown extensively since it was released several years ago.

With the increasing maturity of this technology, and the widespread availability of programmers skilled in its use, there are potential benefits to using XSL as the basis for CMS publishing systems.

There are, however, a number of other non-XML publishing technologies which have similar (or better) capabilities.

XML has the potential to offer real benefits in providing ‘single source publishing’. By separating the presentation from the content, the same piece of information can be rendered differently depending on how it will be used (e.g. on the web, on wireless devices, or in print).

To enable this capability, the content must be captured in a structured way (see the section on authoring).
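
The idea of single-source publishing can be sketched as follows. This example deliberately uses plain Python functions in place of XSL stylesheets, since the Python standard library has no XSLT processor. The `<article>` dialect and the function names `to_html` and `to_plain_text` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# One structured source document, in a made-up dialect.
ARTICLE = """<article>
  <title>Office closure</title>
  <para>The office is closed on Monday.</para>
  <para>Normal hours resume on Tuesday.</para>
</article>"""


def to_html(root):
    """Render the article for the web."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", ""))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in root.findall("para")]
    return "\n".join(parts)


def to_plain_text(root):
    """Render the same article for a text-only channel."""
    title = root.findtext("title", "")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text or "" for p in root.findall("para")]
    return "\n".join(lines)


root = ET.fromstring(ARTICLE)
print(to_html(root))
print()
print(to_plain_text(root))
```

Neither renderer needs to guess which text is the title, because the source marks it explicitly. That is the sense in which structured capture is a precondition for this benefit.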


If your organisation has a commitment to an XML-based publishing environment, or has in-house programmers skilled in XSL, then specify this in your CMS tender.

If there is a need to support multiple published formats, then XML may also be relevant.

Beyond this, the use of XML in publishing systems should not be a major consideration when selecting a CMS.

Storage


To a large extent, what happens within a content management system is invisible to the business. As a “black box” solution, a CMS is generally free to be coded in any way that meets business requirements.

In principle, there would be benefits to using a standard XML language to capture site content and structure. Unfortunately, at present, there are no standards in this space.

Without these standards, while two CMS products may use XML to store their content, their use of different XML dialects will eliminate most benefits in terms of migration or interoperability.


At present, vendors should implement XML within their CMS if it produces concrete business benefits, or supports additional functionality. This is a technical implementation decision.

For customers of a CMS, there is little benefit in specifying that a CMS should use XML internally, unless there are clear business reasons.

When XML standards are developed for representing site content and structure, this situation will change.

Interoperability


A content management system is but one of a number of information systems that exist within most organisations.

Increasingly, there is a recognised need to connect these systems together, to provide a seamless information environment.

In practice, this often means connecting a CMS to systems such as:

  • document management systems
  • records management systems
  • e-commerce platforms

In this area, substantial work has been devoted to the development of ‘web services’ platforms, such as Sun’s J2EE or Microsoft’s .NET.

While there is considerable impetus behind these initiatives, these are very low-level standards designed to facilitate web-based communication.

Little has been done to directly address the needs of content management, and there are no implemented standards that manage the content or structure of a site.

With this being the case, it is currently more important that the CMS solution provide a documented API (application programming interface) than that it offer XML capabilities. This API can then be used to develop customised interoperability code (which may use web services).
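
A minimal sketch of what such an API makes possible is shown below. Everything in it is hypothetical: the `ContentAPI` interface, the `InMemoryCMS` stand-in, and the record layout expected by the imagined records management system. It is only meant to show that integration code depends on the documented interface, not on how the CMS stores its content.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Page:
    page_id: str
    title: str
    body: str


class ContentAPI(Protocol):
    """The (hypothetical) documented interface a CMS vendor might publish."""

    def list_page_ids(self) -> List[str]: ...

    def get_page(self, page_id: str) -> Page: ...


class InMemoryCMS:
    """Stand-in for a real CMS; its internal storage is invisible to callers."""

    def __init__(self, pages: List[Page]) -> None:
        self._pages: Dict[str, Page] = {p.page_id: p for p in pages}

    def list_page_ids(self) -> List[str]:
        return sorted(self._pages)

    def get_page(self, page_id: str) -> Page:
        return self._pages[page_id]


def export_to_records_system(cms: ContentAPI) -> List[dict]:
    """Build records in the shape an imagined records system expects."""
    return [
        {"record_ref": "CMS-" + pid, "name": cms.get_page(pid).title}
        for pid in cms.list_page_ids()
    ]


cms = InMemoryCMS([Page("42", "Travel policy", "..."), Page("7", "Home", "...")])
print(export_to_records_system(cms))
```

The export function would work unchanged whether the CMS behind the interface stored its pages as XML, in relational tables, or in some other form.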

Interoperability is limited by the lack of standards


Determine the specific ways in which the CMS will need to be interconnected with other information systems within your organisation.

Explore how prospective CMS solutions would meet these interoperability needs, and focus on the provision of a fully-documented API.

If your organisation has a widely-deployed web services platform (such as .NET or J2EE), the CMS should be able to interact with this.

Content must be captured in a structured form for XML benefits to be realised

Authoring


To realise many of the benefits offered by XML in a CMS, content must be captured in a structured way.

This means moving away from unstructured information sources, to an environment where the content is authored in a more controlled way.

This is a complex and evolving area, but a few key realisations have become apparent:

  • Organisations must move away from using tools such as word processors to author content, as there is no automated way to convert unstructured sources to XML.
  • There are benefits to be gained by having authors use an XML-aware editor as part of a CMS.
  • The users must not be exposed to the complexity of XML. The fact that the content is stored as XML should be invisible to authors and editors.
  • Capturing content as HTML is not desirable, as this severely limits the potential benefits delivered by XML elsewhere in the CMS.
  • While some systems claim XML-compliance through the use of XHTML (the XML version of HTML), this provides no practical benefits over using plain HTML.
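
The difference between structured capture and HTML capture, as described in the points above, can be illustrated with a short sketch. The `<policy>` dialect, the list of required fields, and the function `missing_fields` are assumptions invented for this example. The HTML version holds the same words, but nothing in it identifies the owner or the review date, so an automated check has nothing to work with.

```python
import xml.etree.ElementTree as ET

# The same content captured in a made-up structured dialect...
STRUCTURED = """<policy>
  <title>Travel policy</title>
  <owner>Finance</owner>
  <review-date>2004-07-01</review-date>
  <summary>Staff must book travel through the approved agent.</summary>
</policy>"""

# ...and as presentation-oriented (X)HTML markup.
AS_HTML = """<div>
  <h1>Travel policy</h1>
  <p><b>Finance</b>, review by 1 July 2004</p>
  <p>Staff must book travel through the approved agent.</p>
</div>"""

# Hypothetical business rule: every policy must carry these fields.
REQUIRED_FIELDS = ("title", "owner", "review-date", "summary")


def missing_fields(xml_text):
    """Return the required fields that are not present as child elements."""
    root = ET.fromstring(xml_text)
    return [name for name in REQUIRED_FIELDS if root.find(name) is None]


print(missing_fields(STRUCTURED))  # []
print(missing_fields(AS_HTML))     # all four fields are reported missing
```

Note that the HTML fragment is itself well-formed XML, which is why it parses without error. Being valid XML does not make the content structured in any useful business sense.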

A number of vendors have developed solutions around the use of XML-based editing environments, and these show promise. There is, however, no consistent approach, and most XML-based solutions are relatively immature.

Progress has been slow in this area, and it is expected that several more years of practical experience will be required before the best business solution becomes apparent.


When assessing potential CMS products, explore how the use of an XML-based authoring environment will provide concrete business benefits.

Focus on the usability and simplicity of the authoring tools, and ensure that the authors will not have to replace their knowledge of HTML’s intricacies with a technical understanding of XML.

Use scenarios as part of the selection process to determine how the use of XML will work in practice, and what benefits and costs it will bring.

Focus on business requirements first, XML second

Overall recommendations

This article has only explored the most common uses of XML in a content management system.

In general, organisations should determine their specific business requirements, and list these in the CMS tender. If there are specific requirements for XML capabilities, these should be outlined in detail.

Without the further development of XML standards specifically relating to content management, the use of XML to support interoperability is currently limited.

Overall, it is therefore not meaningful to specify that a CMS should “support XML”. At present, it is more important to select a product that meets all the organisation’s business requirements, than to choose a system that offers XML features.



  1. I like the idea of using XML to store the content of a site, and at the moment I am trying to create a PHP-based CMS which uses XML. Everything has gone as planned so far, except that I’ll be implementing a news section which will allow visitors to leave comments. This is where the problem pops up. I will be storing the content of all pages, including the news posts, in individual XML files, so I am wondering whether I should store the comments in their respective pages as well, or create separate XML files to store them.

    If I create separate files to store the comments, can you think of a way I can establish relations between them? Basically, I want to link all comments to their associated post.

    — Naif Amoodi

  2. Your mileage may vary, but in my experience most CMS products tend to store their content in a database for performance reasons. That also helps with the relationships between pages, comments, etc.

    Of course, the content itself can be stored as XML within a suitable database field, and XML can be used as part of the publishing process if that is deemed valuable.

    Anyway, good luck with the new CMS!
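
As a rough illustration of the database approach suggested in the reply above, the sketch below keeps each post's content as XML in a text field and links comments to posts through a `post_id` column. It uses Python and SQLite for brevity (not PHP), and the table names, column names, and helper functions are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Posts keep their content as XML in a text field.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, content_xml TEXT NOT NULL)"
)
# Each comment points at its post through post_id.
conn.execute(
    """CREATE TABLE comments (
           id INTEGER PRIMARY KEY,
           post_id INTEGER NOT NULL REFERENCES posts(id),
           author TEXT NOT NULL,
           body TEXT NOT NULL)"""
)

conn.execute(
    "INSERT INTO posts (id, content_xml) VALUES (?, ?)",
    (1, "<post><title>Site launch</title></post>"),
)
conn.executemany(
    "INSERT INTO comments (post_id, author, body) VALUES (?, ?, ?)",
    [(1, "Visitor A", "Congratulations!"), (1, "Visitor B", "Looks good.")],
)


def comments_for_post(connection, post_id):
    """Return (author, body) pairs for one post, oldest first."""
    rows = connection.execute(
        "SELECT author, body FROM comments WHERE post_id = ? ORDER BY id",
        (post_id,),
    )
    return rows.fetchall()


def post_title(connection, post_id):
    """Parse the stored XML and return the post's title."""
    row = connection.execute(
        "SELECT content_xml FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return ET.fromstring(row[0]).findtext("title")


print(post_title(conn, 1))
print(comments_for_post(conn, 1))
```

The same linking idea carries over to separate XML files: each comment would record the identifier of the post it belongs to, and the publishing code would gather the comments whose identifier matches.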