XML and content management systems


With the rise in popularity of both XML and content management systems (CMS), an increasing number of tenders specify that the “CMS must be built using XML”.

What does this mean in practice? This article explores the role of XML in the context of content management systems, focusing specifically on the business issues.

This is a complex area, and one that is evolving rapidly. This article does not aim to provide a complete answer to all questions. Rather, the goal is to ‘demystify’ the area, and provide organisations with more information on which to base their CMS selection processes and criteria.

What is XML?

The eXtensible Markup Language (XML) is an industry standard for capturing and communicating information in a structured way. In essence, it is a “language for creating languages”, where each language is designed to represent a specific information set in an effective and powerful way.

In the context of this article, it is important to recognise that, while XML provides many benefits, it is not a “silver bullet”. While two products may use XML to represent the content in their CMS, if they aren’t using the same dialect, there is no interoperability.
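To make the dialect problem concrete, the following minimal sketch shows the same news item expressed in two invented XML dialects. The element names and the `read_headline` helper are hypothetical and do not come from any real CMS. The point is only that code written for one dialect finds nothing in the other, even though both documents are well-formed XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical dialects describing the same piece of content.
VENDOR_A = """
<article>
  <headline>Annual report released</headline>
  <body>The report is now available.</body>
</article>
"""

VENDOR_B = """
<doc type="news">
  <meta><title>Annual report released</title></meta>
  <content>The report is now available.</content>
</doc>
"""

def read_headline(xml_text):
    """Return the headline, assuming vendor A's (invented) dialect."""
    root = ET.fromstring(xml_text)
    # Looks only for a <headline> element directly under the root.
    return root.findtext("headline")

print(read_headline(VENDOR_A))  # Annual report released
print(read_headline(VENDOR_B))  # None
```

Both inputs parse without error, so "uses XML" is satisfied in each case. What is missing is an agreed vocabulary, and that is what interoperability depends on.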

This issue will be revisited a number of times throughout the article, and it represents the key challenge facing XML in the context of content management systems.

For more on XML, see the supplement published by Standards Australia titled An Introduction to XML for Knowledge Managers.

The use of XML in content management systems is still rapidly evolving

XML and CMS

There are a number of key areas in a content management system where XML has a role to play, including:

  • authoring
  • communication
  • interoperability
  • storage
  • publishing

These will be explored in the following sections, starting with the simplest aspects, and working through to the more complex issues.

Communication

There are a number of widely-used ‘syndication’ formats, designed to support the flow of information between organisations. These include Rich Site Summary (RSS) and NewsML, and all are based on XML.

With the growth of these formats, many content management systems provide support as part of their core product. Overall, there should be a decreasing need to customise CMS solutions to handle these syndication formats.
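As an illustration of why these formats are straightforward for products to support, here is a minimal sketch that reads the items from a small RSS 2.0 document using only Python's standard library. The feed content and the `list_items` helper are invented for the example; a real feed would normally be fetched from a URL and may carry many more elements.

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document; the channel and items are invented sample data.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <link>http://www.example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/second</link>
    </item>
  </channel>
</rss>
"""

def list_items(feed_text):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_text).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

for title, link in list_items(SAMPLE_FEED):
    print(title, "-", link)
```

Because the element names are fixed by the format rather than by each vendor, the same few lines work against any conforming feed.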

Recommendation

If your organisation has a business requirement to either accept outside ‘news feeds’, or to publish information for the use of others, specify this in your CMS tender.

It is also reasonable to specify which specific syndication formats the CMS should support.

Single-source publishing is one of the greatest potential benefits

Publishing

The use of the Extensible Stylesheet Language (XSL) has grown extensively since it was released several years ago.

With the increasing maturity of this technology, and the widespread availability of programmers skilled in its use, there are potential benefits to using XSL as the basis for CMS publishing systems.

There are, however, a number of other non-XML publishing technologies which have similar (or better) capabilities.

XML has the potential to offer real benefits in providing ‘single source publishing’. By separating the presentation from the content, the same piece of information can be rendered differently depending on how it will be used (for example on the web, on wireless devices, or in print).

To enable this capability, the content must be captured in a structured way (see the section on authoring).
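The idea of single-source publishing can be sketched in a few lines. In the example below, one structured source document is rendered as an HTML fragment and as plain text. Note the assumptions: the element names are invented, and plain Python functions stand in for XSL stylesheets, because Python's standard library does not include an XSLT processor. An XSL-based system would express the two renderings as stylesheets instead.

```python
import xml.etree.ElementTree as ET
from html import escape

# Structured source content; the element names are invented for this example.
SOURCE = """
<notice>
  <title>Office closure</title>
  <para>The office is closed on Monday.</para>
  <para>Normal hours resume on Tuesday.</para>
</notice>
"""

def render_html(root):
    """Render the notice as an HTML fragment for the web."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text) for p in root.findall("para")]
    return "\n".join(parts)

def render_text(root):
    """Render the same notice as plain text, e.g. for email or print."""
    title = root.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)

notice = ET.fromstring(SOURCE)
print(render_html(notice))
print(render_text(notice))
```

Neither renderer needs to guess what the title is, because the source marks it explicitly. That is the practical meaning of separating content from presentation.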

Recommendation

If your organisation has a commitment to an XML-based publishing environment, or has in-house programmers skilled in XSL, then specify this in your CMS tender.

If there is a need to support multiple published formats, then XML may also be relevant.

Beyond this, the use of XML in publishing systems should not be a major consideration when selecting a CMS.

Storage

To a large extent, what happens within a content management system is invisible to the business. As a “black box” solution, a CMS is generally free to be coded in any way that meets business requirements.

In principle, there would be benefits to using a standard XML language to capture site content and structure. Unfortunately, at present, there are no standards in this space.

Without these standards, while two CMS products may use XML to store their content, their use of different XML dialects will eliminate most benefits in terms of migration or interoperability.

Recommendation

At present, vendors should implement XML within their CMS if it produces concrete business benefits, or supports additional functionality. This is a technical implementation decision.

For customers of a CMS, there is little benefit in specifying that a CMS should use XML internally, unless there are clear business reasons.

When XML standards are developed for representing site content and structure, this situation will change.

Interoperability

A content management system is but one of a number of information systems that exist within most organisations.

Increasingly, there is a recognised need to connect these systems together, to provide a seamless information environment.

In practice, this often means connecting a CMS to systems such as:

  • document management systems
  • records management systems
  • e-commerce platforms

In this area, substantial work has been devoted to the development of ‘web services’ platforms, such as Sun’s J2EE or Microsoft’s .NET.

While there is considerable impetus behind these initiatives, these are very low-level standards designed to facilitate web-based communication.

Little has been done to directly address the needs of content management, and there are no implemented standards that manage the content or structure of a site.

With this being the case, it is currently more important that the CMS solution provide a documented API (application programming interface) than that it offer XML capabilities. This API can then be used to develop customised interoperability code (which may use web services).
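The kind of customised interoperability code described here might look something like the following sketch. Everything in it is hypothetical: `InMemoryCms`, `InMemoryRecordsSystem`, `get_page` and `register` are stand-ins invented for illustration, not the API of any real product. In practice the two classes would be replaced by calls to the vendors' documented interfaces.

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: str
    title: str
    body: str

class InMemoryCms:
    """Stand-in for a CMS exposing a documented API (hypothetical)."""
    def __init__(self, pages):
        self._pages = {p.page_id: p for p in pages}

    def get_page(self, page_id):
        return self._pages[page_id]

class InMemoryRecordsSystem:
    """Stand-in for a records management system (hypothetical)."""
    def __init__(self):
        self.records = []

    def register(self, identifier, title):
        self.records.append({"id": identifier, "title": title})

def archive_page(cms, records, page_id):
    """Custom glue code: copy a page's details into the records system."""
    page = cms.get_page(page_id)
    records.register(identifier="cms:" + page.page_id, title=page.title)

cms = InMemoryCms([Page("42", "Privacy policy", "...")])
records = InMemoryRecordsSystem()
archive_page(cms, records, "42")
print(records.records)
```

The glue function depends only on the documented calls, not on how either system stores its data internally, which is why a well-documented API matters more here than the internal use of XML.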

Interoperability is limited by the lack of standards

Recommendation

Determine the specific ways in which the CMS will need to be interconnected with other information systems within your organisation.

Explore how prospective CMS solutions would meet these interoperability needs, and focus on the provision of a fully-documented API.

If your organisation has a widely-deployed web services platform (such as .NET or J2EE), the CMS should be able to interact with this.

Content must be captured in a structured form for XML benefits to be realised

Authoring

To realise many of the benefits offered by XML in a CMS, content must be captured in a structured way.

This means moving away from unstructured information sources, to an environment where the content is authored in a more controlled way.

This is a complex and evolving area, but a few key realisations have become apparent:

  • Organisations must move away from using tools such as word processors to author content, as there is no automated way to convert unstructured sources to XML.
  • There are benefits to be gained by having authors use an XML-aware editor as part of a CMS.
  • The users must not be exposed to the complexity of XML. The fact that the content is stored as XML should be invisible to authors and editors.
  • Capturing content as HTML is not desirable, as this severely limits the potential benefits delivered by XML elsewhere in the CMS.
  • While some systems claim XML-compliance through the use of XHTML (the XML version of HTML), this provides no practical benefits over using plain HTML.
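The difference between structured content and XHTML can be shown with a small sketch. Below, the same event is captured once as presentational XHTML and once in an invented structured dialect, and a hypothetical `missing_fields` check looks for the pieces of information a publishing system would need. All element names and sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The same event captured two ways; names and values are invented.
AS_XHTML = """
<div>
  <h2>Staff briefing</h2>
  <p><b>12 March 2024</b>, Room 4</p>
</div>
"""

AS_STRUCTURED = """
<event>
  <name>Staff briefing</name>
  <date>2024-03-12</date>
  <location>Room 4</location>
</event>
"""

REQUIRED = ("name", "date", "location")

def missing_fields(xml_text):
    """Return the required fields that the markup does not identify."""
    root = ET.fromstring(xml_text)
    return [tag for tag in REQUIRED
            if not (root.findtext(tag) or "").strip()]

print(missing_fields(AS_STRUCTURED))  # []
print(missing_fields(AS_XHTML))       # ['name', 'date', 'location']
```

Both documents are valid XML, but only the structured version tells software which text is the date and which is the location. The XHTML version records how the text should look, not what it means.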

A number of vendors have developed solutions around the use of XML-based editing environments, and these show promise. There is, however, no consistent approach, and most XML-based solutions are relatively immature.

Progress has been slow in this area, and it is expected that several more years of practical experience will be required before the best business solution becomes apparent.

Recommendation

When assessing potential CMS products, explore how the use of an XML-based authoring environment will provide concrete business benefits.

Focus on the usability and simplicity of the authoring tools, and ensure that the authors will not have to replace their knowledge of HTML’s intricacies with a technical understanding of XML.

Use scenarios as part of the selection process to determine how the use of XML will work in practice, and what benefits and costs it will bring.

Focus on business requirements first, XML second

Overall recommendations

This article has only explored the most common uses of XML in a content management system.

In general, organisations should determine their specific business requirements, and list these in the CMS tender. If there are specific requirements for XML capabilities, these should be outlined in detail.

Without the further development of XML standards specifically relating to content management, the use of XML to support interoperability is currently limited.

Overall, it is therefore not meaningful to specify that a CMS should “support XML”. At present, it is more important to select a product that meets all of the organisation’s business requirements than to choose a system that offers XML features.

James Robertson
James Robertson is the Managing Director of Step Two, the global thought leaders on intranets, headquartered in Sydney, Australia. James is the author of the best-selling books Essential intranets, Designing intranets and What every intranet team should know. He has keynoted conferences around the globe.