While interoperability between content management systems (CMS) is an important goal, it is limited by the lack of standards relating to content management. At present, there is a range of narrowly focused specifications in the marketplace, but these address only specific aspects of system interoperability.
A number of initiatives are under way to address CMS interoperability, but these are in their formative stages, and it is expected that at least several years will be required before widely-accepted CMS standards are developed.
This article explores the need for interoperability, and outlines the current state of the relevant standards and technologies.
The need for interoperability
There is considerable interest in the subject of interoperability between content management systems, and there are several different aspects to this.
The first is interoperability between organisations. This is most commonly seen in the public sector, where there is a desire to share content between government agencies, even when each organisation has installed a different CMS product.
This interoperability between organisations is the focus of this article.
Another aspect is interoperability within organisations, between the CMS and other information systems. While this is also a key consideration, it will be explored in greater depth in future articles.
There is no complete solution for CMS interoperability
Multiple levels of interoperability
This article focuses on interoperability between different content management systems, which can be considered on a number of levels:
- System interconnection
The system interconnection layer consists of common infrastructure, communication protocols (eg TCP/IP, HTTP), and common security protocols. This is the lowest layer of interoperability.
- Data integration & interchange
Above the network layer, data interoperability is achieved through the use of common messaging protocols. Note that this layer requires ‘meaning’ to be added to data for it to be useful.
- Application integration (industry-specific)
Within individual industry areas, the meaning (semantics) of specific information types can be represented, and shared between systems.
- Application integration (content)
This is the most general, and most powerful, level of interoperability, with content management systems having the ability to directly share content, structure, and metadata between platforms and organisations.
At present, there are no widely-used interoperability standards that address the complete capabilities of a content management system. Instead, there are individual standards that are relevant to each of these levels, as outlined in the following sections.
System interconnection
Beyond supporting common communications protocols (such as TCP/IP), the best-established aspect of interoperability at the system level is the use of common directory and security services.
There are a number of industry standards designed to create a single enterprise repository for user and authentication information. At present, the two dominant platforms are:
- LDAP
- Active Directory
Other platforms supported by content management systems include:
- Windows NT authentication
- Novell NetWare
The good news is that across all price points, the majority of CMS solutions support integration with these platforms.
This is therefore one area where the availability of widely-deployed standards facilitates practical interoperability between information systems.
Data integration & interchange
There is considerable movement globally to develop specifications to communicate between systems, with most of these based on XML. While these are valuable initiatives, particularly in areas such as business-to-business (B2B) communication, it is important to recognise that these are just ‘enabling’ standards, not complete business solutions.
While the ability to communicate between systems is a pre-requisite for interoperability, it is also necessary to have common ‘dialects’ by which to share actual information.
The eXtensible Markup Language (XML) was developed by the World Wide Web Consortium (W3C) to provide a common language for developing systems and communicating between them. It has reached near-universal acceptance, and is used as the underlying basis for most of the standards described in the following sections.
It should be noted that while XML has been implemented in many content management systems, there is no agreed vocabulary (schema) between vendors. This effectively makes the storage of content proprietary, and limits the value of XML in achieving interoperability.
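The effect of having no agreed vocabulary can be sketched in a few lines. The example below is only an illustration: both vendor formats, and all element and attribute names, are invented for this article and do not correspond to any real product. It shows that the same piece of content, exported as valid XML by two systems, still needs a separate hand-written reader for each system.

```python
import xml.etree.ElementTree as ET

# Two hypothetical exports of the same article.
# Both vocabularies are invented for illustration only.
VENDOR_A = "<article><title>Annual report</title><body>Full text</body></article>"
VENDOR_B = "<doc name='Annual report'><content>Full text</content></doc>"


def from_vendor_a(xml_text):
    """Read the hypothetical vendor A format: title and body are child elements."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("title"), "body": root.findtext("body")}


def from_vendor_b(xml_text):
    """Read the hypothetical vendor B format: title is an attribute, body is <content>."""
    root = ET.fromstring(xml_text)
    return {"title": root.get("name"), "body": root.findtext("content")}


if __name__ == "__main__":
    # Each reader recovers the same content from its own format...
    print(from_vendor_a(VENDOR_A) == from_vendor_b(VENDOR_B))
    # ...but vendor A's reader finds nothing useful in vendor B's export.
    print(from_vendor_a(VENDOR_B))
```

Both documents are well-formed XML, yet a reader written for one format recovers nothing from the other. A shared schema would remove the need for the per-vendor functions.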
The Standards Australia publication An Introduction to XML for Knowledge Managers provides more information on how XML relates to areas such as content management.
Web services is a name given to a collection of specifications for communication between systems (as well as information storage and retrieval) using XML and web technologies. Development in this area is being conducted by the W3C, and by many proprietary software companies.
Specifications such as SOAP, WSDL and UDDI form the core of web services, although there are too many other specifications to list here.
Platforms such as J2EE and .NET have also gained wide adoption, and these are supported by many CMS products.
Increasingly, there is also a move towards ‘service-oriented architecture’ (SOA), which aims to simplify the scalability and interconnection of these systems, although this is generally only seen at the higher end of the market.
In all these cases, the base standards, such as web services, facilitate communications between systems (both within and between organisations).
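As a rough illustration of what such a base standard does and does not cover, the sketch below builds a SOAP 1.1 request using Python's standard library. The envelope namespace is the one defined by SOAP 1.1. The service namespace, the `GetContent` operation and the `contentId` parameter are hypothetical, and stand in for whatever a particular CMS vendor might define.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CMS_NS = "http://example.org/cms"  # hypothetical, vendor-specific namespace

# Use readable prefixes when the message is serialised.
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("cms", CMS_NS)


def build_request(content_id):
    """Wrap a hypothetical GetContent call in a standard SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Everything inside the body is defined by the individual product,
    # not by the SOAP specification.
    operation = ET.SubElement(body, f"{{{CMS_NS}}}GetContent")
    ET.SubElement(operation, f"{{{CMS_NS}}}contentId").text = content_id
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("42"))
```

Only the outer `Envelope` and `Body` elements are standardised. The payload inside the body is specific to each product, which is where the integration effort is spent.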
As highlighted earlier, however, there still needs to be a common standard for the information itself, if meaningful interoperability is to be achieved. It is here that the difficulties arise, due to the lack of any consensus standards in this area.
The net effect is that using these data integration layers will almost always be a ‘custom-development’ exercise, specific to the particular product being integrated with.
This means that the interoperability must be developed on a case-by-case basis, and would need to be reworked when a specific CMS product is replaced (or upgraded).
De-facto standards such as RSS have gained wide adoption
Application integration (industry-specific)
There have been efforts within specific industries to develop interoperability standards, driven by the nature of the marketplaces in which they operate. The most visible activities relate to the syndication of news, and the sharing of e-learning information.
Beyond these, there are many other initiatives (such as in the medical field) which are too numerous to mention. Organisations should look to their peak industry bodies to determine whether there are useful standards in their domain.
RSS and NewsML
Rich Site Summary (RSS) has become the de facto standard for syndicating and republishing content. This has been integrated into the core of most lightweight content management systems (such as weblogs), and is increasingly being deployed into larger-scale solutions.
There are also complementary standards such as NewsML, which is typically used by the larger media organisations for the interchange of syndicated content, as well as for the broader management of news throughout its lifecycle.
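Part of the appeal of RSS is how little is needed to consume it. The following is a minimal sketch, assuming a simple RSS 2.0 feed; the feed itself is hand-written for illustration, and the site and item names are invented. It extracts the title and link of each item using only the Python standard library.

```python
import xml.etree.ElementTree as ET

# A hand-written RSS 2.0 feed; the site and its items are invented.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Agency news</title>
    <link>http://example.org/</link>
    <description>Latest releases</description>
    <item>
      <title>New policy released</title>
      <link>http://example.org/news/1</link>
    </item>
    <item>
      <title>Office relocation</title>
      <link>http://example.org/news/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_text):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_FEED):
        print(title, link)
```

Because the element names are fixed by the format, the same function works for a feed published by any CMS. This is in contrast to the vendor-specific XML discussed earlier.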
E-learning
There has been considerable activity around the development of e-learning standards for sharing and re-packaging content. This has produced specifications such as SCORM (Sharable Content Object Reference Model), which is designed to allow for the creation of reusable learning objects that can be used in different systems and organisations.
The IMS Global Learning Consortium has also developed a range of specifications to support interoperability between learning systems.
There are no implemented standards for the content itself
Application integration (content)
This is the highest level of interoperability between content management systems, whereby the content itself, along with site structure and metadata, can be shared.
Unfortunately, it is also the least mature of all the aspects of interoperability, with only metadata being served by open and implemented standards.
Shared content standards
There are no implemented standards that define how the content itself within a CMS can be communicated to other systems. While there are a number of document-centric standards, these have less relevance in the field of content management as they do not address issues such as site structure and inter-linking.
At present, beyond the Java standard outlined below, there are no initiatives working to develop such standards.
Content Repository for Java
The Java Community Process has initiated JSR 170, detailing a Content Repository for Java technology API.
This aims to provide a standardised mechanism for interacting with the information stored within the content repository of a CMS.
It targets a key interoperability need in the CMS marketplace, and has the potential to greatly improve communication between CMS products developed using Java by different vendors. It should also simplify the development of new systems for manipulating and republishing content stored within a content management system.
At the time of writing, this project had proceeded through to ‘final draft’ stage, although adoption by CMS vendors has yet to occur to any real degree.
It will be some time before it will be possible to assess the full impact of this standard in the industry.
JSR 170 has potential, but has yet to be widely adopted
Metadata
The goal of metadata consistency has been promoted by the Dublin Core Metadata Initiative (DCMI), which established a base set of metadata elements for all content. This has been implemented widely, and the elements can be embedded directly in HTML pages using standard meta tags.
In many industries and jurisdictions, this base set of metadata has been expanded to meet particular needs. For example, many government agencies have developed standard metadata schemas.
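Dublin Core elements are commonly embedded in web pages as HTML meta tags whose names carry a ‘DC.’ prefix. The sketch below shows one way a receiving system might read them, assuming that convention is followed; the sample page and its values are invented for illustration.

```python
from html.parser import HTMLParser

# A hand-written page header; the values are invented.
SAMPLE_PAGE = """<html><head>
<title>Annual report</title>
<meta name="DC.title" content="Annual report">
<meta name="DC.creator" content="Example Agency">
<meta name="DC.date" content="2005-06-30">
<meta name="keywords" content="report, annual">
</head><body></body></html>"""


class DublinCoreExtractor(HTMLParser):
    """Collect meta tags whose name starts with 'DC.' (case-insensitive)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.metadata[name] = attributes.get("content")


def extract_dublin_core(html_text):
    """Return a dict of Dublin Core element names to values."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    return parser.metadata


if __name__ == "__main__":
    print(extract_dublin_core(SAMPLE_PAGE))
```

The non-Dublin Core ‘keywords’ tag is ignored. Extended schemas typically add further elements, which a reader like this would need to be told about.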
There are a number of active standards relating to the structuring and classification of content, including:
- Resource Description Framework (RDF)
- Topic maps (XTM)
- eXchangeable Faceted Metadata Language (XFML)
- Outline Markup Language (OML)
These provide a range of ways to structure information, and are valuable tools for interchange of information between systems. At present, however, these standards are not widely deployed in CMS products.
WebDAV
WebDAV stands for ‘Web-based Distributed Authoring and Versioning’, and is a set of extensions to HTTP that allow users to collaboratively edit and manage files on a remote server. It has been developed by the IETF, and is often positioned as a replacement for older protocols such as FTP. While it provides a useful set of capabilities, its implementation in commercial content management systems has been limited so far.
Its capabilities are also restricted by the original focus on automating the management of sites developed using legacy tools, and it is not clear that the design of WebDAV will have long-term relevance in the content management system market.
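The file-oriented nature of WebDAV can be seen in its responses. A client lists resources by sending a PROPFIND request, and the server answers with a ‘207 Multi-Status’ XML body in the `DAV:` namespace. The sketch below parses such a body; the sample response is hand-written, with invented paths and dates, and no server is contacted.

```python
import xml.etree.ElementTree as ET

NS = {"d": "DAV:"}  # the WebDAV XML namespace

# A hand-written Multi-Status body; paths and dates are invented.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/site/index.html</D:href>
    <D:propstat>
      <D:prop>
        <D:getlastmodified>Mon, 04 Jul 2005 10:00:00 GMT</D:getlastmodified>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/site/about.html</D:href>
    <D:propstat>
      <D:prop>
        <D:getlastmodified>Tue, 05 Jul 2005 09:30:00 GMT</D:getlastmodified>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""


def list_resources(multistatus_text):
    """Return (href, last-modified) pairs from a Multi-Status body."""
    root = ET.fromstring(multistatus_text)
    results = []
    for response in root.findall("d:response", NS):
        href = response.findtext("d:href", namespaces=NS)
        modified = response.findtext(
            "d:propstat/d:prop/d:getlastmodified", namespaces=NS)
        results.append((href, modified))
    return results


if __name__ == "__main__":
    for href, modified in list_resources(SAMPLE_RESPONSE):
        print(href, modified)
```

What comes back is a list of files and their properties. Nothing in the response describes site structure, content types or workflow, which is the limitation noted above.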
It is not clear how relevant WebDAV is to web CMS
As outlined previously, there are a number of very specific interoperability standards, but nothing currently implemented that covers the full capabilities of a content management system.
The impact of this is that all CMS products use proprietary technologies to structure and store their content. Even if XML is used, the lack of a common vocabulary between products means there is no simple way of communicating between them, or migrating from one system to another.
While there are a few fragmentary efforts to achieve CMS interoperability, it is expected that it will be several years before any concrete progress is made. The rapidly evolving nature of the market discourages collaboration between vendors, and this will only change once product capabilities become standardised.