Content migration: options and strategies
There is a lot of work involved in redeveloping and relaunching an intranet or website. The project management challenges start early, and it is easy to overlook the time (and effort) needed to migrate the content from the old to the new site.
Yet, despite its lack of visibility, content migration is often the single biggest activity in a web redevelopment. It is certainly the least interesting, and unfortunately it is unavoidable.
This article explores a number of options for migrating content, and provides some practical suggestions that should help it to go smoothly.
Redeveloping a site
Two factors often drive the redevelopment of a website or intranet:
- Moving to a new technology platform, such as a new content management system (CMS) or portal package.
- Redesigning the site, either to address the weaknesses in the current site or to add significant functionality.
Often these two factors are bundled together, with a technology selection process combined with a redesign.
In either case, there is a need to migrate the content from the old site to the new site. This is not a simple process.
A change in technology platforms makes the migration challenging, as does a major restructure or redesign of the site.
Unfortunately, there is no option for avoiding the migration, and careful planning will be required to get the best outcome.
Three options for migration
Websites and intranets can easily consist of thousands or tens of thousands of pages (or, in some cases, upwards of millions).
These pages have either been authored using FrontPage or Dreamweaver, or are locked up in a content management system (or portal).
If there were an industry-standard way of exporting content from one system and importing it into the next, life would be very easy.
Unfortunately this does not exist. While there are some formative technology standards being finalised, these have yet to reach any significant level of adoption among products.
This makes migration a time-consuming exercise, and unfortunately one that cannot be easily avoided.
In practice, there are three main ways of migrating content:
- automated migration
- migration by hand
- partially automated migration
Of these, the first is clearly the most attractive option. Unfortunately, there are many ifs and buts that rule it out in common situations.
This often leaves migration by hand, with or without the support of some automated migration for certain types of content.
In practice, each option has its strengths and weaknesses, and these are explored in the following sections.
The article wraps up with some overall suggestions and recommendations for conducting the migration, making it as easy and effective as possible.
1. Automated migration
The option of an automated migration of content from the old site to the new site is clearly an attractive one. If it involves little or no manual effort, it can slash the time and resources needed by the web project.
Conceptually, there are a number of ways this could work:
- Export the content from the old system into a neutral format (such as XML), and then import this into the new system. In the absence of industry standards, this would likely require custom development to connect the export and import.
- Use the application programming interfaces (APIs) provided by the old and new systems, and write a program to transfer the content.
- Trawl through the published HTML on the old site, and extract the content, throwing away the old formatting.
- Use a third-party migration tool, which provides tools and rules for handling the migration. (Behind the scenes, this would use one of the first three options listed.)
The first two ways involve working ‘under the hood’ with the content management system or portal products, and require the necessary functionality to be available in both the old and new products.
The third way allows the migration to occur without having to interact directly with the underlying publishing tools, but can be more fiddly and challenging as a result.
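To give a sense of what the third approach involves, and why it can be fiddly, here is a minimal Python sketch that pulls the title and main text out of a published page and writes them to a neutral XML record, in the spirit of the first approach. It assumes the old templates wrap the body text in an element with `id="content"` and that the HTML is reasonably well formed (every non-void element is closed); that id, the XML element names and the function names are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements that start or end a separate paragraph of text.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li"}


class ContentExtractor(HTMLParser):
    """Collects the page <title> and the text inside the element whose id is
    content_id (a hypothetical template convention), discarding navigation,
    layout and formatting markup."""

    def __init__(self, content_id="content"):
        super().__init__()
        self.content_id = content_id
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._depth = 0      # > 0 while inside the content element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in VOID_TAGS:
            return
        elif self._depth:
            if tag in BLOCK_TAGS:
                self._flush()   # text before a new block is its own paragraph
            self._depth += 1
        elif dict(attrs).get("id") == self.content_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in VOID_TAGS or not self._depth:
            return
        else:
            if tag in BLOCK_TAGS:
                self._flush()
            self._depth -= 1
            if self._depth == 0:
                self._flush()   # leaving the content element

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth:
            self._buffer.append(data)

    def _flush(self):
        # Collapse runs of whitespace left over from the old markup.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []


def page_to_xml(url, html):
    """Convert one published page into a neutral XML record."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    page = ET.Element("page", {"source": url})
    ET.SubElement(page, "title").text = parser.title.strip()
    body = ET.SubElement(page, "body")
    for text in parser.paragraphs:
        ET.SubElement(body, "p").text = text
    return ET.tostring(page, encoding="unicode")
```

For a page whose navigation bar sits outside the content element, only the heading and paragraph text from inside it reach the XML record. Real pages would need many more rules than this (tables, images, links, unclosed tags), and that is where most of the effort in this approach tends to go.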
Regardless of the approach taken, a number of prerequisites must be met before an automated migration becomes a viable option:
- The existing site needs to contain high-quality content, otherwise an automated migration is ‘garbage in, garbage out’.
- The new site needs to be structured similarly to the old site, otherwise the rules for placing content become too complex.
- The HTML of the old site needs to be clean and consistent enough to allow the automated migration to occur.
Beyond these fundamental prerequisites, the automated migration needs to be technically feasible, which can only be determined on a case-by-case basis.
The lack of good import and export support in content management systems and portals makes this a major stumbling block.
The net result is that a fully automated migration is not a practical option in most cases. The old site is simply too poor in quality, inconsistent or unstructured.
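The prerequisite that the new site be structured similarly to the old one is easier to see with an example. When the two structures are close, placing content can often be reduced to a short list of URL-prefix rules, as in this Python sketch; the prefixes and new section names are hypothetical. Pages that match no rule fall through to manual placement.

```python
# Hypothetical placement rules: old URL prefix -> section on the new site.
PLACEMENT_RULES = {
    "/news/media/": "/newsroom/press-releases/",
    "/news/": "/newsroom/",
    "/hr/forms/": "/working-here/forms/",
    "/hr/": "/working-here/",
}


def place(old_url, rules=PLACEMENT_RULES):
    """Return the new location for old_url, or None if no rule applies and
    the page has to be placed by hand."""
    # Try the longest (most specific) prefix first.
    for prefix in sorted(rules, key=len, reverse=True):
        if old_url.startswith(prefix):
            return rules[prefix] + old_url[len(prefix):]
    return None


urls = ["/news/media/2005-03-01.htm", "/hr/forms/leave.doc", "/projects/alpha.htm"]
for url in urls:
    print(url, "->", place(url) or "manual placement")
```

The further the new information architecture departs from the old one, the longer and more exception-ridden such a rule list becomes, until writing and testing it costs more than placing the pages by hand.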
2. Migration by hand
Migration by hand is the simplest but most painful option. In practice, this means cutting and pasting content from the old site into the new publishing tool.
This is very labour-intensive, with the work being done by the central team or decentralised content owners.
While laborious, hand migration is very flexible, and gives an unequalled opportunity to clean up the content (which is often one of the primary goals of a site redevelopment project).
Most organisations should expect to use a manual migration strategy, and should plan accordingly, both in terms of time and resources.
Suggestions are provided later in the article for managing this migration.
3. Partially automated migration
While a fully automated migration is out of reach in most cases, certain sections of the site may still be migrated automatically to reduce the amount of manual labour required.
Even when the old site is poor in quality and unstructured overall, certain areas of the site, or specific types of content, may have greater structure.
An example might be the 500 press releases on a site, published consistently following a very simple format. These don’t require reviewing, and they probably don’t require any cleaning up or reformatting.
An automated migration of these press releases would dramatically cut down the number of pages that need to be cut and pasted, saving time and effort.
Any of the four automated migration options listed earlier could be used, although care needs to be taken that the effort needed to set up the migration doesn’t exceed the time it saves.
Still far from a ‘silver bullet’, a partially automated migration may be helpful for organisations confronting a huge migration activity.
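The trade-off between set-up effort and time saved can be roughed out before committing to a partial automation. This Python sketch compares the two in hours; the per-page times and the set-up estimate are hypothetical inputs that each project would need to estimate for itself, and the 500 pages echo the press-release example.

```python
def hours_saved(pages, manual_minutes_per_page, setup_hours,
                check_minutes_per_page=0):
    """Estimated hours saved by automating a migration (negative means the
    automation costs more than it saves).

    manual_minutes_per_page: time to cut and paste one page by hand.
    setup_hours: time to build and test the automated migration.
    check_minutes_per_page: time to spot-check each migrated page.
    """
    manual_hours = pages * manual_minutes_per_page / 60
    automated_hours = setup_hours + pages * check_minutes_per_page / 60
    return manual_hours - automated_hours


# Hypothetical figures: 12 minutes per page by hand, 40 hours of set-up,
# 3 minutes per page to spot-check the automated output.
print(hours_saved(500, 12, 40, 3))  # 500 press releases
print(hours_saved(20, 12, 40, 3))   # a small section
```

With these made-up figures, the 500 press releases come out about 35 hours ahead, while a 20-page section would lose 37 hours: the same automation is worthwhile for a large, consistent set of pages and wasteful for a small one.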
Suggestions and strategies
Regardless of the migration approach taken, a number of strategies can help ensure a successful content migration. These are outlined in the following sections.
Don’t migrate the current site unchanged
In the vast majority of cases, the problems with the current site are a trigger for the purchase of new publishing tools, or for the redesign of the site.
Chief amongst these are content problems: content is out of date, poorly structured, not written well, incomplete or inconsistent.
It makes no sense to go through all the effort of redeveloping the site, only to migrate the problematic content that initially triggered the project.
There is also a temptation to say: ‘we’ll migrate the current site into the new CMS, and it will be easier to clean up there’.
Unfortunately this rarely works in practice. Once the new site is live, the pressure to clean up the content evaporates, and content updates return to their usual slow pace.
The current site is often inconsistent in its design and structure, and replicating this unchanged in the new system takes time and effort.
In most cases, this effort is wasted, as a redesign will force another round of redevelopment of the publishing tools. This may mean that the implementation project is done twice, at twice the cost, just to save time on the initial migration.
Of course, the current site is not always broken. If the site is well managed and well structured, there is no problem with migrating it as it stands, and this can dramatically simplify the overall process.
Clean up first
The site redevelopment project as a whole can last some time, from initial planning and budgeting, through to the final migration and go-live.
The opportunity to clean up the current site should be taken from the very outset of the project.
The more content that can be removed, simplified or rewritten early in the project, the easier the eventual content migration will be.
Look for ‘ROT’: content that is redundant, outdated or trivial. This is the easiest content to identify and remove. In many cases, it may be possible to eliminate upwards of 50% of the site prior to content migration.
This clean up should be done by the decentralised content owners wherever possible, with the support and encouragement of the central team.
In practice, however, content owners are busy and the intranet is only one of their responsibilities. (This is generally why the content gets out of date in the first place.)
The central team should therefore be prepared to drive some of the cleanup themselves, targeting the more important areas of the site, or obvious ‘quick wins’.
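A script cannot decide what counts as ROT, but it can produce a list of candidates for the central team and content owners to review. This Python sketch flags exact duplicates (redundant), pages not modified since a cut-off date (outdated) and very short pages (trivial); the page fields, the cut-off date and the 50-word threshold are all assumptions.

```python
import hashlib
from datetime import date


def flag_rot(pages, stale_before, min_words=50):
    """Flag candidate ROT pages for human review (not automatic deletion).

    pages: list of dicts with hypothetical keys 'url', 'modified' (a date)
    and 'text' (the page text with markup removed).
    Returns {url: [reasons]} for every page with at least one flag.
    """
    first_seen = {}   # content fingerprint -> first URL with that content
    flags = {}
    for page in pages:
        reasons = []
        words = page["text"].split()
        # The fingerprint ignores differences in case and whitespace.
        digest = hashlib.sha256(" ".join(words).lower().encode("utf-8")).hexdigest()
        if digest in first_seen:
            reasons.append("redundant: duplicate of " + first_seen[digest])
        else:
            first_seen[digest] = page["url"]
        if page["modified"] < stale_before:
            reasons.append("outdated: not modified since " + page["modified"].isoformat())
        if len(words) < min_words:
            reasons.append(f"trivial: fewer than {min_words} words")
        if reasons:
            flags[page["url"]] = reasons
    return flags


pages = [
    {"url": "/hr/leave.htm", "modified": date(2006, 2, 1), "text": "Leave policy " * 40},
    {"url": "/hr/leave-copy.htm", "modified": date(2006, 2, 1), "text": "LEAVE POLICY " * 40},
    {"url": "/it/y2k.htm", "modified": date(1999, 11, 5), "text": "Under construction"},
]
for url, reasons in flag_rot(pages, stale_before=date(2004, 1, 1)).items():
    print(url, reasons)
```

Exact-match fingerprints only catch verbatim copies; near-duplicates and pages that are current but trivial still need a human eye.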
Plan the migration
A detailed migration plan should be developed, specifying what content currently exists, what will be done with it, and where it will end up on the new site.
Start by conducting a comprehensive content audit. An excellent guide on doing this has been published by Adaptive Path.
In parallel with the content audit, the structure of the new site (the ‘information architecture’) can be developed.
The next step is to map pages on the old site to locations on the new site. This should be done within the content audit spreadsheet, which then forms a ‘to-do’ list for the migration.
An estimate should also be made of the resources required, and plans put in place to obtain these when the time comes.
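If the content audit spreadsheet is exported as CSV, turning the mapping into per-owner to-do lists is straightforward. This Python sketch assumes hypothetical columns `old_url`, `owner`, `action` (such as keep, rewrite or delete) and `new_location`; it skips pages marked for deletion and reports pages that have not yet been mapped to the new site.

```python
import csv
import io
from collections import defaultdict


def migration_todo(audit_csv):
    """Turn a content audit (CSV text) into per-owner to-do lists.

    Assumes hypothetical columns: old_url, owner, action, new_location.
    Returns (todo, unmapped): todo maps each owner to a list of
    (old_url, action, new_location) tuples; unmapped lists pages that are
    not marked for deletion but have no new location yet.
    """
    todo = defaultdict(list)
    unmapped = []
    for row in csv.DictReader(io.StringIO(audit_csv)):
        action = row["action"].strip().lower()
        if action == "delete":
            continue
        new_location = row["new_location"].strip()
        if not new_location:
            unmapped.append(row["old_url"])
            continue
        owner = row["owner"].strip() or "(no owner)"
        todo[owner].append((row["old_url"], action, new_location))
    return dict(todo), unmapped


audit = """old_url,owner,action,new_location
/hr/leave.htm,HR team,keep,/working-here/leave
/hr/old-form.htm,HR team,delete,
/it/faq.htm,IT helpdesk,rewrite,/support/faq
/it/legacy.htm,IT helpdesk,keep,
"""
todo, unmapped = migration_todo(audit)
print(todo)
print("Still to be mapped:", unmapped)
```

The length of the unmapped list is a rough progress measure for the information architecture work, and the per-owner counts give a starting point for the resource estimate.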
Only migrate good content
One of the fundamental principles of any migration is: only migrate good content. Regardless of the approach taken (manual or automatic migration), this is the single best opportunity to deliver better information on the site.
Clear standards should be outlined for content on the new site, and these should be communicated to all content owners.
A policy must be put in place that all outdated, unnecessary or poorly written content is removed or improved before being migrated to the new site.
The content migration is the only real opportunity to enforce this policy. Once the new site is live, the pressure will go off the cleanup process, and content updating will return to its normal slow pace.
Where appropriate, define multiple standards, recognising that not all content needs to be of equal quality. This allows the cleanup to focus on the highest value content.
Ensure every page has an owner
Over time, organisational restructures and staff changes can leave quite a few pages without clear owners. On some sites, large sections can be dominated by ‘dead or dying’ content, long since forgotten by the people who originally wrote it.
A content owner must be found for every page as part of the migration process. Without this, the new pages are by definition out of date: without someone to review and update them, they can never be current.
An ironclad policy must be established stating that pages without a content owner will not be migrated.
In practice, business areas may be reluctant to ‘volunteer’ to own some content, leaving quite a few ‘orphaned’ pages.
One strategy is to list all the orphaned content publicly, stating that all the content will be deleted rather than migrated, unless an owner can be found.
This will generally flush out business areas that don’t want to see the content deleted. Voila! They are now the new content owners.
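Producing the public list of orphaned content can also be scripted from the audit data. This Python sketch assumes the audit yields (URL, owner) pairs with a blank owner for unowned pages, and groups orphans by the first segment of the URL path as a rough stand-in for site section; the function name and notice wording are illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlparse


def orphan_notice(pages):
    """Build a plain-text list of pages with no owner, grouped by the first
    segment of the URL path (a rough stand-in for site section).

    pages: iterable of (url, owner) pairs; owner is None or blank if unknown.
    Returns an empty string if every page has an owner.
    """
    by_section = defaultdict(list)
    for url, owner in pages:
        if not (owner or "").strip():
            path = urlparse(url).path
            section = path.strip("/").split("/")[0] or "(home)"
            by_section[section].append(url)
    if not by_section:
        return ""
    lines = ["The following pages have no owner and will be deleted "
             "rather than migrated unless an owner comes forward:"]
    for section in sorted(by_section):
        lines.append(section + ":")
        lines.extend("  " + url for url in sorted(by_section[section]))
    return "\n".join(lines)


pages = [("/hr/leave.htm", "HR team"), ("/hr/old-form.htm", ""),
         ("/it/faq.htm", None), ("/hr/archive.htm", "  ")]
print(orphan_notice(pages))
```

Grouping by section makes it easier for each business area to spot the pages it cares about, which is what prompts them to step forward as owners.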
Beware of using university students to migrate content
Content migration is a very large and very dull piece of work. On the face of it, it makes sense to ‘outsource’ the work wherever possible.
This could involve paying the technology vendor, hiring in temporary staff, or employing university students.
Outsourcing in this way has the obvious benefit of bringing greater resources to bear on the content migration, without burdening the centralised team or decentralised content owners.
The big issue, however, is that this can be ‘garbage in, garbage out’. Semi-skilled content migrators know little or nothing about the organisation, the site, or the subject matter.
They are not in a position to review the content, assess whether it is current or relevant, or to determine whether it’s needed.
This makes it a viable strategy when there is a large volume of content to be migrated ‘as-is’, but not when the content needs to be reworked and improved as part of the migration.
Provide content owners with support
It won’t be possible (or desirable) for the central team to do all the content migration, and the distributed content owners will need to be co-opted into the process.
This greatly increases the number of hands devoted to the migration process, and puts the responsibility back upon the owners of the content.
While the content owners are well placed to review the relevance of the content, they are typically not professional writers.
The central team should put in place clear support processes for the content owners, focusing on helping them with:
- overall planning
- content review guidelines
- mapping the content to the information architecture of the new site
- dealing with complex or unclear situations
While this will increase the workload of the central team, this kind of support is vital if the content of the new site is to be significantly improved.
Minimise the ‘freeze’
It can be difficult to manage the migration of content while new content is being created and pages are still being updated.
For this reason, a ‘freeze’ is normally instituted, preventing anything other than urgent changes being made to the site.
This gives a little breathing room for the migration to be completed, without having to deal with ongoing updates.
While a freeze is valuable, it should be kept as short as possible. It obviously stands in the way of normal content updating, bringing the site to a standstill.
The longer the freeze, the harder it will also be to restart and re-engage the content authors and owners.
Set aside enough time
A major goal of most redevelopment projects is to deliver a site that is significantly better in terms of usability, functionality and usefulness.
The quality of the content is a key element of this, and this won’t be delivered if the content migration is rushed or poorly managed.
Depending on the size of the site, expect content migration to take 3-6 months, more if the site is particularly large.
This seems like a long time, and it is. One of the limiting factors is the time that the decentralised content owners can spend on the migration.
Without a full-time responsibility for their section of the site, they often struggle to make rapid progress on their migration efforts.
Conclusion
Content migration is an important part of any website or intranet redevelopment. While the time spent on migration is significant, so are the benefits.
If you are looking to deliver a site with dramatically better content, this is the opportunity to deliver it.
While content migration can sometimes be automated, a manual cut-and-paste migration effort is much more likely.
Plan for this, and allocate the time needed to make the migration effective. Setting clear policies that only ‘good’ content will be migrated will result in a much better site.