
Metadata is a topic that almost invariably comes up when creating or refreshing a website or intranet.
While basic metadata is routinely captured by most publishing tools, including content management systems and portals, there is still widespread confusion about its uses and limits.
Common questions include:
- How important is metadata?
- What metadata should we be capturing?
- How is it created?
- Where and when is it used?
- Should we be implementing simple or complex metadata?
This article explores the fundamentals of metadata, as it relates to common intranet and website needs.
Standard metadata fields will be explored, and advice given on how to use metadata successfully, within typical organisational environments.
To start: what is metadata?
The most accurate description of metadata is that it’s ‘data about data’. This is a pretty abstract definition, however.
Perhaps a better description is that metadata is the details about the page, such as the name of the page, or who created it.
This information can be kept hidden behind the scenes, or listed on the published pages. Metadata may be very simple, or extremely powerful (and complex), depending on the situation and business needs.
Metadata is fundamental for sites but often misunderstood
The purpose of metadata
There are two fundamental purposes for metadata on intranets and websites:
- helping end users find what they are looking for, via search or navigation
- helping authors and administrators manage the site
Metadata such as title, keywords or description helps to provide better descriptions for pages on the site.
This can be used to improve the quality of search results in the site search engine, as well as helping to ‘promote’ more important pages to the top of the results lists.
Note, however, that most of the public search engines such as Google, Yahoo and Microsoft Live Search no longer make use of metadata on sites, due to the problem of spam or falsified data.
This metadata is therefore only useful for the site’s own search engine, or for site navigation.
Metadata can also be used to improve the manageability of the site. By capturing author, review date and expiry date, the problem of out-of-date content can be addressed (if not fully resolved).
Knowing who is the ‘owner’ of a page can also help to route feedback, and support other similar processes.
This kind of behind-the-scenes metadata, and the management benefits it offers, is one of the primary drivers for many content management system projects.
Beyond these two uses, there may be a mandated requirement for metadata, particularly within government agencies. While this metadata may be very extensive, it should still be assessed against these two fundamental purposes, and the principles outlined later in this article.
Standard metadata fields
Most content management systems, portals and other publishing tools support seven metadata fields:
- title
- keywords
- description
- publish date
- review date
- expiry date
- author
These are provided out-of-the-box by almost all tools, and can be mostly taken for granted (although there are still some tools which don’t even support these fields).
Depending on the publishing tools being used, it may be simple, hard or impossible to go beyond these metadata fields.
Some tools will ‘bake in’ these standard metadata fields, with no possibility of going beyond them. This is not uncommon in content management systems focused on marketing-driven public websites.
Other tools can add metadata fields as additional customisation or development. A smaller proportion of tools offer simple ‘point-and-click’ interfaces for adding and maintaining these metadata fields.
Of course, like all things relating to content management systems, if you don’t need the flexibility, don’t ask for it. Simpler is better. (There’s more on the need for additional metadata fields later in this article.)
Seven metadata fields are supported as standard
Exploring these fields
Let’s look at the seven standard metadata fields, examining when and how they might be used:
Title
The name of the page on the site. This is a required field, and it is used as the title of the page in the browser, as well as for the heading at the top of the text.
Note that for search engine optimisation reasons, some publishing tools allow the author to have a more extensive browser title that is different to the shorter title on the page. This would only be relevant for public-facing websites.
Keywords
The subject or topic of the page, typically captured as a list of terms separated by commas.
This information can be used to improve the effectiveness of the site’s search engine, by ‘pushing up’ key pages in the list of results (by giving the keywords a higher ‘weighting’), or by ensuring that important pages appear even when the specific word being searched for doesn’t actually appear on the page.
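To make this concrete, the weighting idea can be expressed as a minimal sketch. This is purely illustrative: the field names, weights and scoring rule are assumptions, not taken from any particular search engine.

```python
# Hypothetical sketch: boosting search results using keyword metadata.
# A keyword match is weighted more heavily than a body-text match, so
# pages explicitly tagged with a term are 'pushed up' the results list.

KEYWORD_WEIGHT = 3.0  # assumed weight for a match in the keywords field
BODY_WEIGHT = 1.0     # assumed weight for each match in the body text

def score(page: dict, query: str) -> float:
    """Score a page for a single-word query."""
    q = query.lower()
    s = 0.0
    if q in [k.lower() for k in page["keywords"]]:
        s += KEYWORD_WEIGHT
    s += BODY_WEIGHT * page["body"].lower().split().count(q)
    return s

pages = [
    {"title": "Leave policy", "keywords": ["leave", "holidays"],
     "body": "How to apply for annual leave"},
    {"title": "Payroll FAQ", "keywords": ["pay"],
     "body": "Questions about pay and leave loading"},
]

# The tagged page wins, even though both pages mention 'leave'.
results = sorted(pages, key=lambda p: score(p, "leave"), reverse=True)
```

Note that the tagged page ranks first even when the query word also appears in another page’s body, which is exactly the ‘promotion’ effect described above.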
Description
A brief description of the page, summarising the contents. This is used in the search results, to provide a better description than the automatically generated summary produced by the search engine itself.
The description is also displayed on public search engines such as Google, as well as potentially used in navigation links on the site published by the content management system.
A future publish date can be used to embargo content
Publish date
The date the page was first published to the site. If a content management system is being used it is often possible to set a future publish date. This ‘embargoes’ the page, and automatically releases it at the specified date and time.
Review date
Specifies when a page should be revisited and reviewed by the author or owner. Used in a content management system to help ensure that content doesn’t get out of date, with authors being sent an automated reminder email when the review date is reached.
Expiry date
When the page should be removed from the site and archived. While not relevant for most pages, the expiry date allows authors to automatically ‘unpublish’ items such as news and competitions when the end date is reached.
Author
The original creator of the page, automatically captured by the publishing tool. This is used behind the scenes to route review and expiry messages, and also may be published on the website to allow feedback to be easily sent to the page owner.
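The seven fields, and the way the date fields drive embargo, review and expiry behaviour, can be sketched as a simple record. The class and method names here are assumptions for illustration, not the schema of any particular CMS.

```python
# Illustrative sketch of the seven standard metadata fields, with the
# date fields driving publish/review/expiry behaviour.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class PageMetadata:
    title: str
    author: str
    keywords: list = field(default_factory=list)
    description: str = ""
    publish_date: Optional[date] = None
    review_date: Optional[date] = None
    expiry_date: Optional[date] = None

    def is_live(self, today: date) -> bool:
        """Embargoed before publish_date; unpublished from expiry_date."""
        if self.publish_date and today < self.publish_date:
            return False
        if self.expiry_date and today >= self.expiry_date:
            return False
        return True

    def needs_review(self, today: date) -> bool:
        """True once the review reminder should be sent to the author."""
        return bool(self.review_date and today >= self.review_date)
```

A news item with a future publish date and an expiry date would then appear and disappear automatically, with no manual intervention by the author.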
Fundamental principles
There are a number of fundamental principles which should drive the decisions about how much metadata to use, who should create it, and how it should be managed.
Metadata must be made easy for authors
Make it easy for authors
Metadata is not created magically, and there is no reliable technology for automatically filling out metadata fields based on the text of the page.
This means that people will be needed to create the metadata, typically the original authors of the pages.
Since most intranet and website content is created by business users, not web or content specialists, the entering of metadata must be made extremely easy.
Even then, it can often be quite hard to obtain consistently good metadata. (One of the widely known secrets in the content management industry is ‘everyone wants good metadata, but no-one has worked out how to get it’.)
Simply mandating or enforcing the capture of metadata will not be effective, as this doesn’t prevent the author from filling in only a few words for each field, or entering garbage text.
Wherever possible, the burden of entering metadata should be reduced. There are a range of strategies for this, including:
- using the minimum number of metadata fields (see below)
- using drop-down lists wherever possible, instead of free text
- ensuring meaningful field names
- providing supporting help text or descriptive information
Only capture what you need
There is a cost involved in each and every metadata value. Some have called this ‘bucks per tag’.
In part this is a technology cost, but the more important consideration is the amount of ongoing human effort needed to enter data in the metadata fields.
As discussed in the previous section, metadata is a burden on the authors of the content, and one that they may not fully understand or support.
For all these reasons, only metadata that has a concrete and immediate need should be captured. Don’t set up metadata fields to support potential future uses.
If there is the potential for more extensive metadata use in the future, choose a publishing tool that makes it easy to add extra metadata fields, without requiring customisation or development.
Don’t capture more metadata than is currently needed
Establish appropriate authoring models
The person who initially creates a page is likely to know the most about the subject matter covered on the page. In theory, this makes them the ideal person to enter the metadata.
In practice, however, authors may not have the skill, time or inclination to enter consistent and high-quality metadata. Remember that it is a professional skill to truly master metadata, shared by professional indexers and librarians.
Careful consideration should be given to who is going to enter the metadata, how much is entered, and who will review it.
A mix of approaches may be required, with responsibility shared between decentralised authors and a centralised team.
Establish governance models
Consistent metadata is always hard to achieve across an entire site, particularly with a decentralised authoring model.
A centralised review and ‘housekeeping’ process will be required, driven by the web or intranet team.
Efforts should also be focused on the most important content, rather than trying to capture complete metadata for the entire site.
Underpinning all of this should be an overall governance model for metadata. At a minimum, this should outline:
- what metadata is being used
- the purpose of metadata
- roles and responsibilities
- guidelines, tips and tricks
This should be written and communicated focusing more on support and training rather than rules and bureaucracy.
Use metadata to meet a business or site need
Have a clear purpose for metadata
Underpinning all of these discussions is a fundamental principle: have a clear purpose or business reason for the metadata.
The specific site design or business requirements will drive the amount of metadata needed. If there are powerful site elements requiring extensive metadata, capture and manage the needed information.
If the website or intranet needs are comparatively simple, start with the standard metadata fields, keeping the solution as simple and easy as possible.
Richer metadata
Only the most basic of metadata has been covered to this point, focusing on a few key fields to help users find their way through the site (title, keywords, description) plus further details to improve site management (dates and authors).
Metadata can be used much more extensively than this, targeting specific site or business needs.
For example, additional metadata fields can be set up and used (if your publishing tool allows).
These might include:
- geographic region
- language
- target audience
- service offering
- product
These fields can then be used to tailor how information is delivered on the site. For example, by marking all relevant pages to a specific product, the publishing tool can automatically generate a list of related pages.
This provides a seamless and effective way of relating marketing materials to support guides, for example.
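The ‘related pages’ idea described above can be sketched very simply: group pages by a shared metadata field, then list every other page carrying the same value. The field name and page data here are illustrative assumptions.

```python
# Hypothetical sketch: generating 'related pages' lists from a shared
# 'product' metadata field, as a CMS might when publishing a page.
from collections import defaultdict

pages = [
    {"title": "Widget brochure", "product": "widget"},
    {"title": "Widget support guide", "product": "widget"},
    {"title": "Gadget brochure", "product": "gadget"},
]

# Index every page title under its product value.
by_product = defaultdict(list)
for page in pages:
    by_product[page["product"]].append(page["title"])

def related(page: dict) -> list:
    """All other pages tagged with the same product."""
    return [t for t in by_product[page["product"]] if t != page["title"]]
```

With this in place, the brochure page automatically links to the support guide (and vice versa) purely because both carry the same product value, with no hand-maintained link lists.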
There is no ‘one size fits all’ rule relating to this additional metadata, and the specific fields required will vary from site to site.
When considering the use of additional metadata, however, keep in mind the fundamental principles such as:
- capture only what is needed
- have a clear purpose and business need
- make it easy for authors
While metadata can be an extremely powerful way of enhancing a site, the key challenge remains to get authors to enter it, consistently and accurately.
Taxonomies provide many rich metadata options
Role of taxonomies
As discussed earlier, the big challenge for metadata is that it needs to be correct and consistent before it’s useful. For keywords, this is particularly apparent.
When multiple authors are entering keywords, inconsistency will be rampant, due to basic differences (singular versus plural terms) or differences in terminology.
These issues can be greatly reduced if users are picking from a standard list of items, rather than filling in free-text fields.
This may be as simple as a drop-down list of eight items, such as departments in the organisation. Such lists are simple to set up and use, and should be provided wherever possible.
Beyond this, a more extensive list of keywords (subjects) can be used. Called a ‘controlled-term thesaurus’ or ‘taxonomy’, these capture all the topics that may occur within the organisation.
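The difference between free-text keywords and a controlled list can be sketched as a simple validation step: an entered keyword is either folded onto a term from the list, or rejected. The vocabulary and the crude plural-handling rule below are illustrative assumptions only.

```python
# Sketch of validating free-text keywords against a controlled list,
# folding simple variants (case, plurals) onto the preferred term.
from typing import Optional

CONTROLLED_TERMS = {"policy", "payroll", "recruitment", "travel"}

def normalise(keyword: str) -> Optional[str]:
    """Return the controlled term, or None if the keyword isn't in the list."""
    term = keyword.strip().lower()
    if term in CONTROLLED_TERMS:
        return term
    # Crude singularisation: a real thesaurus would hold proper synonyms.
    if term.endswith("s") and term[:-1] in CONTROLLED_TERMS:
        return term[:-1]
    return None

entered = ["Payrolls", "travel", "staff parties"]
accepted = [t for k in entered if (t := normalise(k))]
```

Even this toy version shows the benefit: ‘Payrolls’ and ‘payroll’ end up as the same term, while an uncontrolled phrase is flagged rather than silently polluting the keyword set.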
Use of taxonomies can deliver more benefits than just consistent keywords. Once the subjects covered by pages are captured in a rich way, many different forms of searching and browsing interfaces become possible.
Of course, to make use of a taxonomy, you must have one available. Since a ‘generic’ taxonomy doesn’t make sense, each organisation potentially has to develop their own list of terms (perhaps drawing on existing taxonomies).
It can take several person-years of work to develop a taxonomy, making the investment hard to justify, even though the return may be several times the initial cost.
In the shorter term, organisations should therefore look to simpler approaches to metadata, pending the development of a more extensive taxonomy.
Tagging and folksonomies
One comparatively new approach to metadata is called ‘tagging’ or ‘folksonomy’. This is where ‘tags’, the equivalent of keywords, are displayed on the published site.
End users can then add their own tags to pages, to help them find the page again, and in the process helping other users.
The tags used on the site are then often displayed as a ‘tag cloud’, which shows all the words used on the site, with more frequent words displayed in larger type.
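Generating a tag cloud is mechanically simple: count how often each tag has been applied, then map the counts onto type sizes. The size bands below are illustrative assumptions.

```python
# Minimal sketch of building a tag cloud: count tag frequencies and
# map them onto display sizes for the published page.
from collections import Counter

# Tags applied by end users across the site (illustrative data).
tags_applied = ["budget", "travel", "budget", "policy", "budget", "travel"]
counts = Counter(tags_applied)

def cloud_size(count: int) -> str:
    """Map a tag's frequency onto a display size band (assumed bands)."""
    if count >= 3:
        return "large"
    if count == 2:
        return "medium"
    return "small"

cloud = {tag: cloud_size(n) for tag, n in counts.items()}
```

The most frequently used tags come out largest, which is the whole visual point of the cloud: popular topics are findable at a glance.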
Some of the most celebrated uses of tagging can be found on Flickr (www.flickr.com) and Delicious (www.delicious.com).
While tagging has proven to be successful on sites such as these, its use on corporate websites and intranets is much less clear. The motivation and purpose for end users to tag corporate content are not obvious, and this is key to the tagging approach.
It is beyond the scope of this article to do more than reference tagging, and readers are encouraged to browse the web for a range of articles and books on this topic.
Much more could be said
There are whole professions devoted to the creation and use of metadata, and much could be said beyond the fundamentals covered in this article.
This includes:
- use of metadata to drive navigation on the site, including faceted browsing
- automated site features based on metadata
- knowledge management aspects of metadata
It is impossible to cover all these topics in a single article, and the goal has been to focus on the core issues that must be understood by all intranet and web teams.
Further research into these more advanced topics is strongly encouraged.
Conclusion
Metadata is one of the key elements of site design and management, and is often a driving factor for the purchase of a content management system (or other similar publishing tools).
While metadata can be used in many complex and powerful ways, most sites benefit from keeping it pretty simple.
Only capture the metadata you need, and make it very simple for authors to enter it. Recognise that there is a cost (in effort and usability) for every metadata field established, and therefore focus efforts on clear business or site needs.



Nothing about Dublin Core? RDF? Atom? I understand this is a “primer”, but it would have been neat to work a bit of that in. Also, have you checked out the Calais system that Reuters is putting together? Advanced automatic m/data generation, basically.
Yes, should’ve made reference to Dublin Core, although at the basic level it boils down to just title, keywords and description (with perhaps a “DC.” at the beginning).
RDF is certainly a powerful framework, but definitely falls into the “advanced metadata” category. I’m not sure how Atom/RSS fits in the topic of metadata.
My aim in writing the article was to get web and intranet teams up to speed on the key topics, on the assumption that a lot of the behind-the-scenes details would be handled by the publishing tool (CMS, etc). As I indicated in the article, there’s lots of value to be gained in researching more advanced topics…
In terms of automated classification, that’s a big topic! Like most things, where a significant investment can be put into the tools, good value is gained. But these are certainly not “install out of the box, and voila metadata!” solutions.
Cheers, James
James,
You mention two key aims of metadata:
1. helping end users find what they are looking for, via search or navigation
2. helping authors and administrators manage the site
While it may seem a subset of your first point, I believe “helping applications/tools/bots find data” is important enough to stand on its own.
This is becoming more important as online tools increase in number and complexity, especially when used for mashing data (God I hate that word) from different sources.
@Russ, a late reply on your comment about providing metadata to help applications/bots to find data. Definitely an important aspect, but I’d highlight that concrete needs must be identified in advance.
For example, in Australian government, a lot of metadata was collected against future plans to automatically create “portals” on specific topics. But these never eventuated, in part due to the patchy quality of the metadata itself. So that left gov agencies with the mandated requirement to collect masses of data, but for no clear purpose. (This has subsequently been made optional in most cases.)
So automatic use of metadata is incredibly powerful, but only when done well (or at all!).
Yes, good point Russ! Although your typical website or intranet team is not yet publishing much content that would fall into this category…
Nice article and I think you have got the tone & level spot on! Just a quick comment on controlled vocabularies and taxonomies…
I agree with your point about ‘generic’ taxonomies tending to defeat the purpose, hence the discussion of the merits of developing your own. However, you didn’t mention that there are a vast number of existing specialist controlled vocabularies already in the public domain. For example, sectors including health, cultural heritage and government all make extensive use of domain-specific, standards-based controlled vocabularies. Adopting one of these may give an organisation the benefits of a highly refined controlled keyword list without the pain of developing and maintaining their own. For those working in large organisations, have a chat to your librarian about what may already exist in your industry. Otherwise, have a trawl of the web!
Hi Andrew, agree on the value of pre-developed industry standards, these can often do 80% or more of the heavy lifting…
One word of caution: there is a big difference between a taxonomy designed for classification, versus one designed for navigation.
The most extreme example: the Library of Congress thesaurus is extraordinary at classifying pretty much anything, but totally hopeless for navigation.
So double-check that the industry taxonomies will fit how you want to present information on the website. If so, you’re in business!
Thanks, James
Nice job. Stumbled on your insightful writeup from Wikipedia Infodesign link.
Follow-on discussion is thoughtful too.
I can apply much directly to my current business problem of cleaning up a messy operations doc repository.
Just what the doctor ordered!
Great article, James. Another major reason we place a heavy emphasis on metadata on our intranet and websites is reuse (i.e. content entered into metadata fields can be reused in multiple places on the site).
Hi James,
a very good introduction to the subject! A few years ago we developed a taxonomy for the keyword field of our intranet. As a starting point we used existing industry classifications, supplemented with terms from standard textbooks, and finally incorporated our company-specific vocabulary. This mixed approach produced a quite sound taxonomy. Later on it also provided the core content for our company glossary on the intranet.
Hi Kate, metadata can be used in a variety of powerful ways, including driving content reuse and automated related links.
Of course, a high level of discipline is required to make this successful. If the metadata isn’t consistently high-quality, then automated uses of it can break down, or generate some strange results.
Well done on making this work, no wonder you have an award-winning intranet! :-)
Hi Martin, I love your step-by-step approach to developing a taxonomy, many organisations could learn from this! The use as a glossary is a nice end-user feature to deliver from the behind-the-scenes taxonomy…
After working with some biggies, I’m *very* wary of taxonomies. The theory of using established taxonomies and everyone being able to share data in insightful ways is great; it just doesn’t work.
To begin with, there’s the question of which taxonomy. I was once involved with a defence project that was investigating which of, say, 10 main taxonomies used in Australia and by our main allies could be usefully implemented, each offering several dozen top-level categories and up to tens of thousands of individual terms. The plain fact is that no-one does, or is going to, use these taxonomies comprehensively or correctly.
Secondly, it’s impossible to know what users want from your data. You might think you’re publishing country profiles or travel guides, but what people are searching for is economic data or industry insights.
Finally, there’s the sheer size, and consequent management issues, of these taxonomies. The reasons you can go to hospital fill a 1000-page book in 9-point type and keep an office within the health department quite busy. AGIMO’s new master metadata plan fills an A3 page in maybe 8-point type, without even beginning to address how it will relate to other industry-specific taxonomies relevant to specific departments. All of these initiatives are fantastic in theory but will never be used by more than a handful of people. Metadata can’t be generic enough.
Hi Brian, completely agree on the challenges inherent to taxonomies! These are hugely hard things to put into place, mostly due to the underlying organisational complexities.
Of course, if an effective taxonomy can be deployed, the benefits gained are ten times the initial cost. But I agree, there are few organisations who have mature enough information management practices to allow this to happen.
One good book to read on taxonomies has been written by Patrick Lambe in Singapore:
http://www.organisingknowledge.com/
Hmmm, maybe I need to get out more, but I haven’t seen a taxonomy that delivers much benefit. I’m happy with basic metadata as described above for basic site management, but doubt we can even justify the effort put into AGLS metadata, given that it is only used by one search engine that generates maybe 1% of traffic. If metadata is only for internal use, it probably doesn’t need to be so complex, and we can simply accept that it only serves one purpose. At the CBR WSG meeting yesterday Stephen Zafira described tagging as ‘dynamic IA’, which I think is a step in the right direction towards accepting that data will be interpreted in different ways by different users, and frees us from trying to write ‘one taxonomy to rule them all’. KISS, live and let live.
Great to know the basics of metadata. The article is very simple to understand.
Could you tell me how to insert metadata in the intranet? Is XML the best way to integrate metadata?