
Metadata is a topic that almost invariably comes up when creating or refreshing a website or intranet.
While basic metadata is routinely captured by most publishing tools, including content management systems and portals, there is still widespread confusion about its uses and limits.
Common questions include:
- How important is metadata?
- What metadata should we be capturing?
- How is it created?
- Where and when is it used?
- Should we be implementing simple or complex metadata?
This article explores the fundamentals of metadata, as it relates to common intranet and website needs.
Standard metadata fields will be explored, and advice given on how to use metadata successfully, within typical organisational environments.
To start: what is metadata?
The most accurate description of metadata is that it’s ‘data about data’. This is a pretty abstract definition, however.
Perhaps a better description is that metadata is the details about the page, such as the name of the page, or who created it.
This information can be kept hidden behind the scenes, or listed on the published pages. Metadata may be very simple, or extremely powerful (and complex), depending on the situation and business needs.
Metadata is fundamental for sites but often misunderstood
The purpose of metadata
There are two fundamental purposes for metadata on intranets and websites:
- helping end users find what they are looking for, via search or navigation
- helping authors and administrators manage the site
Metadata such as title, keywords or description helps to provide better descriptions for pages on the site.
This can be used to improve the quality of search results in the site search engine, as well as helping to ‘promote’ more important pages to the top of the results lists.
Note, however, that most of the public search engines such as Google, Yahoo and Microsoft Live Search no longer make use of metadata on sites, due to the problem of spam or falsified data.
This metadata is therefore only useful for the site’s own search engine, or for site navigation.
Metadata can also be used to improve the manageability of the site. By capturing author, review date and expiry date, the problem of out-of-date content can be addressed (if not fully resolved).
Knowing who is the ‘owner’ of a page can also help to route feedback, and support other similar processes.
This kind of behind-the-scenes metadata, and the management benefits it offers, is one of the primary drivers for many content management system projects.
Beyond these two uses, there may be a mandated requirement for metadata, particularly within government agencies. While this metadata may be very extensive, it should still be assessed against these two fundamental purposes, and the principles outlined later in this article.
Standard metadata fields
Most content management systems, portals and other publishing tools support seven metadata fields:
- title
- keywords
- description
- publish date
- review date
- expiry date
- author
These are provided out-of-the-box by almost all tools, and can be mostly taken for granted (although there are still some tools which don’t even support these fields).
Depending on the publishing tools being used, it may be simple, hard or impossible to go beyond these metadata fields.
Some tools will ‘bake in’ these standard metadata fields, with no possibility of going beyond them. This is not uncommon in content management systems focused on marketing-driven public websites.
Other tools can add metadata fields as additional customisation or development. A smaller proportion of tools offer simple ‘point-and-click’ interfaces for adding and maintaining these metadata fields.
Of course, like all things relating to content management systems, if you don’t need the flexibility, don’t ask for it. Simpler is better. (There’s more on the need for additional metadata fields later in this article.)
Seven metadata fields are supported as standard
Exploring these fields
Let’s look at the seven standard metadata fields, examining when and how they might be used:
Title
The name of the page on the site. This is a required field, and it is used as the title of the page in the browser, as well as for the heading at the top of the text.
Note that for search engine optimisation reasons, some publishing tools allow the author to have a more extensive browser title that is different to the shorter title on the page. This would only be relevant for public-facing websites.
Keywords
The subject or topic of the page, typically captured as a list of terms separated by commas.
This information can be used to improve the effectiveness of the site’s search engine, by ‘pushing up’ key pages in the list of results (by giving the keywords a higher ‘weighting’), or by ensuring that important pages appear even when the specific word being searched for doesn’t actually appear on the page.
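To make this concrete, the weighting idea can be expressed as a minimal sketch. This is purely illustrative: the field names, weights and scoring rule are assumptions, not taken from any particular search engine.

```python
# Hypothetical sketch: boosting search results using keyword metadata.
# A keyword match is weighted more heavily than a body-text match, so
# pages explicitly tagged with a term are 'pushed up' the results list.

KEYWORD_WEIGHT = 3.0  # assumed weight for a match in the keywords field
BODY_WEIGHT = 1.0     # assumed weight for each match in the body text

def score(page: dict, query: str) -> float:
    """Score a page for a single-word query."""
    q = query.lower()
    s = 0.0
    if q in [k.lower() for k in page["keywords"]]:
        s += KEYWORD_WEIGHT
    s += BODY_WEIGHT * page["body"].lower().split().count(q)
    return s

pages = [
    {"title": "Leave policy", "keywords": ["leave", "holidays"],
     "body": "How to apply for annual leave"},
    {"title": "Payroll FAQ", "keywords": ["pay"],
     "body": "Questions about pay and leave loading"},
]

# The tagged page wins, even though both pages mention 'leave'.
results = sorted(pages, key=lambda p: score(p, "leave"), reverse=True)
```

Note that the tagged page ranks first even when the query word also appears in another page’s body, which is exactly the ‘promotion’ effect described above.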
Description
A brief description of the page, summarising the contents. This is used in the search results, to provide a better description than the automatically generated summary produced by the search engine itself.
The description is also displayed on public search engines such as Google, as well as potentially used in navigation links on the site published by the content management system.
A future publish date can be used to embargo content
Publish date
The date the page was first published to the site. If a content management system is being used it is often possible to set a future publish date. This ‘embargoes’ the page, and automatically releases it at the specified date and time.
Review date
Specifies when a page should be revisited and reviewed by the author or owner. Used in a content management system to help ensure that content doesn’t get out of date, with authors being sent an automated reminder email when the review date is reached.
Expiry date
When the page should be removed from the site and archived. While not relevant for most pages, the expiry date allows authors to automatically ‘unpublish’ items such as news and competitions when the end date is reached.
Author
The original creator of the page, automatically captured by the publishing tool. This is used behind the scenes to route review and expiry messages, and also may be published on the website to allow feedback to be easily sent to the page owner.
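The seven fields, and the way the date fields drive embargo, review and expiry behaviour, can be sketched as a simple record. The class and method names here are assumptions for illustration, not the schema of any particular CMS.

```python
# Illustrative sketch of the seven standard metadata fields, with the
# date fields driving publish/review/expiry behaviour.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class PageMetadata:
    title: str
    author: str
    keywords: list = field(default_factory=list)
    description: str = ""
    publish_date: Optional[date] = None
    review_date: Optional[date] = None
    expiry_date: Optional[date] = None

    def is_live(self, today: date) -> bool:
        """Embargoed before publish_date; unpublished from expiry_date."""
        if self.publish_date and today < self.publish_date:
            return False
        if self.expiry_date and today >= self.expiry_date:
            return False
        return True

    def needs_review(self, today: date) -> bool:
        """True once the review reminder should be sent to the author."""
        return bool(self.review_date and today >= self.review_date)
```

A news item with a future publish date and an expiry date would then appear and disappear automatically, with no manual intervention by the author.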
Fundamental principles
There are a number of fundamental principles which should drive the decisions about how much metadata to use, who should create it, and how it should be managed.
Metadata must be made easy for authors
Make it easy for authors
Metadata is not created magically, and there is no reliable technology for automatically filling out metadata fields based on the text of the page.
This means that people will be needed to create the metadata, typically the original authors of the pages.
Since most intranet and website content is created by business users, not web or content specialists, the entering of metadata must be made extremely easy.
Even then, it can often be quite hard to obtain consistently good metadata. (One of the widely known secrets in the content management industry is ‘everyone wants good metadata, but no-one has worked out how to get it’.)
Simply mandating or enforcing the capture of metadata will not be effective, as this doesn’t prevent the author from filling in only a few words for each field, or entering garbage text.
Wherever possible, the burden of entering metadata should be reduced. There are a range of strategies for this, including:
- using the minimum number of metadata fields (see below)
- using drop-down lists wherever possible, instead of free text
- ensuring meaningful field names
- providing supporting help text or descriptive information
Only capture what you need
There is a cost involved in each and every metadata value. Some have called this ‘bucks per tag’.
In part this is a technology cost, but the more important consideration is the amount of ongoing human effort needed to enter data in the metadata fields.
As discussed in the previous section, metadata is a burden on the authors of the content, and one that they may not fully understand or support.
For all these reasons, only metadata that has a concrete and immediate need should be captured. Don’t set up metadata fields to support potential future uses.
If there is the potential for more extensive metadata use in the future, choose a publishing tool that makes it easy to add extra metadata fields, without requiring customisation or development.
Don’t capture more metadata than is currently needed
Establish appropriate authoring models
The person who initially creates a page is likely to know the most about the subject matter covered on the page. In theory, this makes them the ideal person to enter the metadata.
In practice, however, authors may not have the skill, time or inclination to enter consistent and high-quality metadata. Remember that it is a professional skill to truly master metadata, shared by professional indexers and librarians.
Careful consideration should be given to who is going to enter the metadata, how much is entered, and who will review it.
A mix of approaches may be required, with responsibility shared between decentralised authors and a centralised team.
Establish governance models
Consistent metadata is always hard to achieve across an entire site, particularly with a decentralised authoring model.
A centralised review and ‘housekeeping’ process will be required, driven by the web or intranet team.
Efforts should also be focused on the most important content, rather than trying to capture complete metadata for the entire site.
Underpinning all of this should be an overall governance model for metadata. At a minimum, this should outline:
- what metadata is being used
- the purpose of metadata
- roles and responsibilities
- guidelines, tips and tricks
This should be written and communicated focusing more on support and training rather than rules and bureaucracy.
Use metadata to meet a business or site need
Have a clear purpose for metadata
Underpinning all of these discussions is a fundamental principle: have a clear purpose or business reason for the metadata.
The specific site design or business requirements will drive the amount of metadata needed. If there are powerful site elements requiring extensive metadata, capture and manage the needed information.
If the website or intranet needs are comparatively simple, start with the standard metadata fields, keeping the solution as simple and easy as possible.
Richer metadata
Only the most basic of metadata has been covered to this point, focusing on a few key fields to help users find their way through the site (title, keywords, description) plus further details to improve site management (dates and authors).
Metadata can be used much more extensively than this, targeting specific site or business needs.
For example, additional metadata fields can be set up and used (if your publishing tool allows).
These might include:
- geographic region
- language
- target audience
- service offering
- product
These fields can then be used to tailor how information is delivered on the site. For example, by marking all relevant pages to a specific product, the publishing tool can automatically generate a list of related pages.
This provides a seamless and effective way of relating marketing materials to support guides, for example.
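The ‘related pages’ idea described above can be sketched very simply: group pages by a shared metadata field, then list every other page carrying the same value. The field name and page data here are illustrative assumptions.

```python
# Hypothetical sketch: generating 'related pages' lists from a shared
# 'product' metadata field, as a CMS might when publishing a page.
from collections import defaultdict

pages = [
    {"title": "Widget brochure", "product": "widget"},
    {"title": "Widget support guide", "product": "widget"},
    {"title": "Gadget brochure", "product": "gadget"},
]

# Index every page title under its product value.
by_product = defaultdict(list)
for page in pages:
    by_product[page["product"]].append(page["title"])

def related(page: dict) -> list:
    """All other pages tagged with the same product."""
    return [t for t in by_product[page["product"]] if t != page["title"]]
```

With this in place, the brochure page automatically links to the support guide (and vice versa) purely because both carry the same product value, with no hand-maintained link lists.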
There is no ‘one size fits all’ rule relating to this additional metadata, and the specific fields required will vary from site to site.
When considering the use of additional metadata, however, keep in mind the fundamental principles such as:
- capture only what is needed
- have a clear purpose and business need
- make it easy for authors
While metadata can be an extremely powerful way of enhancing a site, the key challenge remains to get authors to enter it, consistently and accurately.
Taxonomies provide many rich metadata options
Role of taxonomies
As discussed earlier, the big challenge for metadata is that it needs to be correct and consistent before it’s useful. For keywords, this is particularly apparent.
When multiple authors are entering keywords, inconsistency will be rampant, due to basic differences (singular versus plural terms) or differences in terminology.
These issues can be greatly reduced if users are picking from a standard list of items, rather than filling in free-text fields.
This may be as simple as a drop-down list of eight items, such as departments in the organisation. Such lists are simple to set up and use, and should be provided wherever possible.
Beyond this, a more extensive list of keywords (subjects) can be used. Called a ‘controlled-term thesaurus’ or ‘taxonomy’, these capture all the topics that may occur within the organisation.
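The difference between free-text keywords and a controlled list can be sketched as a simple validation step: an entered keyword is either folded onto a term from the list, or rejected. The vocabulary and the crude plural-handling rule below are illustrative assumptions only.

```python
# Sketch of validating free-text keywords against a controlled list,
# folding simple variants (case, plurals) onto the preferred term.
from typing import Optional

CONTROLLED_TERMS = {"policy", "payroll", "recruitment", "travel"}

def normalise(keyword: str) -> Optional[str]:
    """Return the controlled term, or None if the keyword isn't in the list."""
    term = keyword.strip().lower()
    if term in CONTROLLED_TERMS:
        return term
    # Crude singularisation: a real thesaurus would hold proper synonyms.
    if term.endswith("s") and term[:-1] in CONTROLLED_TERMS:
        return term[:-1]
    return None

entered = ["Payrolls", "travel", "staff parties"]
accepted = [t for k in entered if (t := normalise(k))]
```

Even this toy version shows the benefit: ‘Payrolls’ and ‘payroll’ end up as the same term, while an uncontrolled phrase is flagged rather than silently polluting the keyword set.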
Use of taxonomies can deliver more benefits than just consistent keywords. Once the subjects covered by pages are captured in a rich way, many different forms of searching and browsing interfaces become possible.
Of course, to make use of a taxonomy, you must have one available. Since a ‘generic’ taxonomy doesn’t make sense, each organisation potentially has to develop their own list of terms (perhaps drawing on existing taxonomies).
It can take several person-years of work to develop a taxonomy, making the investment hard to justify, even though the return may be several times the initial cost.
In the shorter term, organisations should therefore look to simpler approaches to metadata, pending the development of a more extensive taxonomy.
Tagging and folksonomies
One comparatively new approach to metadata is called ‘tagging’ or ‘folksonomy’. This is where ‘tags’, the equivalent of keywords, are displayed on the published site.
End users can then add their own tags to pages, to help them find the page again, and in the process helping other users.
The tags used on the site are then often displayed as a ‘tag cloud’, which shows all the words used on the site, with more frequent words displayed in larger type.
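Generating a tag cloud is mechanically simple: count how often each tag has been applied, then map the counts onto type sizes. The size bands below are illustrative assumptions.

```python
# Minimal sketch of building a tag cloud: count tag frequencies and
# map them onto display sizes for the published page.
from collections import Counter

# Tags applied by end users across the site (illustrative data).
tags_applied = ["budget", "travel", "budget", "policy", "budget", "travel"]
counts = Counter(tags_applied)

def cloud_size(count: int) -> str:
    """Map a tag's frequency onto a display size band (assumed bands)."""
    if count >= 3:
        return "large"
    if count == 2:
        return "medium"
    return "small"

cloud = {tag: cloud_size(n) for tag, n in counts.items()}
```

The most frequently used tags come out largest, which is the whole visual point of the cloud: popular topics are findable at a glance.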
Some of the most celebrated uses of tagging can be found on Flickr (www.flickr.com) and Delicious (www.delicious.com).
While tagging has proven to be successful on sites such as these, its use on corporate websites and intranets is much less clear. The motivation and purpose for end users to tag corporate content are not obvious, and this is key to the tagging approach.
It is beyond the scope of this article to do more than reference tagging, and readers are encouraged to browse the web for a range of articles and books on this topic.
Much more could be said
There are whole professions devoted to the creation and use of metadata, and much could be said beyond the fundamentals covered in this article.
This includes:
- use of metadata to drive navigation on the site, including faceted browsing
- automated site features based on metadata
- knowledge management aspects of metadata
It is impossible to cover all these topics in a single article, and the goal has been to focus on the core issues that must be understood by all intranet and web teams.
Further research into these more advanced topics is strongly encouraged.
Conclusion
Metadata is one of the key elements of site design and management, and is often a driving factor for the purchase of a content management system (or other similar publishing tools).
While metadata can be used in many complex and powerful ways, most sites benefit from keeping it pretty simple.
Only capture the metadata you need, and make it very simple for authors to enter it. Recognise that there is a cost (in effort and usability) for every metadata field established, and therefore focus efforts on clear business or site needs.



Nothing about Dublin Core? RDF? Atom? I understand this is a “primer”, but it would have been neat to work a bit of that in. Also, have you checked out the Calais system that Reuters is putting together? Advanced automatic m/data generation, basically.
Yes, should’ve made reference to Dublin Core, although at the basic level it boils down to just title, keywords and description (with perhaps a “DC.” at the beginning).
RDF is certainly a powerful framework, but definitely falls into the “advanced metadata” category. I’m not sure how Atom/RSS fits in the topic of metadata.
My aim in writing the article was to get web and intranet teams up to speed on the key topics, on the assumption that a lot of the behind-the-scenes details would be handled by the publishing tool (CMS, etc). As I indicated in the article, there’s lots of value to be gained in researching more advanced topics…
In terms of automated classification, that’s a big topic! Like most things, where a significant investment can be put into the tools, good value is gained. But these are certainly not “install out of the box, and voila metadata!” solutions.
Cheers, James
James,
You mention two key aims of metadata:
1. helping end users find what they are looking for, via search or navigation
2. helping authors and administrators manage the site
While it may seem a subset of your first point, I believe “helping applications/tools/bots find data” is important enough to stand on its own.
This is becoming more important as online tools increase in number and complexity, especially when used for mashing data (God I hate that word) from different sources.
@Russ, a late reply on your comment about providing metadata to help applications/bots to find data. Definitely an important aspect, but I’d highlight that concrete needs must be identified in advance.
For example, in Australian government, a lot of metadata was collected against future plans to automatically create “portals” on specific topics. But these never eventuated, in part due to the patchy quality of the metadata itself. So that left gov agencies with the mandated requirement to collect masses of data, but for no clear purpose. (This has subsequently been made optional in most cases.)
So automatic use of metadata is incredibly powerful, but only when done well (or at all!).
Yes, good point Russ! Although your typical website or intranet team is not yet publishing much content that would fall into this category…
Nice article and I think you have got the tone & level spot on! Just a quick comment on controlled vocabularies and taxonomies…
I agree with your point about ‘generic’ taxonomies tending to defeat the purpose, hence the discussion of the merits of developing your own. However, you didn’t mention that there are a vast number of existing specialist controlled vocabularies already in the public domain. For example, sectors including health, cultural heritage and government all make extensive use of domain-specific, standards-based controlled vocabularies. Adopting one of these may give an organisation the benefits of a highly refined controlled keyword list without the pain of developing and maintaining their own. For those working in large organisations, have a chat to your librarian about what may already exist in your industry. Otherwise, have a trawl of the web!
Hi Andrew, agree on the value of pre-developed industry standards, these can often do 80% or more of the heavy lifting…
One word of caution: there is a big difference between a taxonomy designed for classification, versus one designed for navigation.
The most extreme example: the Library of Congress thesaurus is extraordinary at classifying pretty much anything, but totally hopeless for navigation.
So double-check that the industry taxonomies will fit how you want to present information on the website. If so, you’re in business!
Thanks, James
Nice job. Stumbled on your insightful writeup from Wikipedia Infodesign link.
Follow-on discussion is thoughtful too.
I can apply much directly to my current business problem of cleaning up a messy operations doc repository.
Just what the doctor ordered!
Great article, James. Another major reason we place a heavy emphasis on metadata on our intranet and websites is reuse (i.e. content entered into metadata fields can be reused in multiple places on the site).
Hi James,
a very good introduction to the subject! A few years ago we developed a taxonomy for the keyword field of our intranet. As a starting point we used existing industry classifications, supplemented with terms from standard textbooks, and finally incorporated our company-specific vocabulary. This mixed approach produced a quite sound taxonomy. Later on it also provided the core content for our company glossary on the intranet.
Hi Kate, metadata can be used in a variety of powerful ways, including driving content reuse and automated related links.
Of course, a high level of discipline is required to make this successful. If the metadata isn’t consistently high-quality, then automated uses of it can break down, or generate some strange results.
Well done on making this work, no wonder you have an award-winning intranet! :-)
Hi Martin, I love your step-by-step approach to developing a taxonomy, many organisations could learn from this! The use as a glossary is a nice end-user feature to deliver from the behind-the-scenes taxonomy…
After working with some biggies, I’m *very* wary of taxonomies. The theory of using established taxonomies and everyone being able to share data in insightful ways is great; it just doesn’t work.
To begin with, there’s the question of which taxonomy. I was once involved with a defence project that was investigating which of, say, 10 main taxonomies used in Australia and by our main allies could be usefully implemented, each offering several dozen top-level categories and up to tens of thousands of individual terms. The plain fact is that no-one does, or is going to, use these taxonomies comprehensively or correctly.
Secondly, it’s impossible to know what users want from your data. You might think you’re publishing country profiles or travel guides, but what people are searching for is economic data or industry insights.
Finally, there’s the sheer size, and consequent management issues, of these taxonomies. The reasons you can go to hospital fill a 1000-page book in 9-point type and keep an office within the health department quite busy. AGIMO’s new master metadata plan fills an A3 page in maybe 8-point type, without even beginning to address how it will relate to other industry-specific taxonomies relevant to specific departments. All of these initiatives are fantastic in theory but will never be used by more than a handful of people. Metadata can’t be generic enough.
Hi Brian, completely agree on the challenges inherent to taxonomies! These are hugely hard things to put into place, mostly due to the underlying organisational complexities.
Of course, if an effective taxonomy can be deployed, the benefits gained are ten times the initial cost. But I agree, there are few organisations who have mature enough information management practices to allow this to happen.
One good book to read on taxonomies has been written by Patrick Lambe in Singapore:
http://www.organisingknowledge.com/
Hmmm, maybe I need to get out more, but I haven’t seen a taxonomy that delivers much benefit. I’m happy with basic metadata as described above for basic site management, but doubt we can even justify the effort put into AGLS metadata, given that it is only used by one search engine that generates maybe 1% of traffic. If metadata is only for internal use, it probably doesn’t need to be so complex, and we can simply accept that it only serves one purpose. At the CBR WSG meeting yesterday Stephen Zafira described tagging as ‘dynamic IA’, which I think is a step in the right direction towards accepting that data will be interpreted in different ways by different users, and frees us from trying to write ‘one taxonomy to rule them all’. KISS, live and let live.
Great to know the basics of metadata. The article is very simple to understand.
Could you tell me how to insert metadata in the intranet? Is XML the best way to integrate metadata?