In an earlier post I explained why metadata matters. This time I want to zoom out a little and discuss how to assess the quality of your document collection, which is essential to turning data into information.
Distinguishing data, content, and information
When clients come to us with the ambition to improve their ECM system, we first focus on the quality of their document collection. We start by explaining the difference between data, content, and information. A common way to define these three terms is as follows:
- Data: documents without meaningful metadata or text.
For example, a collection of images or PDFs without a text layer.
- Content: data with text.
For example, office documents or PDFs with a text layer.
- Information: content items with meaningful metadata.
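To make the three definitions concrete, here is a minimal sketch of how a document could be assigned to a category. The `Document` class, the `REQUIRED_FIELDS` names, and the rule that a document without text stays in Data even when it carries metadata are all assumptions made for illustration; they are not part of any particular ECM product.

```python
from dataclasses import dataclass, field
from enum import Enum

# Assumption: these field names are hypothetical stand-ins for whatever
# your organization considers meaningful metadata.
REQUIRED_FIELDS = ("project", "project_stage", "topic", "retention_state")


class Category(Enum):
    DATA = "Data"
    CONTENT = "Content"
    INFORMATION = "Information"


@dataclass
class Document:
    name: str
    text: str = ""  # extracted text layer, empty if there is none
    metadata: dict = field(default_factory=dict)


def has_meaningful_metadata(doc: Document) -> bool:
    """True when every required field holds a non-blank value."""
    for name in REQUIRED_FIELDS:
        value = doc.metadata.get(name)
        if value is None or not str(value).strip():
            return False
    return True


def classify(doc: Document) -> Category:
    # Assumption: the categories are layered, so text is checked first.
    if not doc.text.strip():
        return Category.DATA
    if has_meaningful_metadata(doc):
        return Category.INFORMATION
    return Category.CONTENT
```

With this sketch, a scanned image without a text layer lands in Data, a Word document with text but no context fields lands in Content, and only a document with both text and all required fields reaches Information.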
Information as desired state
In this model, each category represents a quality improvement over the previous one.
- The least preferred category for your documents is Data.
- The Content category allows for textual analysis.
- The most preferred category is Information. Documents in the Information group:
  - Empower your ECM system to offer efficient search, personalization, etc.
  - Allow for filtering and clustering based on metadata.
Defining meaningful metadata
I intentionally used the word ‘meaningful’ when describing the metadata in the Information category. Meaningful metadata provide more insight into a document than just its author and modification date; they tell us something about the context of the document. For example:
- What project it’s associated with
- In what stage of the project it was created
- Its topic
- Its retention state
Document quality as decisive factor
When thinking about improving search results or implementing new personalization features, it is critical to know the current quality level of your document collection. Even the most brilliant features cannot compensate for poor quality in the underlying documents. That’s why every information plan should start with a quality audit.
Key ingredients of an information plan
Your information plan should contain at least the following elements:
- An inventory of how your current document collection is distributed over the Data, Content, and Information categories.
- The criteria for your Information category in terms of structure, content, and metadata, based on the features you want to implement.
- A plan for getting your documents into the Information category through metadata enrichment.
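As a rough illustration of the first and last elements, the sketch below computes how a small collection is distributed over the three categories and counts which metadata fields are still missing. The sample documents, the dictionary layout, and the `REQUIRED_FIELDS` criteria are hypothetical; in practice they would come from your own ECM export and from the features you want to implement.

```python
from collections import Counter

CATEGORIES = ("Data", "Content", "Information")
# Assumption: hypothetical criteria for the Information category.
REQUIRED_FIELDS = ("project", "project_stage", "topic", "retention_state")


def distribution(docs):
    """Return the percentage of documents in each category."""
    counts = Counter(doc["category"] for doc in docs)
    total = len(docs)
    return {c: (round(100 * counts[c] / total, 1) if total else 0.0)
            for c in CATEGORIES}


def enrichment_backlog(docs):
    """Count, per required field, how many documents still lack a value."""
    backlog = Counter()
    for doc in docs:
        metadata = doc.get("metadata", {})
        for name in REQUIRED_FIELDS:
            value = metadata.get(name)
            if value is None or not str(value).strip():
                backlog[name] += 1
    return dict(backlog)


# Made-up sample collection, already labeled with a category.
collection = [
    {"name": "scan-001.tif", "category": "Data", "metadata": {}},
    {"name": "minutes.docx", "category": "Content",
     "metadata": {"topic": "steering committee"}},
    {"name": "design.pdf", "category": "Information",
     "metadata": {"project": "Apollo", "project_stage": "design",
                  "topic": "architecture", "retention_state": "active"}},
    {"name": "memo.pdf", "category": "Content", "metadata": {}},
]

print(distribution(collection))
# {'Data': 25.0, 'Content': 50.0, 'Information': 25.0}
print(enrichment_backlog(collection))
# {'project': 3, 'project_stage': 3, 'topic': 2, 'retention_state': 3}
```

The distribution gives you the baseline for the audit, while the backlog shows where metadata enrichment would pay off first.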