Inventory
After an intranet or internet site is launched, the amount of content often grows rapidly, and new content types or variations on existing content types are added regularly. In addition, when content types change, the old types are often left in the system, so the overviews generated by the CMS give a distorted view of the real situation. Moreover, restrictions in the current CMS or the creativity of the editors often result in misuse of content types, unnecessary new content types and orphaned content on the site. In this step of the migration path the content is collected and documented.
Lack of overview
Depending on the level within the organisation at which the CMS is maintained, chances are that there is no complete overview of the existing content and content structure. Simply counting the files in the file system is of no use: multiple versions of the same file may exist, and files that are no longer used on the site may still be present in the file system. Moreover, it is often unclear how to generate a simple overview of what is in the CMS and what is published on the site. In some systems pages are built dynamically from multiple reusable components, which makes it even harder to get a full view of the situation.
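Exact copies are the easiest part of this problem to detect automatically. The following is a minimal sketch in Python, assuming the files are available in a local directory (for instance an export of the CMS file store); the function names are hypothetical, and the approach (grouping files by their SHA-256 digest) only finds byte-identical copies, not older versions whose content differs slightly.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root: str) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Only groups containing more than one file are returned.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Files that are no longer used cannot be found this way; they only show up when the file listing is compared with what is actually linked from the site.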
Inventory of both site and CMS
For a full inventory it is advisable to start with both the front-end site and the CMS. The site contains important information that cannot be derived directly from the CMS: in many cases, for instance, the navigation is built by the site according to rules and dependencies, or contextual information is available on the site that the CMS does not hold. Conversely, the CMS can contain unpublished content or metadata that is not visible on the site.
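Once both sources have been inventoried, combining them largely comes down to comparing two sets of identifiers. The sketch below assumes, hypothetically, that the crawl of the site and an export from the CMS have each been reduced to a list of URLs; how such an export is produced differs per CMS, and the normalisation rules shown (lower-case scheme and host, no fragment, no trailing slash) are illustrative choices.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


@dataclass
class Comparison:
    on_site_and_in_cms: set[str]
    only_in_cms: set[str]   # e.g. unpublished items, or items no longer linked
    only_on_site: set[str]  # e.g. overview pages generated by the site itself


def compare_site_and_cms(site_urls: list[str], cms_urls: list[str]) -> Comparison:
    site = {normalise(u) for u in site_urls}
    cms = {normalise(u) for u in cms_urls}
    return Comparison(site & cms, cms - site, site - cms)
```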
Inventory of content types
Before content can be migrated, the pages and content types (such as news articles, parliamentary papers or file pages) must first be inventoried. All links within the site are followed and the content is recorded, including relevant information such as:
- Metadata
All metadata associated with the content item in the current CMS, such as the date, publication state or language.
- Fields and structure
Which fields make up a page or content type? Which formats, value lists, restrictions and dependencies are in place for them?
- Relations
Which content refers to this item and to which content does this item refer?
- Location
Where is the content item found on the site? Is it part of the navigation, and does it occur in more than one location?
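The crawl and the recorded information can be sketched as follows. This is a minimal illustration under stated assumptions: `fetch` is a hypothetical callable that returns a page's HTML (or `None` if the page cannot be retrieved), only links on the same host are followed, and the metadata and field values, which would normally come from the CMS or from the page markup, are left as empty dictionaries to be filled in. Incoming links serve as a first approximation of where an item is located.

```python
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> elements in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class InventoryItem:
    url: str
    metadata: dict = field(default_factory=dict)  # e.g. date, publication state, language
    fields: dict = field(default_factory=dict)    # fields that make up the content type
    links_out: set = field(default_factory=set)   # content this item refers to
    links_in: set = field(default_factory=set)    # content that refers to this item
    reachable: bool = True                        # False if fetch returned None


def crawl(start: str, fetch: Callable[[str], Optional[str]]) -> dict[str, InventoryItem]:
    """Breadth-first crawl of all same-host links reachable from `start`."""
    host = urlsplit(start).netloc
    inventory: dict[str, InventoryItem] = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in inventory:
            continue
        item = inventory[url] = InventoryItem(url)
        html = fetch(url)
        if html is None:
            item.reachable = False
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target, _ = urldefrag(urljoin(url, href))
            if urlsplit(target).netloc != host:
                continue  # external link: not part of this inventory
            item.links_out.add(target)
            queue.append(target)
    # Second pass: derive incoming relations from the outgoing ones.
    for item in inventory.values():
        for target in item.links_out:
            inventory[target].links_in.add(item.url)
    return inventory
```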
Inventory of documents
Aside from pages, an inventory must be made of documents. This includes Microsoft Word and Excel files, PDFs, images and video files. For each document at least the following must be recorded:
- Metadata
- Relations
- Location
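For documents the same crawl data can be reused. The sketch below assumes the crawl has produced a hypothetical mapping from each page URL to the set of URLs it links to, and treats a URL as a document purely on the basis of its file extension (the extension list is an example). The media type is guessed from the extension with Python's `mimetypes` module, so richer metadata such as file size, author or modification date would still have to come from the CMS or the file system.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit

# Example set of extensions treated as documents; adjust to the site in question.
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".pdf", ".jpg", ".png", ".gif", ".mp4"}


@dataclass
class DocumentRecord:
    url: str
    mime_type: Optional[str]                         # metadata, guessed from the extension only
    referenced_by: set = field(default_factory=set)  # relations and locations: linking pages


def is_document(url: str) -> bool:
    path = urlsplit(url).path.lower()
    return any(path.endswith(ext) for ext in DOCUMENT_EXTENSIONS)


def document_inventory(links_by_page: dict[str, set[str]]) -> dict[str, DocumentRecord]:
    docs: dict[str, DocumentRecord] = {}
    for page, links in links_by_page.items():
        for url in links:
            if not is_document(url):
                continue
            if url not in docs:
                mime, _ = mimetypes.guess_type(urlsplit(url).path)
                docs[url] = DocumentRecord(url, mime)
            docs[url].referenced_by.add(page)
    return docs
```

Documents that exist only in the file system will not appear here, because nothing links to them; comparing this result with a file listing reveals them.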
Valuable information
During the inventory, information about the content must be recorded in a structured way, so that several reports can be generated from it. In addition to reports listing the number of documents per content type, pages that nothing links to, or broken links, it must also be possible to generate custom reports. For instance, the inventoried content can be connected to a statistics system: a page may be published, but if the site statistics show that it has been viewed only three times in the past year, action can be taken. Migrating superfluous content to the new CMS is wasteful, and the generated reports can be used to assign priorities to the content. The inventory also makes it possible to compare the fields of the different content types in a matrix: how different are the content types really?
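The standard reports mentioned above can be derived directly from a structured inventory. In this minimal sketch the `Page` record, the example threshold of 10 views and the `views` mapping (standing in for an export from the statistics system) are all assumptions. A link counts as broken here simply when its target is not in the inventory; a real crawl would also record HTTP errors.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    content_type: str
    links_out: set = field(default_factory=set)


def pages_per_content_type(pages: list[Page]) -> Counter:
    """Number of pages per content type."""
    return Counter(p.content_type for p in pages)


def orphans(pages: list[Page], entry_points: set[str]) -> set[str]:
    """Pages no other page links to; entry points such as the home page are excluded."""
    linked = {t for p in pages for t in p.links_out if t != p.url}  # ignore self-links
    return {p.url for p in pages} - linked - entry_points


def broken_links(pages: list[Page]) -> set[tuple[str, str]]:
    """(source, target) pairs whose target is not in the inventory."""
    known = {p.url for p in pages}
    return {(p.url, t) for p in pages for t in p.links_out if t not in known}


def low_traffic(pages: list[Page], views: dict[str, int], threshold: int = 10) -> list[str]:
    """Pages viewed fewer than `threshold` times; pages missing from `views` count as 0."""
    return sorted(p.url for p in pages if views.get(p.url, 0) < threshold)
```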
Inventory as starting point for the migration and the CMS
The inventory is not only a necessary starting point for the next steps on the migration path, but also an ideal basis for defining the content types and templates of the new CMS. The inventory results contain critical information for deciding on the new content structure. They are often used to set up rules for cleaning up the content, but this is also a very good moment to scrutinise the structure of the content types. For instance, if 19 of the 20 fields of two content types match, it is worth reconsidering the content structure. Do not start the migration path and the inventory too late: if this decision-critical information only becomes available late in the project, either the quality of the project or its planning will suffer.
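The field comparison can be made concrete with a similarity score per pair of content types. This sketch uses the Jaccard index (shared fields divided by all distinct fields) as one possible measure; the 0.9 threshold and the example content types are hypothetical. Two types that each have 20 fields, 19 of them shared, score 19/21 (about 0.90) and are therefore flagged.

```python
from itertools import combinations


def field_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard index of two field sets: |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def field_matrix(content_types: dict[str, set[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted type names and a square matrix of pairwise overlaps."""
    names = sorted(content_types)
    rows = [[field_overlap(content_types[a], content_types[b]) for b in names] for a in names]
    return names, rows


def similar_pairs(content_types: dict[str, set[str]], threshold: float = 0.9):
    """Yield (type_a, type_b, overlap) for pairs whose overlap is at least `threshold`."""
    for (name_a, fields_a), (name_b, fields_b) in combinations(sorted(content_types.items()), 2):
        score = field_overlap(fields_a, fields_b)
        if score >= threshold:
            yield name_a, name_b, score
```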
