Cleansing
More and more CMSes are based on XML, and both intranet and internet sites must conform to the web standards developed by the W3C. To improve the quality and accessibility of websites, the Dutch government formulated guidelines in 2004 for all government websites.
A clean start
Cleaning is in fact a two-part process. The first part is cleaning the contents of the material, for instance converting HTML to XHTML so that it complies with the new CMS. The second part is cleaning the content set according to the mapping rules. This can mean, for instance, that news articles from before 2002 are not migrated, or that pages with fewer than 10 page views per year are skipped. Both parts must be configured in a migration framework.
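The second part, filtering the content set, can be sketched as a small rule function. This is only an illustration, not a real migration framework: the `ContentItem` record, its field names and the default thresholds are assumptions taken from the two example rules above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ContentItem:
    # Hypothetical record; the field names are assumptions for illustration.
    path: str
    kind: str            # e.g. "news" or "page"
    published: date
    views_per_year: int


def should_migrate(item, news_cutoff_year=2002, min_views_per_year=10):
    """Apply the two example mapping rules to a single item."""
    # Rule 1: news articles from before the cutoff year are not migrated.
    if item.kind == "news" and item.published.year < news_cutoff_year:
        return False
    # Rule 2: pages with too few page views per year are skipped.
    if item.views_per_year < min_views_per_year:
        return False
    return True


def select_for_migration(items):
    """Return only the items that pass every mapping rule."""
    return [item for item in items if should_migrate(item)]
```

Keeping the thresholds as parameters means the same function can be driven from configuration rather than from hard-coded values.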
Cleaning the content
Content is cleaned according to the following procedure.
- Translation
Converting HTML tags, styles and attributes to the correct naming convention and structure.
- Deletion
Removal of deprecated, erroneous and unused tags, scripts and unwanted code.
- Transform
Based on transformation rules, XML is converted to the XML format prescribed by the CMS.
- Restructuring
The web standards and guidelines contain restrictions concerning the structure of a page. Think for instance of paragraphs that must always be inside specific HTML tags.
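The translation and deletion steps can be illustrated with a minimal sketch built on Python's standard `html.parser` module. The rule tables (`RENAME`, `UNWRAP`, `DROP_WITH_CONTENT`, `VOID`) and the `Cleaner` class are hypothetical names chosen for this example; the transform and restructuring steps are not covered here.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule tables; a real migration would load these from configuration.
RENAME = {"b": "strong", "i": "em"}       # translation: old tag -> new tag
UNWRAP = {"font", "center"}               # deletion: drop the tag, keep its text
DROP_WITH_CONTENT = {"script", "style"}   # deletion: drop the tag and its content
VOID = {"br", "hr", "img"}                # elements written as self-closing


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        # The parser already lowercases tag and attribute names.
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in UNWRAP:
            return
        name = RENAME.get(tag, tag)
        # Quote every attribute value; valueless attributes repeat their name.
        attr_text = "".join(
            f' {k}="{escape(v if v is not None else k, quote=True)}"'
            for k, v in attrs
        )
        closer = " />" if tag in VOID else ">"
        self.out.append(f"<{name}{attr_text}{closer}")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in UNWRAP or tag in VOID:
            return
        self.out.append(f"</{RENAME.get(tag, tag)}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean(html_text):
    """Return the cleaned markup for one HTML fragment."""
    cleaner = Cleaner()
    cleaner.feed(html_text)
    cleaner.close()  # flush any text still buffered by the parser
    return "".join(cleaner.out)
```

The sketch does not balance unclosed tags or validate the result, so it does not guarantee well-formed XHTML on its own.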
