3 dataset rules you must use for compliant content migration
by Maurice Bakker, on Jul 5, 2018 12:47:06 PM
The abbreviation ETL (Extract, Transform and Load), which is used to describe the phases of a migration, does not only represent the physical act of extracting, transforming and loading data from and to a system. The three phases also correspond to specific datasets or databases that are used to manage, control, monitor and verify the migration. To make a content migration successful, compliant and auditable, 3 main rules have to be taken into account.
- 1. Never (ever) modify the dataset that is acquired during the Extraction phase.
After extracting data from the source system in the extraction phase, never touch or modify the extracted data. During extraction, all kinds of information about the content is extracted and stored in the extraction dataset: creation dates, modification dates, other metadata such as author(s), file size and data type, and the physical location of the content.
The initial extraction creates a dataset that is basically a clear view of reality. The Extraction dataset will be viewed as “reality” in the migration from this point onward and serves as a point of departure for data modification and transformation. In order to be able to test and report on the effects of the modifications and transformations, this extraction dataset must remain intact, always. Hence, rule 1.
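One way to enforce rule 1 technically is to write the extraction dataset once and open it read-only from then on. The sketch below assumes the extraction dataset is a SQLite file; the table name `extracted`, its columns and the sample record are hypothetical and only serve as illustration.

```python
import os
import sqlite3
import tempfile


def build_extraction_db(path, objects):
    """Write the extraction dataset once, at the end of the extraction phase."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute(
            "CREATE TABLE extracted (source_path TEXT PRIMARY KEY, author TEXT, "
            "created TEXT, modified TEXT, size_bytes INTEGER)"
        )
        conn.executemany("INSERT INTO extracted VALUES (?, ?, ?, ?, ?)", objects)
    conn.close()


def open_read_only(path):
    """Open the extraction dataset so that any write attempt fails."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


# Hypothetical sample record standing in for a real extraction run.
db_path = os.path.join(tempfile.mkdtemp(), "extraction.db")
build_extraction_db(
    db_path, [("/share/a.docx", "alice", "2017-01-02", "2017-03-04", 1024)]
)

conn = open_read_only(db_path)
try:
    conn.execute("DELETE FROM extracted")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses to write to a database opened with mode=ro.
    write_blocked = True
row_count = conn.execute("SELECT COUNT(*) FROM extracted").fetchone()[0]
conn.close()
```

Every later phase would then use `open_read_only`, so an accidental update or delete fails loudly instead of silently changing "reality".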
- 2. Use a second database to store any data modifications and transformations.
Exclusions, data combinations, metadata enrichment and all other actions that modify anything that was initially stored in the extraction phase should be stored in a separate database using a unified data model. This is called the transformation dataset. It serves as the preparation for the loading phase: in the transformation phase, the data is prepared to fit the data model of the target system, migration rules are applied and exclusions are defined.
This is done in a separate database in order to be able to test, verify and report on the effect of these changes in comparison to the established reality in the extraction phase.
It is essential to have a detailed report about the objects that are subject to exclusion and change in order to execute a test migration. The test migration gives content owners the ability to verify that objects are actually excluded and that other migration rules are applied correctly. Changes that derive from this can easily be processed by adding or modifying migration rules, rebuilding the transformation dataset and executing another test migration. This is possible because the extraction database is still "reality".
We now have two datasets that are comparable to each other, and the effects of the migration rules become clear. When all is said and done, the transformation dataset can be loaded into the target system and we are ready to explain rule 3.
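The relation between the two datasets can be sketched in a few lines. The example below is a minimal illustration, not a description of any particular migration tool: the `Record` fields and the two migration rules are hypothetical. The extraction records are immutable, the transformation dataset is rebuilt from them on every run, and the report lists which objects were excluded or changed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: extraction records cannot be modified
class Record:
    source_path: str
    author: str
    doc_type: str


def exclude_temp_files(rec):
    """Hypothetical exclusion rule: drop temporary files."""
    return None if rec.source_path.endswith(".tmp") else rec


def normalise_author(rec):
    """Hypothetical transformation rule: trim and lowercase the author."""
    return replace(rec, author=rec.author.strip().lower())


RULES = [exclude_temp_files, normalise_author]


def build_transformation(extraction, rules):
    """Rebuild the transformation dataset and report on the rule effects."""
    transformed = {}
    report = {"excluded": [], "changed": [], "unchanged": []}
    for rec in extraction:
        result = rec
        for rule in rules:
            result = rule(result)
            if result is None:  # a rule excluded this object
                break
        if result is None:
            report["excluded"].append(rec.source_path)
            continue
        transformed[rec.source_path] = result
        key = "changed" if result != rec else "unchanged"
        report[key].append(rec.source_path)
    return transformed, report


extraction = (
    Record("/share/a.docx", " Alice ", "memo"),
    Record("/share/b.tmp", "bob", "scratch"),
    Record("/share/c.pdf", "carol", "contract"),
)
transformed, report = build_transformation(extraction, RULES)
```

Because `build_transformation` never writes to `extraction`, a rule can be added or changed and the function simply run again to produce a new transformation dataset and a new report for the next test migration.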
- 3. Verify the migrated objects in the target system.
When an object is imported into a target system, the system will in one way or another give a response about the import action. For example, it will return a document ID when the object has been imported successfully. Only during this phase is the actual object picked up from the source system (based on the source system location of the object that was stored in the extraction phase). After a successful response from the target system (e.g. getting the object ID, indicating that the import has worked), the response information is stored with the corresponding object in the dataset. This completes the full circle.
Although documents are in most cases not physically modified during the move (from source to target system), it is possible to create object hashes in the source system so that the data integrity of any object in the target system can be verified.
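A hash comparison of this kind could look as follows. This is a sketch that assumes both copies of an object can be read as files and that SHA-256 is an acceptable hash; the file names and contents are made up for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large objects do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two hypothetical files stand in for the source object and its migrated copy.
workdir = tempfile.mkdtemp()
source_copy = os.path.join(workdir, "source.bin")
target_copy = os.path.join(workdir, "target.bin")
for path in (source_copy, target_copy):
    with open(path, "wb") as handle:
        handle.write(b"contract text")

# The source hash would be stored during extraction and compared after loading.
source_hash = sha256_of_file(source_copy)
hashes_match = source_hash == sha256_of_file(target_copy)

# Any change to the migrated copy, however small, makes the comparison fail.
with open(target_copy, "ab") as handle:
    handle.write(b"!")
hashes_match_after_change = source_hash == sha256_of_file(target_copy)
```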
When all objects are stored, it is possible to go through the transformation dataset to see whether every object has successful response information from the target system. If not, these items are checked and checked again until they are successfully migrated.
All these steps can be traced back through the reports that are produced in every step of the process. The total number of successfully migrated objects in the target system must exactly match the number of objects that were identified for migration in the transformation dataset in order to complete the content migration.
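The final reconciliation can be expressed as a simple count and set comparison. In the sketch below, `planned` stands for the objects in the transformation dataset and `responses` for the stored response information from the target system; both structures and the document IDs are hypothetical.

```python
def reconcile(planned, responses):
    """Compare the objects planned for migration with the stored responses."""
    planned_set = set(planned)
    # An object counts as migrated only if a target ID was stored for it.
    migrated = {path for path in planned_set if responses.get(path)}
    pending = sorted(planned_set - migrated)
    return {
        "planned": len(planned_set),
        "migrated": len(migrated),
        "pending": pending,
        "complete": not pending,
    }


planned = ["/share/a.docx", "/share/c.pdf", "/share/d.xlsx"]

# First load: one import succeeded, one failed, one has no response yet.
responses = {"/share/a.docx": "DOC-1001", "/share/c.pdf": None}
first_pass = reconcile(planned, responses)

# After retrying the pending objects, every planned object has a target ID.
responses["/share/c.pdf"] = "DOC-1002"
responses["/share/d.xlsx"] = "DOC-1003"
second_pass = reconcile(planned, responses)
```

The migration is only complete once `pending` is empty, which is the same condition as the migrated count matching the planned count.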