Migrating all of your content to Sitecore can be a great challenge, especially if your current system is based on simple pages and you want to shift to a component-based approach to improve content consistency and reuse.
Sitecore offers the ability to organize your content logically and re-use your content items on different pages through page components that fetch their content from content items. This way you can build your pages from several reusable content blocks.
From pages to components
If your current web content management system is not organized this way, but rather based on plain pages, how can you transform your pages to reusable components and place them on the right pages?
We have faced this problem before. An automated content migration to Sitecore can be split into four phases:
- Content inventory
- Content extraction
- Content transformation
- Content import
1. Content inventory
During the content inventory we analyze all content in the source system. We fetch the content structure, both for pages and binaries (images, documents, and other files). The results of this inventory can be presented in a spreadsheet that displays the site structure in a clear tree view. This spreadsheet can be used for the structure mapping.
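As a rough illustration of the tree view, the sketch below turns a flat list of page and binary paths into indented rows that can be written to a CSV file and opened as a spreadsheet. The function names, the column layout, and the example paths are assumptions made for this sketch, not the tooling we actually use.

```python
import csv
import io


def inventory_rows(paths):
    """Turn flat source paths into (depth, name, full_path) rows in tree order."""
    seen = set()
    rows = []
    # Sort by path segments so children always follow their parent.
    for path in sorted(paths, key=lambda p: p.strip("/").split("/")):
        parts = [p for p in path.strip("/").split("/") if p]
        for depth in range(len(parts)):
            prefix = "/" + "/".join(parts[: depth + 1])
            if prefix not in seen:
                seen.add(prefix)
                rows.append((depth, parts[depth], prefix))
    return rows


def write_inventory(paths, out):
    """Write the inventory as CSV; names are indented to suggest the tree."""
    writer = csv.writer(out)
    writer.writerow(["depth", "name", "path"])
    for depth, name, path in inventory_rows(paths):
        writer.writerow([depth, "  " * depth + name, path])


if __name__ == "__main__":
    # Hypothetical source paths, for illustration only.
    buffer = io.StringIO()
    write_inventory(["/about/team", "/about", "/products/a.pdf"], buffer)
    print(buffer.getvalue())
```

Intermediate folders that are not pages themselves still get a row, so the spreadsheet shows the complete tree.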
Additionally, we analyze the content model of the source system. Which content types, schemas, templates, etc. have been defined, and which fields do they contain? This yields an overview of the content model which can be used for the content mapping.
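A content model overview of this kind can be sketched as a simple aggregation over the inventoried items. The item layout used here (a dictionary with "type" and "fields" keys) is an assumption for illustration; a real source system will expose its model in its own way.

```python
from collections import defaultdict


def summarize_content_model(items):
    """Collect, per content type, the set of field names seen in the source."""
    model = defaultdict(set)
    for item in items:
        model[item["type"]].update(item["fields"].keys())
    # Sorted output keeps the overview stable between runs.
    return {ctype: sorted(fields) for ctype, fields in sorted(model.items())}


if __name__ == "__main__":
    # Hypothetical items, for illustration only.
    items = [
        {"type": "article", "fields": {"title": "A", "body": "..."}},
        {"type": "article", "fields": {"title": "B", "sidebar": "..."}},
        {"type": "landing", "fields": {"title": "C"}},
    ]
    for ctype, fields in summarize_content_model(items).items():
        print(ctype, fields)
```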
2. Content extraction
After the content has been inventoried, the content can be extracted from the source. If the source system has an API available for content extraction, that’s the best way to go. We prefer to extract the content in XML format. Together with all page content, all binary content is downloaded as well.
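To make the XML extraction concrete, here is a minimal sketch that serializes one extracted page to XML with Python's standard library. The element and attribute names (page, field, binary) and the shape of the page dictionary are assumptions for this example. How the page is fetched depends entirely on the source system's API and is left out.

```python
import xml.etree.ElementTree as ET


def page_to_xml(page):
    """Serialize one extracted page (a plain dict) to an XML string."""
    root = ET.Element("page", {"path": page["path"], "type": page["type"]})
    for name, value in page["fields"].items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = value
    # Record which binaries belong to the page so they can be downloaded too.
    for url in page.get("binaries", []):
        ET.SubElement(root, "binary", {"src": url})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page, for illustration only.
    print(page_to_xml({
        "path": "/about/team",
        "type": "article",
        "fields": {"title": "Our team", "body": "<p>Hello</p>"},
        "binaries": ["/images/team.jpg"],
    }))
```

ElementTree escapes markup inside field values, so HTML fragments survive a round trip through the XML file.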
3. Content transformation
Before we can migrate the extracted content to Sitecore, we have to transform it so that it conforms to Sitecore's content model. These transformations can be subdivided into two parts: content mapping and structure mapping.
The source system’s content model is different from Sitecore’s (unless you are upgrading). In this step the content model of the source system (obtained during the content inventory) is compared to Sitecore’s. Based on the differences between the models, a set of mapping rules is defined that map the source content types and their fields to the Sitecore content types.
If you’re moving from a page-based approach to a component-based approach in Sitecore, this is the moment where your pages should be transformed into components. This transformation can be defined in a set of specific rules that can map complete pages, a set of fields, or single fields to page components. Both field-to-field and field-to-component mapping rules can be presented in one structured spreadsheet.
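The two kinds of rules can be sketched as follows. This is an illustration under stated assumptions: the rule layout, the target names ("Title", "Promo Block"), and the shape of the result are all hypothetical and only show how a field-to-field rule differs from a field-to-component rule.

```python
# Hypothetical mapping rules, as they might be read from the spreadsheet.
RULES = [
    {"source_type": "article", "source_field": "title",
     "target": "field", "name": "Title"},
    {"source_type": "article", "source_field": "sidebar",
     "target": "component", "name": "Promo Block"},
]


def apply_rules(page, rules):
    """Split a source page into target page fields and reusable components."""
    result = {"fields": {}, "components": []}
    for rule in rules:
        if rule["source_type"] != page["type"]:
            continue
        value = page["fields"].get(rule["source_field"])
        if value is None:
            continue  # the page does not use this field
        if rule["target"] == "field":
            # Field-to-field: the value stays on the page item.
            result["fields"][rule["name"]] = value
        else:
            # Field-to-component: the value becomes its own content item.
            result["components"].append(
                {"template": rule["name"], "content": value}
            )
    return result


if __name__ == "__main__":
    page = {"type": "article",
            "fields": {"title": "Our team", "sidebar": "Join us!"}}
    print(apply_rules(page, RULES))
```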
Apart from differences in the content model, there are differences between the structure of the content items in the source system and Sitecore. To overcome this difference, the source structure must be mapped to the desired target structure. This can be done by rearranging the tree view of the source site structure in the spreadsheet that was created during the content inventory. This yields a spreadsheet that reflects the site structure in Sitecore.
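One way to apply such a rearranged tree is a lookup from source path to target path, where a mapped folder also moves everything below it unless a more specific row exists. The sketch below assumes the spreadsheet has been reduced to a dictionary of source and target paths; the paths are hypothetical.

```python
def map_path(source_path, mapping):
    """Return the target path for a source path, using the longest mapped prefix."""
    best = None
    for prefix in mapping:
        base = prefix.rstrip("/")
        if source_path == base or source_path.startswith(base + "/"):
            if best is None or len(base) > len(best.rstrip("/")):
                best = prefix
    if best is None:
        return None  # not mapped: needs a decision in the spreadsheet
    remainder = source_path[len(best.rstrip("/")):]
    return mapping[best].rstrip("/") + remainder


if __name__ == "__main__":
    # Hypothetical structure mapping, for illustration only.
    mapping = {
        "/news": "/sitecore/content/Home/Newsroom",
        "/news/2012": "/sitecore/content/Home/Archive/2012",
    }
    print(map_path("/news/2012/launch", mapping))
    print(map_path("/news/hello", mapping))
```

Returning None for unmapped paths makes gaps in the mapping visible before the import starts.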
Spreadsheet to XML transformations
Both the field-mapping and structure-mapping spreadsheets can be processed by dedicated scripts that translate the mapping rules to XML transformations. This way the content XML that was extracted from the source can be transformed to a new XML format that conforms to Sitecore’s content model and site structure.
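A minimal sketch of such a script is shown below: it reads field-mapping rules from a CSV export of the spreadsheet and applies them to one extracted page. The CSV column names and the source and target XML layouts are assumptions for this example. Our actual scripts generate XML transformations rather than transforming in Python directly.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_rules(csv_text):
    """Index mapping rows by (source content type, source field name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["source_type"], row["source_field"]): row for row in reader}


def transform_page(page_xml, rules):
    """Rewrite one source <page> into a target <item> according to the rules."""
    page = ET.fromstring(page_xml)
    item = ET.Element("item", {"path": page.get("path", "")})
    for field in page.findall("field"):
        rule = rules.get((page.get("type"), field.get("name")))
        if rule is None:
            continue  # unmapped fields are dropped in this sketch
        item.set("template", rule["target_template"])
        out = ET.SubElement(item, "field", {"name": rule["target_field"]})
        out.text = field.text
    return ET.tostring(item, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical rules and page, for illustration only.
    rules = load_rules(
        "source_type,source_field,target_template,target_field\n"
        "article,title,Article Page,Title\n"
        "article,body,Article Page,Text\n"
    )
    source = (
        '<page path="/about/team" type="article">'
        '<field name="title">Our team</field>'
        '<field name="legacy">x</field>'
        "</page>"
    )
    print(transform_page(source, rules))
```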
4. Content import
After the content has been transformed to an XML format that conforms to the content model and site structure of Sitecore, this XML is loaded into Sitecore using its API. The web services this API offers out of the box might not be sufficient to meet all of your site’s requirements. We have developed an additional web service that handles the cases they do not cover.
Controllable and consistent
In our experience this approach results in a controllable and consistent web content migration. The important mapping is done in a predictable and repeatable manner. Compared to manual content migration, the risk of errors is considerably lower, if not zero.
Learn more about upgrading Sitecore in this case description of a Sitecore upgrade at Beth Israel Deaconess Medical Center (BIDMC - the teaching hospital of Harvard Medical School).