Tackling drafts and duplicate documents in SharePoint

by Evan Goris, on Apr 23, 2014 1:30:00 PM

In my current assignment on behalf of Xillio, I work at an enterprise IT department in a team of business analysts and software architects. Recently we welcomed a new member to our team. One of the first things he suggested was that we upload all the information we have collected so far to the project's SharePoint site. I bit my tongue.

Major problems

I can understand where our new colleague's request comes from. Having all documents on SharePoint would save him a lot of time collecting material from emails, network drives, and a project site that is still unfamiliar to him. I didn't advise against it because I didn't want to get in his way, but I should have, for reasons that have nothing to do with him personally. This SharePoint project site, like many others, suffers from two major problems.

Document status

The first is that I cannot assess a document's status. Is it a work in progress, a formal draft for review, or a finalized and approved version? Particularly in this project, where insights change rapidly, developers need correct requirements, and stakeholder management is a huge task, the question is very relevant. We cannot afford to have our developers build something unwanted, or to misinform stakeholders who require careful treatment for political reasons.

Matter of discipline

Having all kinds of works in progress floating around is inherent to using SharePoint as a collaboration tool. Automation does not offer an immediate solution. No matter how sophisticated the technology, only a human can determine whether a document is finalized and has been agreed upon with the relevant parties. So how do we enable users of the project site to better assess the formal status of a piece of information? A pragmatic solution could be to introduce folders that contain only finalized versions, which can be shared with stakeholders and development teams. Upholding this rule is a matter of discipline.

Document versions

The second problem I find is multiple versions of the same document. This particular SharePoint implementation has a versioning system in place. If a user uploads a document with exactly the same title as an existing one, the existing document is overwritten. Users therefore always find the most recent version. But some project members take a manual approach to versioning by adding dates to document titles. If they upload these documents to SharePoint, the versioning logic isn't triggered, because the system treats them as different documents due to the difference in title. This is an understandable limitation of SharePoint's versioning capabilities.
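
To make the limitation concrete, here is a toy model in Python of versioning that is keyed on the exact title. It is only an illustration of the behavior described above, not SharePoint's actual API; the `library` dictionary and the `upload` function are hypothetical names invented for this sketch.

```python
# Toy model (an assumption, not SharePoint code): version history is
# keyed on the exact document title.
library: dict[str, list[str]] = {}


def upload(title: str, content: str) -> None:
    """Store content as the newest version under its exact title."""
    library.setdefault(title, []).append(content)


upload("Requirements.docx", "first draft")
upload("Requirements.docx", "second draft")            # same title: new version
upload("Requirements 2014-04-23.docx", "third draft")  # dated title: separate document

for title, versions in library.items():
    print(f"{title}: {len(versions)} version(s)")
```

The dated upload ends up as a second entry next to the original instead of becoming its newest version, which is exactly the clutter described above.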

Smarter deduplication

In contrast to document status, document versioning is a good candidate for automation. Something more advanced than plain title comparison is needed, however. Xillio offers powerful document deduplication technology that can compare the contents of two documents, even if the titles are not identical. If two documents are found to be 95% identical, an example business rule is: keep the one updated most recently and remove and/or archive the other. This way only the most recent version of a document is ever made available. With this technology, users are free to title documents whichever way they want without cluttering SharePoint with several versions of the same document.
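
The example business rule can be sketched in a few lines of Python. This is a minimal illustration and not Xillio's technology: it assumes the document text has already been extracted, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the `Document` class, `similarity` and `deduplicate` functions are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.95  # the "95%" from the example business rule


@dataclass
class Document:
    title: str
    text: str           # assumed: plain text already extracted from the file
    modified: datetime


def similarity(a: Document, b: Document) -> float:
    """Return a 0..1 ratio based on content only; titles are ignored."""
    # autojunk=False keeps difflib from discarding frequent characters
    # in longer texts, which would distort the ratio.
    return SequenceMatcher(None, a.text, b.text, autojunk=False).ratio()


def deduplicate(docs: list[Document]) -> tuple[list[Document], list[Document]]:
    """Split docs into (kept, archived); of near-duplicates, keep the newest."""
    kept: list[Document] = []
    archived: list[Document] = []
    # Visit newest first, so the first member of each near-duplicate
    # group that we encounter is the one that stays.
    for doc in sorted(docs, key=lambda d: d.modified, reverse=True):
        if any(similarity(doc, k) >= SIMILARITY_THRESHOLD for k in kept):
            archived.append(doc)
        else:
            kept.append(doc)
    return kept, archived
```

Comparing every document against every kept document is fine for a single project site, but it would not scale to a large repository; a real implementation would need a faster way to find candidate pairs and a similarity measure tuned to the document types involved.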

