Manual versus automatic content classification

In January 2016 a Dutch TV program reported that 5,600 medical records of Dutch patients had been prepared for digitization (stripped of staples and paper clips) by Belgian prisoners. According to the party concerned, the prisoners had signed a confidentiality agreement, but having this group prepare and scan the papers still raised some questions.

For organizations that digitize and archive documents and files, the scanning street is just the beginning of a complex process. Manual preparation at the start of the scanning street is still commonplace, and the rest of the process leaves room for improvement as well. After the first step (the actual scanning of the documents), classification and metadata are added to the documents, often manually.

Inconsistent classification
Whether it is done at the end of the scanning street or as part of content enrichment, such as a migration of file shares to a new ECM environment (e.g. SharePoint), manual document classification is time-consuming and inconsistent. Although librarians and information specialists are highly skilled, it is difficult for a team to classify content consistently and unambiguously, even when following a standard template. Give the same set of documents to different specialists and they will classify them differently.
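
One way to put a number on that discrepancy is an agreement score such as Cohen's kappa, which compares how often two people assign the same label with how often they would agree by chance. The sketch below is only an illustration: the document labels and the two specialists are made up, and it is not a description of how any particular product measures consistency.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and equally long")
    n = len(labels_a)
    # Share of documents on which both specialists chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each specialist's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both specialists used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels given by two specialists to the same six documents.
specialist_a = ["invoice", "contract", "invoice", "letter", "contract", "invoice"]
specialist_b = ["invoice", "contract", "letter", "letter", "invoice", "invoice"]

kappa = cohen_kappa(specialist_a, specialist_b)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means the two specialists always agree, and a value around 0 means they agree no more often than chance would predict.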

Nowadays content classification is not always a task for specialists; it is often performed by people within the organization at the moment they add a document. Often they don’t see the importance of proper classification, and because they have had no training they assign metadata inconsistently, so the quality of the classification suffers.

This is of course not a new problem, but it is becoming more acute. First, the volume of documents and information is increasing significantly. Second, the risks and costs are growing when your document management turns out not to comply with laws and regulations.

Automated classification
A proven method to classify documents is to use intelligent tooling. Multiple vendors offer solutions for automatic classification. Most of these solutions, however, classify based on the form or layout of documents and the keywords found in them. Xillio's approach goes a step further and assigns labels based on the grammar, spelling, choice of words and repetition used in the document.
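
To make the keyword-based baseline concrete, here is a minimal sketch of such a classifier. It is not Xillio's method, which the text describes as going beyond keywords; the labels and keyword lists are hypothetical and would be replaced by an organization's own scheme.

```python
import re
from collections import Counter

# Hypothetical labels and keyword lists, for illustration only.
KEYWORDS = {
    "invoice": {"invoice", "amount", "payment", "vat"},
    "contract": {"agreement", "party", "clause", "signature"},
    "medical": {"patient", "diagnosis", "treatment"},
}


def classify(text, keywords=KEYWORDS):
    """Return the label whose keywords occur most often in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        label: sum(tokens[word] for word in words)
        for label, words in keywords.items()
    }
    best = max(scores, key=scores.get)
    # Without any keyword hit there is nothing to base a label on.
    return best if scores[best] > 0 else "unclassified"


label = classify("Payment of the invoice amount is due within 30 days.")
print(label)
```

A classifier like this is consistent, since the same text always gets the same label, but it only sees the words on its lists. That limitation is the reason to look at further signals such as grammar, word choice and repetition.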

In addition, Xillio's solution works on any set of documents from any given content system. Mostly these are network drives or file shares, but they can also be ECM, DM and DAM systems, custom-built legacy systems, or systems that are only partly in use.

And the winner is…
Manual classification is subjective and therefore inconsistent, but it is not necessarily worse or less accurate than automatic classification. Even so, there are now better ways to classify your documents than doing it by hand: solutions that offer the same or even higher accuracy and a huge improvement in production speed.

------------------------------------------------------------------------

Analyse your content

Want to know which content in your file share or network drive has correct metadata and which lacks it? Xillio performs extensive analysis of network drives.

Content Analysis: here is what you can learn

Furthermore, we recently did a project to automatically add metadata to OpenText Content Server content. Read this case study.

Example of Automated classification and metadata enrichment

 
