Recognize duplicate folder structures with content analysis

by Evan Goris, on Mar 11, 2021 1:39:55 PM

Every time I do a file share analysis for a customer, I am surprised by the large number of duplicate files on their file shares.

A simple comparison of the content of the different files always produces an impressive list of files that appear more than once; we frequently find that 30% to 50% of all files are duplicates. Unfortunately, such a comparison is not very informative, because it doesn’t reveal the reason behind the duplication. Sometimes duplicates are intentional, but sometimes they are not. In the latter case, cleaning up on a file-by-file basis can produce confusing results and becomes a very lengthy manual process.
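A content comparison like this is commonly done by hashing each file's bytes and grouping files whose hashes match. The sketch below is a minimal illustration, assuming the share is mounted as a local path and using SHA-256 from Python's standard library; the function names are hypothetical and this is not Xillio's implementation. It also computes the share of files whose content appears more than once, which is the kind of percentage mentioned above.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Group every file under `root` by content hash.

    Returns (duplicates, total): `duplicates` maps a digest to the list of
    paths sharing it (only digests seen more than once), and `total` is the
    number of files scanned.
    """
    by_digest = defaultdict(list)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
            total += 1
    duplicates = {d: paths for d, paths in by_digest.items() if len(paths) > 1}
    return duplicates, total


def duplicate_share(duplicates, total):
    """Fraction of scanned files whose content appears more than once."""
    if total == 0:
        return 0.0
    return sum(len(paths) for paths in duplicates.values()) / total
```

On a real share, a common optimization is to group files by size first and only hash files whose size occurs more than once. Note that the output is exactly the flat list the article describes: it says which files are identical, but not why.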

Identify duplicate content

Entire folders copied
Fortunately, in many cases there is more structure to the duplication. Not only individual documents, but entire folders including all underlying files are often copied to a different place on the network. This can be detected by comparing files based on content, on file properties (such as name and owner), or on a combination of these and other attributes. The resulting grouping is then inherited by the parent directories, whether or not the characteristics of those parent directories themselves are taken into account. Regardless of the exact parameters, we see that a large part of those 30% to 50% duplicates, sometimes even close to 80% of them, is caused by duplicate folder structures.
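One way to picture this "inheritance" is a bottom-up, Merkle-style folder fingerprint: each folder's fingerprint is a hash over its children, so two folders get the same fingerprint exactly when their whole subtrees match. The sketch below is a minimal illustration under stated assumptions: files are compared by content hash, file and subfolder names are the only extra property (via a hypothetical `include_names` switch), and owner or other attributes are left out. All function names are hypothetical; this is not Xillio's tool.

```python
import hashlib
import os
from collections import defaultdict


def _content_hash(path):
    """SHA-256 of a file's bytes, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def folder_fingerprints(root, include_names=True):
    """Compute a bottom-up fingerprint for every folder under `root`.

    Files contribute their content hash and subfolders contribute their own
    fingerprint, so identical subtrees get identical fingerprints. With
    include_names=True the children's names are hashed too. A folder's own
    name is never part of its fingerprint, so a renamed copy still matches.
    """
    root = os.path.normpath(root)
    fp = {}
    # topdown=False visits every subfolder before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        entries = []
        for name in filenames:
            digest = _content_hash(os.path.join(dirpath, name))
            entries.append(("f", name if include_names else "", digest))
        for name in dirnames:
            sub = os.path.join(dirpath, name)
            if sub in fp:  # symlinked folders are not walked; skip them
                entries.append(("d", name if include_names else "", fp[sub]))
        # Sorting removes dependence on listing order; repr() of the
        # sorted tuples gives an unambiguous byte encoding to hash.
        fp[dirpath] = hashlib.sha256(repr(sorted(entries)).encode()).hexdigest()
    return fp


def duplicate_folders(root, include_names=True):
    """Return groups of non-empty folders with identical subtrees.

    A group is skipped when every member's parent is itself a duplicate,
    because a copied parent folder already explains it.
    """
    fp = folder_fingerprints(root, include_names)
    empty = hashlib.sha256(repr([]).encode()).hexdigest()
    groups = defaultdict(list)
    for path, digest in fp.items():
        if digest != empty:
            groups[digest].append(path)
    dups = [sorted(paths) for paths in groups.values() if len(paths) > 1]
    dup_paths = {p for paths in dups for p in paths}
    return sorted(
        paths for paths in dups
        if not all(os.path.dirname(p) in dup_paths for p in paths)
    )
```

Because fingerprints propagate upward, a folder that was copied as a whole is reported as a single group of folders rather than as many separate file-level duplicates, which gives a cleanup something concrete to act on.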

The explanation
There is a logical explanation for the existence of duplicate folder structures. A copy might be created automatically, as with a backup or a restore point. In most cases, however, the cause is a shortcoming of file shares: they make it hard to share files and folders between users. When users do not have access to a particular folder, they save a copy of the folder they need somewhere they do have access. The result: duplicate folders and duplicate files.

Storage is not the only reason duplicate folder structures are undesirable. When migrating from file shares to SharePoint, for example, you will probably not want to bring duplicate files and folders into the new collaboration system. A cleanup operation that removes duplicates is therefore essential.


Analyse duplicate content on network drives?

Want to know which folders on your network are duplicates? Xillio can perform an extensive analysis of all content in your repository or on your network drives. Download the complete overview of the insights our file share analysis gives you.




