Why assessing the access date adds value

by Werner Taube, on Jul 1, 2020 3:00:00 PM

Although you might think you are familiar with the access date on your data, a lot of people mistake it for the modified date. The access and modified date and time are closely related, but they are not the same. By default, most file explorers don’t show the access date and time, or the feature is not enabled. Some file systems don’t support the access date at all.

So, what is the access date then? After doing my research on various file systems, I can state with confidence what its intention is. It is supposed to display the date and time when the contents of a file or container were last accessed (a container is a folder in most cases; document management systems have different terminology).
Think briefly about what that means: when you open a file with a program, it is “accessed”. When you look at information about the file, such as its filename or size, the contents are not looked at and therefore not accessed. The confusion with the modified date is understandable: a file is often opened to view, adjust and save it.
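
To make the distinction concrete, here is a minimal Python sketch that reads both dates from the file's metadata only, without opening its contents. The function name file_times is a hypothetical helper chosen for illustration, and whether the reported access date is meaningful still depends on the file system and its settings.

```python
import os
from datetime import datetime, timezone

def file_times(path):
    """Return the access and modified times of a file (hypothetical helper).

    os.stat reads metadata only; the file's contents are never opened.
    """
    info = os.stat(path)
    return {
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

If the two values differ, the file was at some point read without being saved, or saved without being read again afterwards.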

Use access date for migrating content

But why is it interesting then? On your home system, it probably isn’t. On shared volumes where you collaborate, it is. It can tell you whether the location or the document is still being visited and thus relevant.
Two examples:

  • a folder created 10 years ago, modified 8 years ago and last accessed 8 years ago is unlikely to be visited any time soon and should be archived.
  • a document on guidelines created 10 years ago, modified 5 years later and accessed last week is evidently still being looked at and should be considered active.
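
The two examples above can be sketched as a simple rule in Python. The five-year threshold and the names ARCHIVE_AFTER and classify are assumptions made for illustration; a real threshold depends on your own retention policy.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: content not accessed for 5 years is an archive candidate.
ARCHIVE_AFTER = timedelta(days=5 * 365)

def classify(last_accessed, now, archive_after=ARCHIVE_AFTER):
    """Label content by the time elapsed since its last access."""
    if now - last_accessed > archive_after:
        return "archive candidate"
    return "active"

now = datetime(2020, 7, 1)
folder_status = classify(datetime(2012, 7, 1), now)     # accessed 8 years ago
document_status = classify(datetime(2020, 6, 24), now)  # accessed last week
```

A rule like this is only as good as the access dates it is fed, which is why the rest of this post looks at when those dates can be trusted.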

Looking at one location or one document might not be so interesting, but the collective information of your overall file system can tell you a lot. That is one of the things that we do at Xillio.

I wrote “…its intention is…” deliberately, because when the access date is updated, or whether it is applied at all, depends on several variables. Document management systems have their own implementation of the access date within their system. The underlying create/modified/access date and time on the actual file storage are not visible to the user and can be ignored.

However, honestly: how many of you still have a shared drive with your colleagues? A department or ‘public’ drive to share and collaborate easily? On a share, the access date is implemented differently, if at all.

Add to this the following variables that determine when the access date is applied:

  • the operating system that serves the file share
  • the file system format
  • the operating system the file system is connected to
  • the program or method you use to access the system
  • in some cases, a flag in the registry
  • the time an update spends in the queue

The last one might surprise you. It seems that the “access date update” is queued and has a lower priority than some of the other tasks that the operating system executes. This ‘delay’ varies greatly. During my testing it sometimes took almost 8 hours before the access date was updated, which made my initial test observations invalid. The update can even be triggered manually.

You may wonder what the added value is. Analysing content is complex: when you provide information to establish a threshold, the expectation is that it is exact and calculated. The value lies in knowing precisely how the access date behaves and what exactly it means in your situation whenever you base a decision on it. You now know that there is more to it.

Accidentally influencing the access date unfortunately happens with a lot of tools. Think about search spiders, but accidental access also happens when running statistics on your documents. I’ve investigated when the access date changes on various file system formats. The most common scenario was an NTFS-formatted file system accessed via a NetBIOS drive-letter mapping to a Windows client, but I also tested ext4 and HFS/APFS.

One finding was a dependency on the program being used. Viewing a file in Notepad, for example, did not affect the access date. Opening the document in MS Office did.

I’ve run several scripts in Perl and Python. Both ‘touched’ the file, making the access date unreliable.
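
One possible mitigation, shown here as a hedged Python sketch rather than as the method used in my tests, is to record the timestamps before reading a file and put them back afterwards. The function name read_preserving_times is hypothetical. Note the limits: on POSIX systems the restore itself updates the inode change time, and an access by someone else between the read and the restore would be overwritten.

```python
import os

def read_preserving_times(path):
    """Read a file's contents, then restore its access and modified times."""
    before = os.stat(path)            # metadata only; contents not read yet
    with open(path, "rb") as handle:  # this read may update the access date
        data = handle.read()
    # Put back the timestamps recorded before the read (nanosecond precision).
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```

When only metadata such as size or dates is needed, calling os.stat alone avoids reading the contents in the first place.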

Discovering all the metadata of a file without touching it takes not only professional analysis and migration tools, but also people who understand how it works. It needs thorough investigation, and I can imagine that you prefer not to deal with this tedious work. In that case, call us at Xillio. We’re the migration experts delivering insights into your content.


Topics: Enterprise Content Management, Content Analysis