Basement Needs Cleaning; Hire a Maid (Using Content Analytics)

By James Watson posted 04-29-2010 13:17

There has been quite a marketing buzz from the supplier community on the promise of content analytics. For organizations that keep everything forever (effectively filling their “basements” with stuff), these new “electronic maids” come equipped with brooms and vacuums that let organizations at least begin to address the disposition of electronic content.

“Content analytics” is the term used to refer to a suite of technical capabilities designed to automatically crawl file systems (network drives, SharePoint, email systems) and interrogate various attributes such as date authored, file type, and even keywords within the content (such as “confidential,” for example). The output of the analytical assessment is a set of reports and dashboards an organization can use for a number of purposes: developing a classification approach, planning for content migration, early case assessment, and so on.
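To make the crawl-and-interrogate idea concrete, here is a minimal sketch using only the Python standard library. The function names (`crawl`, `summarize`) are my own illustration, not any vendor’s API; real products add connectors for SharePoint and email stores on top of this basic pattern.

```python
import os
from collections import Counter

def crawl(root):
    """Walk a directory tree and collect basic file attributes."""
    inventory = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable files
            inventory.append({
                "path": path,
                "ext": os.path.splitext(name)[1].lower(),
                "size": st.st_size,
                "modified": st.st_mtime,
            })
    return inventory

def summarize(inventory):
    """Roll the inventory up into a simple dashboard-style report."""
    by_ext = Counter(item["ext"] for item in inventory)
    total = sum(item["size"] for item in inventory)
    return {"files": len(inventory),
            "total_bytes": total,
            "by_extension": dict(by_ext)}
```

Even this toy version produces the raw material for the “what’s out there” reports described above; the commercial tools differ mainly in scale, connectors, and the polish of the dashboards.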

But does it work? The answer is yes, but keep your expectations in check. Sure, the tools can provide a list of all the files that are older than 5 years and haven’t been accessed for the last three. You can also sweep through a file server to determine whether any documents contain the word “confidential” that perhaps should not be sitting on a system accessible by all employees, but rather within a managed repository like IBM’s P8 or EMC’s Documentum. But the clients I’ve worked with, many of whom have tried various suppliers’ tools, have very real concerns about performance, which degrades with the volume of content and the complexity of the analysis. More important, do performance testing against a specific set of requirements (know exactly what you are trying to accomplish, rather than just “fishing”).
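Those two example queries are easy to sketch yourself. Below is a rough stand-in (not any particular supplier’s tool) that flags files authored more than five years ago and not accessed for three, plus a crude keyword probe; note that access times (`st_atime`) can be unreliable on volumes mounted with options like `noatime`:

```python
import os
import time

YEAR = 365 * 24 * 3600  # seconds, ignoring leap years for simplicity

def stale_files(root, authored_years=5, accessed_years=3, now=None):
    """Files modified more than `authored_years` ago AND not accessed
    within the last `accessed_years`."""
    now = now or time.time()
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            if (now - st.st_mtime > authored_years * YEAR
                    and now - st.st_atime > accessed_years * YEAR):
                hits.append(path)
    return hits

def contains_keyword(path, keyword="confidential"):
    """Crude probe: scan a file as text for a flag word."""
    try:
        with open(path, errors="ignore") as f:
            return keyword.lower() in f.read().lower()
    except OSError:
        return False
```

A real product would parse Office and PDF formats rather than reading raw bytes as text, which is one reason to test performance on your own content mix.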

What productivity improvements can you expect? The exact figure is hard to determine, because few organizations have ever invested the time to conduct credible benchmarks. The best indicators come from the e-discovery service providers, which have been using various capabilities for several years to cull through content placed on legal hold before conducting review. In these circumstances, the automated analytics tools provide improvements that range from 100 to 500 times better than manual efforts. That’s right – 100 to 500x, particularly for simple tasks such as determining author or date (basic interrogation of metadata).

Currently, usage of these tools has been limited predominantly to e-discovery – reactive exercises in response to a discovery request. Thus, the opportunity today is to begin leveraging these capabilities proactively and on a regular basis. Imagine, one by one, working through an organization’s network drives or SharePoint sites, and determining “what’s out there.” At a minimum, a sampling of 10 to 15 network drives might provide just enough insight to issue a wake-up call for senior management: “Let’s start cleaning this stuff up now, because it’s growing by 35% a year”; “Did you know that 18% of the files on our Z drive are exact duplicates?”; “Fully 67% of the files on server RSO227 haven’t been touched for 3 years or longer.”
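The “exact duplicates” statistic is one you can approximate without any product at all: hash every file’s contents and group files by digest. A minimal sketch (the helper name is mine, not a product feature):

```python
import hashlib
import os
from collections import defaultdict

def find_exact_duplicates(root):
    """Group files by SHA-256 content hash; any group with more than
    one member is a set of byte-for-byte duplicates."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            except OSError:
                continue
            by_hash[h.hexdigest()].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Commercial tools typically go further with near-duplicate detection (similar but not identical documents), which is much harder than the exact-match case shown here.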

Make no mistake – human intervention is still required for decision-making, and that takes time, and more than likely a re-evaluation of policies and procedures. What do we do with the duplicate material: keep only the most recent? What about those files that are 3 years old: can we move them to tape for another year and then delete them?

Net, net, I’m bullish. In fact, I’m so excited about this segment of our industry that many of our recommendations to clients include specifically designed plans directing them to begin using these tools. As expected, few organizations are systematic about cleaning up their “basements” (the same is true for many households, thus the need for maid services), and content analytics is clearly a capability worth investigating.



#ElectronicRecordsManagement #ContentAnalytics