Auto-Classification for RM – beginning to see it as possible

By Ronald Layel posted 02-16-2012 15:07


With the usual disclaimer that views expressed here are my own, and not those of my employer or of the government agency with which we are engaged, I want to share my current thoughts on this.  As a skeptic, who for some time has been saying “I’ll see when I believe it”, I’m now ready to say that I’m beginning to see that it may be possible.

Yes, I meant to say I’ll see it when I believe it (not believe it when I see it), because it’s a case where I have had difficulty seeing the possibility and potential viability of an automated process that could effectively distinguish records from non-records and then classify records according to the rules (retention schedules) that govern when disposition can and must take place.  From the perspective of a Records Manager working down in the weeds to actually do RM in a federal government environment (as opposed to those who make RM policies/procedures and invent ERM technology “solutions”), it was simply not believable that software algorithms and/or “predictive coding” technology could be relied on to accomplish this complex task with any acceptable degree of accuracy.

However, my “ah-hah” light bulb moment came this week when I listened in on the recent AIIM webinar, “Take the Guesswork Out of Information Governance with Auto-Classification” and heard presenter Mark Diamond of Contoural, Inc. say, “don’t let Perfect be the enemy of Good”.  Later in the presentation he showed some statistics on classification accuracy, indicating what they believe is achievable with manual classification done by subject matter experts and average employees versus auto-classification and predictive coding.  This data shows the expected accuracy of the two automated methods to be in the range of 80% – 90%.  Whether that can be substantiated or not, the more significant point for me was that this brought into focus the comparison with accuracy levels obtainable by our current manual methods.  In our case, when I ask myself how well we’re now doing on accurate identification of record versus non-record information and on proper classification of Records by applicable Record Series retention schedule groupings, the answer is “not very well”.  Diamond’s stats showed accuracy of manual classification done by average employees ranging from 20% to 80% (and he believes the 80% upper end is overstated).  So, if success in auto-classification is defined as achieving an accuracy level that is better than current practice, there may be light at the end of this tunnel.
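To make that comparison concrete, here is a minimal sketch in Python that turns the accuracy ranges quoted from the webinar into counts of misclassified items. The 10,000-document volume and the function name are my own assumptions for illustration only; they are not figures from the presentation.

```python
# Accuracy ranges (low %, high %) as quoted from the webinar.
MANUAL_AVERAGE_EMPLOYEE_PCT = (20, 80)
AUTOMATED_PCT = (80, 90)


def misclassified_range(total_docs, accuracy_pct_range):
    """Return (best_case_errors, worst_case_errors) for a given volume.

    Hypothetical helper: integer percentages are used so the
    arithmetic stays exact.
    """
    low_pct, high_pct = accuracy_pct_range
    best_case = total_docs * (100 - high_pct) // 100
    worst_case = total_docs * (100 - low_pct) // 100
    return best_case, worst_case


if __name__ == "__main__":
    volume = 10_000  # assumed volume, for illustration only
    for label, pct_range in [
        ("manual (average employees)", MANUAL_AVERAGE_EMPLOYEE_PCT),
        ("automated", AUTOMATED_PCT),
    ]:
        best, worst = misclassified_range(volume, pct_range)
        print(f"{label}: {best} to {worst} misclassified out of {volume}")
```

On these quoted ranges, the weakest automated result only matches the strongest manual result, which is the sense in which "better than current practice" looks like a reachable bar.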

With the above said, I want to point out what was also made clear about the very significant challenges and obstacles that must be overcome before we could arrive at the point where Auto-classification can be implemented and relied on as a viable solution for RM in the federal government.  These are:

  1. “Big bucket” retention schedules are an absolute prerequisite, and all Retention Schedules must be up-to-date.  This has been discussed for years within the government RM community, but so far there has been no real movement in that direction. There is now some hope that this will be done as part of the reform of records management in the federal government, as directed in the Presidential Memo of 11/28/11.  We understand that NARA has established a team to address major overhauls needed in the General Records Schedules (GRS), which includes developing guidance on incorporating GRS authorities into agency big bucket or functional schedules.
  2. Records content must be in an electronic repository for Auto-classification to work.  While this is theoretically possible and maybe even practical in smaller organizations, it is problematic and potentially very costly for large government enterprises that have Official Federal Records being created daily and residing for their entire life-cycles in a plethora of transactional business and mission supporting IT systems.  Will the Auto-classification technologies be offered and priced to be affordable for implementation across hundreds of disparate repositories? If not, will this require attempts to build gigantic centralized repositories that collect and/or replicate from all systems that generate or receive structured and unstructured data that may potentially be classified as Official Record information?
  3. Key organizational stakeholders including RM, Legal, IT and Business Units must engage in a coordinated process to establish realistic expectations and to design the Auto-classification process (this is NOT an out-of-the-box “Easy Button” solution).
  4. Costs to implement are significant and must be budgeted:
     a. for the technology (the Auto-classification software application and possibly also the content management repositories as mentioned in item 2 above);
     b. for major staff level of effort by the “Core Team” (RM, IT and line of business SMEs) to do set up, create exemplar documents, train, test and refine the RM classification recognition models; then to perform classification on legacy documents; and finally to carry out on-going maintenance and updating of the model; and
     c. for a Change Management Process that will move the organization from our current state of manual, labor-intensive RM practices to the desired state where RM classification can “run in the background” and be relied on to satisfy the organization’s needs for readily available information and to ensure compliance with RM regulations and agency policies.

So, all things considered, there is potential (and hope); but that glimmer seems to be pretty far down the tunnel for those of us in the federal government sector.

#autoclassification #ERM #ECM #RM #ElectronicRecordsManagement #government