Can We Automate Retention?

By Susan Cisco posted 02-03-2011 11:00


In her recent post, Julie Cogan noted that some innovation in developing and implementing retention schedules is happening in the content analytics space. Using modeling tools to extract vocabularies for taxonomies for Enterprise Content and Records Management (ECRM) systems is an attractive way to speed up the otherwise manual process of developing taxonomies. My question is: how do you map record retention periods to a modeled taxonomy? I asked three subject matter experts and decided to include their full responses because they're thoughtful and informative:

David Sanchez, VP – U.S. Federal Public Sector, Education, & Healthcare, Concept Searching

With Concept Classifier for SharePoint 2010, records professionals can align metadata to records retention codes and instantly synchronize these “records declaration and tagging” business rules to the SharePoint 2010 Term Store. As documents are ingested into SharePoint, an event handler is triggered; Concept Classifier for SharePoint then interrogates each document for any terms, multi-word fragments, and concepts related to records retention codes and automatically applies the appropriate records retention code as a metadata tag stored in the document's SharePoint properties.

In terms of automated retention, what is applied automatically is the retention code, based on rule sets in which records managers align controlled vocabulary to retention codes. The records management system then uses the retention code as a trigger for disposition.
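To make the rule-set idea concrete, here is a minimal Python sketch of matching controlled vocabulary to retention codes. It is not Concept Classifier's API or algorithm, which goes beyond literal phrase matching; it simply counts occurrences of hypothetical vocabulary phrases per retention code and tags the document with the best-scoring code. The codes `RC-NUC` and `RC-HR`, the term lists, the function names, and the `RetentionCode` property name are all invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical rule set maintained by records managers: each retention code is
# aligned with controlled-vocabulary terms and multi-word fragments.
RULES: Dict[str, List[str]] = {
    "RC-NUC": ["nuclear medicine", "radiation exposure", "dosimetry"],
    "RC-HR": ["personnel file", "performance review"],
}


def assign_retention_code(text: str, rules: Dict[str, List[str]] = RULES) -> Optional[str]:
    """Return the retention code whose terms occur most often in text, or None.

    Matching is plain case-insensitive substring counting; ties go to the code
    listed first in the rule set.
    """
    lowered = text.lower()
    best_code: Optional[str] = None
    best_hits = 0
    for code, terms in rules.items():
        hits = sum(lowered.count(term) for term in terms)
        if hits > best_hits:
            best_code, best_hits = code, hits
    return best_code


def on_document_ingested(doc: dict) -> dict:
    """Hypothetical stand-in for an ingestion event handler (not SharePoint's API).

    Writes the assigned code into the document's properties so that a records
    management system could later use it as a disposition trigger.
    """
    code = assign_retention_code(doc.get("body", ""))
    if code is not None:
        doc.setdefault("properties", {})["RetentionCode"] = code
    return doc
```

Substring counting also matches inside longer words and ignores context, so a real classifier would tokenize text and weigh concepts rather than raw phrase counts; the sketch only shows where the records manager's vocabulary-to-code rules plug in.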

For instance, any document with a retention code of "x" would have a predetermined storage location and destruction date. One example is the Department of Defense, where any document dealing with nuclear medicine or radiation has a 75-year retention period, and all of those records must be shipped in original form to the USAF School of Aerospace Medicine, where they are stored in a climate-controlled warehouse.
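The lookup in this example can be sketched as a small disposition table keyed by retention code. The 75-year period and the USAF School of Aerospace Medicine destination come from the example above; the code name `RC-NUC`, the helper functions, and the assumption that the retention clock starts at a single trigger date (for instance, when the record is declared or closed) are illustrative only.

```python
from datetime import date
from typing import Dict, Tuple

# Hypothetical disposition table: retention code -> (retention years, storage location).
# Values are taken from the Department of Defense example; the code name is invented.
DISPOSITION: Dict[str, Tuple[int, str]] = {
    "RC-NUC": (75, "USAF School of Aerospace Medicine (original form, climate-controlled warehouse)"),
}


def add_years(start: date, years: int) -> date:
    """Add whole years, moving Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def disposition_for(code: str, trigger: date) -> Tuple[date, str]:
    """Return (destruction date, storage location) for a retention code.

    Assumes the retention period starts on the given trigger date.
    """
    years, location = DISPOSITION[code]
    return add_years(trigger, years), location
```

For a record triggered on 3 February 2011, `disposition_for("RC-NUC", date(2011, 2, 3))` yields a destruction date of 3 February 2086 along with the required storage location.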

Rich Hale, Chief Technology Officer, Active Navigation

The mapping of retention or disposition schedules to any taxonomy is the job of information management professionals. Whilst software tools and formal (or less formal) methods are key to making that job productive and scalable, I see no substitute for that sort of people-focused, centrally managed approach. As with everything in life, the key is balance: in this case, simplicity vs. complexity, accuracy/completeness vs. good enough, and fixed vs. flexible. Another key issue is that the taxonomies in use and their supporting rules must reflect what is actually happening on the ground rather than some abstract view of what might be happening. All too often I have seen the very information management professionals I refer to above design intellectually complete models that remain just that, models, rather than becoming practical tools. So, from my perspective, the process starts with giving those professionals an effective means to understand and represent the information and concepts that actually exist in the corpus they are working with. That's why my focus is on providing tools that help the information management professional understand the information he or she is working with and, importantly, involve business subject matter experts (SMEs) in that process.

It's amazing how often disengaged SMEs who are asked to help model information structures become far more animated and enthusiastic when presented with a consumable insight into the corpus in question. Once this happens, taxonomy modelling can become an exchange or conversation with the business, and that always produces better results than imposed or abstract solutions. Ideally, those solutions should help keep the conversation alive so that taxonomies and their supporting rules evolve as the information and the business change.
Supporting that process needs to be part of any software that provides a continued view of the value and health of corporate information and I expect to see that as an important feature of all enterprise governance programs in the future.

Seth Earley, CEO, Earley & Associates, Inc.

Taxonomy development involves analysis of the content itself as well as of user tasks and processes. In this analysis, one can examine the full body of content from the perspective of search results. Search logs are another source of candidate terms: they tell us the language people use to execute their queries, and that language can be linked to preferred terms through synonym relationships.
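As a rough illustration of linking search-log language to preferred terms, the following Python sketch tallies logged queries against a synonym ring and keeps unmatched queries as candidate terms for review. The synonym entries and the function name are hypothetical, and real search logs would need normalization beyond lower-casing and trimming whitespace.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical synonym ring: variant query strings -> preferred taxonomy term.
SYNONYMS: Dict[str, str] = {
    "minutes": "Meeting Minutes",
    "mtg notes": "Meeting Minutes",
    "board notes": "Meeting Minutes",
    "regs": "Regulatory Activities",
}


def preferred_term_counts(queries: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count logged queries by preferred term; collect unmatched queries separately.

    Unmatched queries are candidate terms a taxonomist might add as new
    preferred terms or as synonyms of existing ones.
    """
    matched: Counter = Counter()
    unmatched: Counter = Counter()
    for query in queries:
        key = query.strip().lower()
        if key in SYNONYMS:
            matched[SYNONYMS[key]] += 1
        else:
            unmatched[key] += 1
    return matched, unmatched
```

Calling `unmatched.most_common()` on the second result ranks the candidate terms by how often users actually typed them, which is one simple way to prioritize review.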

Taxonomies are frequently used as access structures (though they do not have to be the same as navigation). The challenge is that records management retention schedules do not necessarily map to something usable by an end user. A retention schedule is based on document types that have legal significance from a retention perspective, whereas document management systems attempt to organize information according to what is significant to the user or the business. The two may or may not be aligned. In the lingo of records management, a navigational access structure is called a file plan. A file plan might have the following categories:

Legal Activities

Regulatory Activities

Regulatory Board Activities

Content that falls into Regulatory Board Activities will be retained for a certain period of time based on the industry and its statutory requirements. Would users access this structure on a day-to-day basis? If not, we would need to map this file plan to categories that users find more intuitive and expose those to the user. Alternatively, the mapping can start from the more intuitive user categories and determine which content within them constitutes Regulatory Board Activities for which a retention schedule exists. A document might be Meeting Minutes. Meeting Minutes would require additional metadata to facilitate accessibility, such as Date, Topic, Regulatory Area, Related Legislation, Impact, etc.

So there is no simple one-to-one mapping, but instead an understanding of user access scenarios and of the metadata that is mapped to retention periods. The core principle is the document type, the “is-ness”, which corresponds to the file plan with its desired retention periods. Additional controlled vocabularies are developed to make the content more usable and accessible. These vocabularies are derived using the approaches outlined above.
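A minimal Python sketch of this separation: the document type alone selects the file plan category and its retention period, while user-facing metadata travels with the document for access and does not affect retention. The three file plan categories come from the list above; the extra document types (`Consent Decree`, `Compliance Filing`), the function name, and every retention period are hypothetical placeholders, since real periods depend on the industry and its statutory requirements.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical mapping from document type (the "is-ness") to a file plan category.
DOC_TYPE_TO_FILE_PLAN: Dict[str, str] = {
    "Meeting Minutes": "Regulatory Board Activities",
    "Consent Decree": "Legal Activities",
    "Compliance Filing": "Regulatory Activities",
}

# Placeholder retention periods in years, not real statutory values.
FILE_PLAN_RETENTION_YEARS: Dict[str, int] = {
    "Legal Activities": 10,
    "Regulatory Activities": 7,
    "Regulatory Board Activities": 6,
}


@dataclass
class Document:
    title: str
    doc_type: str
    # Access-oriented metadata (Date, Topic, Regulatory Area, Related Legislation,
    # Impact, ...); used for findability, ignored when computing retention.
    metadata: Dict[str, str] = field(default_factory=dict)


def retention_years(doc: Document) -> Optional[int]:
    """Resolve retention via document type -> file plan category -> period.

    Returns None when the document type has no file plan mapping yet.
    """
    category = DOC_TYPE_TO_FILE_PLAN.get(doc.doc_type)
    if category is None:
        return None
    return FILE_PLAN_RETENTION_YEARS[category]
```

Keeping retention tied to the document type means users can browse by Topic or Regulatory Area without ever seeing the file plan, while records managers change a retention period in one table.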

My Conclusions

  1. Organizations need to apply retention periods to records and other information across the enterprise because the alternative is to keep everything indefinitely.
  2. With automatic assignment of retention to electronic records, initially someone has to map retention codes to “controlled vocabulary” in modeled enterprise taxonomies.
  3. Record retention schedules are usually developed in a vacuum. Could we use modeled enterprise taxonomies to inform the process so taxonomies and retention schedules are aligned? As taxonomies change over time, retention schedules could evolve accordingly and vice versa.

 What do you think?

