Metadata...Now and Zen

By Monica Crocker posted 04-15-2013 17:17


I have spent thousands of hours with users  figuring out what metadata should be put on documents.  In my experience, once they learn that they can “tag” unstructured content with searchable, filterable values, they go a little crazy.  They want every possible value they might ever need to be able to find, or report on, their documents.  It can quickly spiral out of control.  As my S.O. often reminds me, “Gear down, Turbo.”

On the other hand, I need to require certain metadata that may not initially be perceived to have value to the business.  You know, so that the documents can actually be managed (and, especially, disposed of).  As a result, I find myself in the awkward position of trying to convince a business area that they don’t need 15 of the 20 fields they think they need, but they do need two they don’t have.  In general, I find there is often a perception that records management compliance dictates a “bunch” of metadata on every item.  Not true, my friends.  I want very little metadata for my purposes.

After years of fruitlessly pursuing perfection, my bare bones metadata requirements boil down to this: I need to know what it is, how old it is and who it belongs to.  That’s it.  Now, from a business perspective, don’t you think those fields would be helpful to users, too?  That’s my goal….give me the information I need, but let’s work together to do it in a way that makes sense within the context of the business process.  The result should be that it also provides information the business unit needs.

I translate these requirements into a maximum of three metadata fields:  Document Date, Document Type and Document Owner.
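For the more code-minded readers, here is a minimal sketch of what that three-field record might look like.  The class name, field names and sample values are all made up for illustration; your system will have its own names for these.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CoreMetadata:
    """Hypothetical bare-bones record: the three fields and nothing else."""
    document_date: date    # how old it is
    document_type: str     # what it is
    document_owner: str    # who it belongs to


# Illustrative values only.
example = CoreMetadata(
    document_date=date(2013, 4, 15),
    document_type="Proposal",
    document_owner="Sales",
)
```

Anything else a business area wants to add sits on top of this, not in place of it.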

Let’s talk about Document Date (or Content Date or whatever you want to call it)

The purpose of the Document Date field is to provide a single field that can be referenced across a given system (or later, a repository of data from multiple systems) to determine the age of a piece of content.

Indication of age is important to both users and those governing the information contained within enterprise systems.  In general terms, Document Date is usually the “date created” for an internally generated document and the “date received” for an externally generated document.   Document Date is usually a custom field that needs to be added to overcome the limitations inherent in most system generated date fields.  The system field that indicates when the document was added might not be a good indicator of the age of the document if it was actually created previously and just recently uploaded to the repository (such as during a backfile conversion effort).   Modified date may not be a good indicator of age for the same reason, plus, there are occasions when a unit may make a minor correction to a document, but not want the Document Date to change, even though the modified date changed.  I have also had the rare use case where users need to make a document (such as a proposal) available to be worked on in a system TODAY, but they want the document date to reflect a future date (the day the proposal is due).   

The key advantage of the Document Date field is that it allows the user to provide a date that is meaningful within the context of the business application.  For some business applications, this date will not change over the life of the document.  For other business applications, the Document Date may change when a new version of the document is created after the document is checked out and updated.

So, whenever I’m working on the information architecture for a content management system, the first thing I do is check to see if there is a user-populated date field.  If not, the first step is to add one so that every account that uses that system can leverage it to date their documents.  I can’t tell you how many system migrations I’ve worked on where I ask “how can I determine which documents in the system don’t need to be migrated because they’ve passed their retention period” and the answer is “we don’t know.”  Can I use the Index Date?  Well, no, because some of our documents were added as part of a backfile conversion.  Can I use Create Date?  No, for the same reason.  Plus, initially we added documents at the end of processing and then we switched to adding them before they were processed, so the actual document might be months older than the create date indicates, but only for some of the documents….anybody remember when that change happened?  Can I use Modified Date?  No, because the system updated the modified date whenever we had to make a mass update to customer ID numbers when two of our accounts merged, which we have had to do at least twice since the system went into place.  ARGH!!!!
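To show why a trustworthy Document Date makes that migration question easy, here is a rough sketch of the check I wish I could run.  Everything in it is an assumption for illustration: the function name, the sample documents, and the idea that retention is a simple number of years counted from the Document Date.  Real retention schedules are often triggered by events rather than by a date alone.

```python
from datetime import date


def past_retention(document_date: date, retention_years: int, as_of: date) -> bool:
    """Return True if the retention period ended on or before as_of."""
    target_year = document_date.year + retention_years
    try:
        expires = document_date.replace(year=target_year)
    except ValueError:
        # Feb 29 has no match in a non-leap year, so fall back to Feb 28.
        expires = document_date.replace(year=target_year, day=28)
    return expires <= as_of


# Hypothetical documents; only the Document Date and the retention period matter here.
documents = [
    {"id": "A-1", "document_date": date(2004, 3, 1), "retention_years": 7},
    {"id": "A-2", "document_date": date(2011, 6, 30), "retention_years": 7},
]

as_of = date(2013, 4, 15)
to_migrate = [
    d["id"]
    for d in documents
    if not past_retention(d["document_date"], d["retention_years"], as_of)
]
```

With these sample values, only A-2 ends up in to_migrate, because A-1 passed its seven years back in 2011.  Notice that the index date, create date and modified date never enter into it.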

Document Date is the “the system can’t know everything about this document” field. 

If you define Document Date as a required field, you can create ONE additional custom field to hold it that covers every account in your repository and just keep re-using it.  Amazingly, if one business application considers the Proposal Date the document date, and the next business application considers the Invoice Date the document date, I have found most users can understand that “Document Date = Proposal Date” for their documents.  OR some systems are sophisticated enough that you can “mask” the Document Date field with a label specific to each account that makes sense within the business context.  So, in the backend database, the field is Document Date, but for the accounting people, it appears as Invoice Date and for the sales people it appears as Proposal Date.  It’s a miracle!  The great thing about this approach is that you can perform a search, or filter search results, by that same date field across ALL your accounts.  Slick.  Also, I find it perfectly acceptable to auto-populate the Document Date with today’s date as users are adding documents, but make sure the “add a new document” interface gives the document submitter the option of overriding that value for the small percentage of documents that need a different date.
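Here is a small sketch of both ideas, the masked label and the default-with-override, assuming a made-up lookup table of account names and labels.  None of these names come from a real product; a system that supports masking will have its own way of configuring it.

```python
from datetime import date
from typing import Optional

# Hypothetical per-account labels for the one shared Document Date field.
DATE_LABELS = {
    "accounting": "Invoice Date",
    "sales": "Proposal Date",
}


def date_label(account: str) -> str:
    """Label shown to users; the stored field is always Document Date."""
    return DATE_LABELS.get(account, "Document Date")


def resolve_document_date(entered: Optional[date], today: Optional[date] = None) -> date:
    """Use the submitter's date if one was entered, otherwise default to today."""
    if entered is not None:
        return entered
    return today if today is not None else date.today()
```

So date_label("sales") gives the sales people their Proposal Date, an account with no mask just sees Document Date, and a submitter who leaves the date blank gets today's date without having to think about it.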

Wow.  I just used a lot of words to explain that I don’t need a lot of words.  Discussion of Document Owner and Document Type in my next blog. 

#ElectronicRecordsManagement #documentdate #InformationGovernance #metadata #enterprisefiling #keepitsimple