
The Value of Full-Text Search for Records Management

By Mark Mandel posted 07-07-2010 16:29

  

When setting up a full Enterprise Content Management (ECM) system that includes Records Management, a major challenge is defining the metadata to capture for each document or record. There is an inherent tradeoff between capturing too little metadata and capturing too much.

If there is not enough metadata, documents and records will be hard to find. If there is too much, indexing becomes expensive and time-consuming, and its accuracy becomes harder to ensure.

It is often very difficult to anticipate how documents and records will be used in the future, and what information a search will need. In criminal investigations and E-Discovery, for example, the search criteria may have little to do with the document's or record's original content or purpose.

Full-text search tools resolve this dilemma. They let a user search a large collection of documents or records by content as well as by metadata. Email archives already work this way: you can search on the metadata (From, To, Date, Subject) as well as on the content of each message.
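To make the idea concrete, here is a minimal Python sketch of an email archive that supports both kinds of search. It assumes a simple, hypothetical `Email` record holding the four metadata fields above plus a body, and it uses an inverted index (a map from each word to the messages that contain it), the basic structure behind most full-text search tools. A real product would add stemming, ranking, and persistent storage.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:
    msg_id: int
    sender: str   # metadata: From
    to: str       # metadata: To
    sent: date    # metadata: Date
    subject: str  # metadata: Subject
    body: str     # unstructured content


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; real engines also stem and drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


class EmailIndex:
    """Hypothetical archive index: full-text search plus metadata filters."""

    def __init__(self, emails):
        self.emails = {e.msg_id: e for e in emails}
        self.postings = defaultdict(set)  # word -> ids of messages containing it
        for e in emails:
            for word in tokenize(e.subject + " " + e.body):
                self.postings[word].add(e.msg_id)

    def search(self, words, sender=None, since=None):
        # Full-text part: every query word must appear (intersect posting sets).
        ids = set(self.emails)
        for word in tokenize(" ".join(words)):
            ids &= self.postings.get(word, set())
        # Metadata part: filter the remaining candidates on From and Date.
        hits = [self.emails[i] for i in sorted(ids)]
        if sender is not None:
            hits = [e for e in hits if e.sender == sender]
        if since is not None:
            hits = [e for e in hits if e.sent >= since]
        return [e.msg_id for e in hits]


archive = [
    Email(1, "alice@example.com", "bob@example.com", date(2010, 3, 1),
          "Q1 budget", "Please review the attached budget forecast."),
    Email(2, "carol@example.com", "bob@example.com", date(2010, 6, 15),
          "Lunch", "Are we still meeting about the forecast?"),
]
index = EmailIndex(archive)
print(index.search(["forecast"]))                               # [1, 2]
print(index.search(["forecast"], sender="alice@example.com"))   # [1]
```

Note that the second message's subject says nothing about forecasts; only a content search finds it.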

For scanned documents, full-text search requires full-text OCR. The OCR engine converts the bitmapped image to character data and builds a searchable index that maps each word to its coordinates on the image. The best products provide "hit highlighting," so matching terms are highlighted on the displayed image. They also offer advanced search types such as proximity (a word within "n" words of another word), fuzzy (to allow for misspellings or partial words), phrase, wildcard, and concept searches.
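The following Python sketch shows, under stated assumptions, how a coordinate-mapped index can support hit highlighting and several of the advanced search types. It assumes the OCR engine returns each recognized word together with its bounding box on the page image; `Box` and `PageIndex` are hypothetical names for illustration. Storing each word's position lets the index answer proximity and phrase queries, and storing its box lets a viewer draw highlights over the image. Fuzzy matching here uses Python's `difflib` similarity ratio and wildcards use `fnmatch`; commercial engines use their own, usually more sophisticated, methods.

```python
import difflib
import fnmatch
import re
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    # Pixel coordinates of one word on the scanned page image (assumed OCR output).
    left: int
    top: int
    width: int
    height: int


class PageIndex:
    """Positional index for one OCR'd page: word -> list of (word position, box)."""

    def __init__(self, ocr_words):
        self.postings = defaultdict(list)
        pos = 0
        for text, box in ocr_words:
            token = re.sub(r"[^a-z0-9]", "", text.lower())  # strip punctuation
            if token:
                self.postings[token].append((pos, box))
                pos += 1

    def _positions(self, word):
        return [p for p, _ in self.postings.get(word.lower(), [])]

    def highlight_boxes(self, word):
        # Hit highlighting: the rectangles a viewer would draw over the image.
        return [box for _, box in self.postings.get(word.lower(), [])]

    def near(self, a, b, n):
        # Proximity: positions of `a` that fall within n words of some `b`.
        pb = self._positions(b)
        return [x for x in self._positions(a) if any(abs(x - y) <= n for y in pb)]

    def phrase(self, *words):
        # Phrase: start positions where the words appear consecutively.
        starts = set(self._positions(words[0]))
        for offset, w in enumerate(words[1:], start=1):
            starts &= {p - offset for p in self._positions(w)}
        return sorted(starts)

    def fuzzy(self, word, cutoff=0.8):
        # Fuzzy: indexed words similar to `word` (tolerates misspellings, OCR errors).
        return difflib.get_close_matches(word.lower(), list(self.postings), n=5, cutoff=cutoff)

    def wildcard(self, pattern):
        # Wildcard: indexed words matching a shell-style pattern such as "contr*".
        return sorted(fnmatch.filter(list(self.postings), pattern.lower()))


ocr = [("Contract", Box(10, 10, 80, 12)), ("terminated", Box(95, 10, 90, 12)),
       ("by", Box(190, 10, 20, 12)), ("the", Box(215, 10, 30, 12)),
       ("vendor.", Box(250, 10, 60, 12)), ("Contrcat", Box(10, 40, 80, 12))]
page = PageIndex(ocr)
print(page.highlight_boxes("vendor"))      # [Box(left=250, top=10, width=60, height=12)]
print(page.near("contract", "vendor", 4))  # [0]
print(page.phrase("by", "the", "vendor"))  # [2]
print(page.fuzzy("contract"))              # ['contract', 'contrcat']
print(page.wildcard("contr*"))             # ['contract', 'contrcat']
```

The misread "Contrcat" on the second line shows why fuzzy search matters for scanned records: an exact search would miss text that the OCR engine recognized imperfectly.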

Concept search engines take the process a step further, matching concepts rather than exact words or phrases. This gives your search process the power to connect related ideas even when they are expressed in different words, something exact-match search cannot do.
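Concept engines differ widely in how they work, and many derive relationships between terms statistically from the document collection itself. A much simpler way to approximate the idea is query expansion against a concept thesaurus, sketched below in Python. The `CONCEPTS` map is a hypothetical, hand-built example and is not taken from any product.

```python
import re

# Hypothetical concept thesaurus: each concept maps to words that express it.
# A real concept engine would typically derive such relationships from the corpus.
CONCEPTS = {
    "termination": {"terminate", "terminated", "cancel", "cancelled", "rescind"},
    "payment": {"pay", "paid", "invoice", "remit", "remittance"},
}


def expand(query_terms):
    """Add every word belonging to a concept that the query touches."""
    terms = {t.lower() for t in query_terms}
    for concept, related in CONCEPTS.items():
        members = related | {concept}
        if terms & members:
            terms |= members
    return terms


def concept_search(docs, query_terms):
    """Return ids of documents containing any word from the expanded query."""
    wanted = expand(query_terms)
    return [doc_id for doc_id, text in docs.items()
            if wanted & set(re.findall(r"[a-z]+", text.lower()))]


docs = {
    "memo-1": "The board voted to rescind the vendor agreement.",
    "memo-2": "Please remit the balance by Friday.",
    "memo-3": "Lunch is at noon.",
}
print(concept_search(docs, ["cancel"]))   # ['memo-1']
print(concept_search(docs, ["payment"]))  # ['memo-2']
```

A search for "cancel" finds the memo that only says "rescind," which an exact-word search would miss.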

These features matter most for email, collections of Word documents, instant messages, and similar content, especially in research, FOIA, and E-Discovery work.

If your ECM application offers full-text OCR and full-text search as options, consider purchasing them. Many high-end ECM applications do not include them, however (a surprising fact, since many low- to mid-range applications have offered them for decades), so you may have to buy a separate product and integrate it into your enterprise solution.

When implementing this feature alongside Records Management, beware. You need to decide how the full-text index and access controls will be handled when records are locked, placed on litigation hold, redacted, sealed, or expunged. You may get an unpleasant surprise if records that are supposed to be sealed or redacted can still be found by a search.
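The Python sketch below shows one possible way (an assumption, not any particular product's design) to keep a full-text index consistent with record status. The `RecordsIndex` class, its `Status` values, and the `can_see_sealed` flag are hypothetical. Redaction re-indexes the record from its redacted rendition so the removed words stop matching, expunging removes the record from the index entirely but is refused for records on litigation hold, and sealed records are filtered out of results unless the user is cleared to see them. Locked records, and whether a record on hold may be redacted, depend on your retention policy and are not modeled here.

```python
import re
from collections import defaultdict
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    LITIGATION_HOLD = "litigation_hold"  # searchable, but must not be destroyed
    SEALED = "sealed"                    # hidden unless the user is cleared


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class RecordsIndex:
    """Hypothetical full-text index that respects record status."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of records containing it
        self.status = {}                  # record id -> Status

    def add(self, rec_id, text, status=Status.ACTIVE):
        self.status[rec_id] = status
        for word in tokenize(text):
            self.postings[word].add(rec_id)

    def _unindex(self, rec_id):
        for ids in self.postings.values():
            ids.discard(rec_id)

    def redact(self, rec_id, redacted_text):
        # Re-index from the redacted rendition so the removed words stop matching.
        status = self.status[rec_id]
        self._unindex(rec_id)
        self.add(rec_id, redacted_text, status)

    def expunge(self, rec_id):
        # Remove the record from the index entirely, unless it is under hold.
        if self.status.get(rec_id) is Status.LITIGATION_HOLD:
            raise PermissionError(f"{rec_id} is on litigation hold")
        self._unindex(rec_id)
        self.status.pop(rec_id, None)

    def search(self, word, can_see_sealed=False):
        # Security trimming: drop sealed records unless the caller is cleared.
        hits = self.postings.get(word.lower(), set())
        return sorted(r for r in hits
                      if can_see_sealed or self.status[r] is not Status.SEALED)


idx = RecordsIndex()
idx.add("R1", "Settlement with Acme for 2 million dollars")
idx.add("R2", "Sealed case file for John Doe", Status.SEALED)
idx.add("R3", "Email about Acme pricing", Status.LITIGATION_HOLD)
idx.redact("R1", "Settlement with [REDACTED] for 2 million dollars")
print(idx.search("acme"))                      # ['R3']
print(idx.search("doe"))                       # []
print(idx.search("doe", can_see_sealed=True))  # ['R2']
```

The key design choice is that the index is updated whenever a record's status or content changes; filtering only the displayed documents, while leaving the index untouched, is exactly how redacted words end up still "finding" a record.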

Many products offer excellent options for adding this capability to your enterprise solution. Ideally they will require little or no custom integration, but depending on your ECM product set, some may be necessary. These features are not relevant to every record collection, however. Full-text search adds little value for a collection of invoices or spreadsheets, for example; it works best for large collections of unstructured documents or messages.


