Text Analytics and Records Management

By Bryant Duhon posted 12-21-2012 08:56


Al Linden previews his AIIM 2013 conference session. Enjoy, and follow #AIIM13 and @AIIMcon for all of the latest news and noise about next year’s event.

Alan Linden
Senior Technical Consultant
EID Inc.

Records management and text analytics - a powerful marriage for ECM

Thursday, March 21 10:00

Text analytics may be the best kept secret in our industry, although the technology is not new. One version started in Bell Laboratories as an R&D project called LSI (Latent Semantic Indexing) many years ago, and there is a large body of information on the Internet regarding this statistical approach. The algorithms are in the public domain and a number of companies have used them “under the covers” in their own software products.

AIIM 2013 in New Orleans on March 20-22

Other text analytic products use different algorithms, and some are natural language oriented instead of mathematically oriented. Text analytics, under various names or products, is used extensively in litigation support.

So what is the relationship between text analytics and records management? If we start to think about it, records management is typically about structured fields/materials: accounting records, HR records, college student records, medical records – the list goes on.

The beauty of text analytics is that it deals in unstructured, typically textual materials. THE MARRIAGE OF STRUCTURED RECORDS WITH UNSTRUCTURED TEXT IS A VERY POWERFUL COMBINATION!

Let’s look at a real-world example. Resumes are typically a combination of structured and unstructured (textual) materials. Your name, address, phone number, cell phone number, and email address are typical structured fields. If the resume is submitted in paper form, modern OCR technology can pick up these structured fields in the scanning process. The OCR engine can then pick up and translate the remaining characters and send them into the system. If the resume is submitted electronically, the same process occurs without the need for OCR. So now how do we handle the text? Typically it would go into a text search database, of which there are many on the market. So how does regular text search differ from a text analytics engine? The typical text database uses Boolean commands, string search, or wildcard search techniques to find the text in question.
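To make the contrast concrete, here is a minimal Python sketch of the three conventional search styles just mentioned: string, wildcard, and Boolean. The resume snippets and function names are invented for illustration and do not represent any particular search product. Note that every technique matches on the literal characters of a word, not on its meaning.

```python
import fnmatch
import re

# Hypothetical resume text fragments (invented for illustration only).
resumes = {
    "r1": "Senior records manager with ten years of ECM experience",
    "r2": "Java developer, managed a small team, built search tools",
    "r3": "Paralegal experienced in litigation support and records retention",
}

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def string_search(phrase):
    """Return ids of resumes containing the phrase as a substring (case-insensitive)."""
    return sorted(k for k, t in resumes.items() if phrase.lower() in t.lower())

def wildcard_search(pattern):
    """Return ids of resumes where any word matches a shell-style wildcard pattern."""
    return sorted(
        k for k, t in resumes.items()
        if any(fnmatch.fnmatchcase(w, pattern.lower()) for w in tokens(t))
    )

def boolean_and(*terms):
    """Return ids of resumes where every term appears as a whole word."""
    return sorted(
        k for k, t in resumes.items()
        if all(term.lower() in set(tokens(t)) for term in terms)
    )

print(string_search("records"))              # substring match
print(wildcard_search("manag*"))             # matches "manager" and "managed"
print(boolean_and("records", "litigation"))  # both words required
```

All three searches find only what was literally typed. None of them can tell whether a matched word is being used in the sense the searcher intended.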

So what does text analytics do? Well, a large number of things. It’s best known for automatic categorization of text, but it can do a variety of other tasks: automatic summarization of documents, content search, and clustering of documents within meaningful categories. For example, “bomb” could be an explosive or a show on Broadway that wasn’t successful, and the software can make that distinction. That is a far cry from a normal text search engine. Another example is SPAM. Is that SPAM in a can or SPAM on a computer? Text analytics makes those distinctions.
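As a rough illustration of how surrounding words can separate the two senses of “bomb”, the Python sketch below compares a sentence with two small sets of seed words using bag-of-words cosine similarity. This is a deliberately simple stand-in, not LSI and not the method of any commercial product. The category names, seed words, and function names are all assumptions made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical seed vocabulary for each sense of "bomb" (invented for illustration).
SEEDS = {
    "explosive": "bomb explosive detonate blast device police evacuated squad",
    "theater": "bomb show broadway play critics audience closed flop reviews",
}

def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(text):
    """Pick the category whose seed words are most similar to the text."""
    vec = vectorize(text)
    return max(SEEDS, key=lambda c: cosine(vec, vectorize(SEEDS[c])))

print(categorize("The new Broadway show was a bomb and closed after bad reviews"))
print(categorize("Police evacuated the station after a bomb squad found the device"))
```

Both sentences contain the word “bomb”, so a keyword search would return both. Here the neighboring words decide the category. Production text analytics engines work from far larger statistical or linguistic models, but the principle of using context is the same.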

If I’ve piqued your interest come attend my Roundtable on Text Analytics and Records Management at the AIIM Conference.

Be sure to register today before AIIM 2013 sells out!

#textanalytics #autoclassification #AIIM2013 #Records-Management #AIIM13 #ElectronicRecordsManagement