Know What You're Capturing: Document Classification

By Anthony Macciola posted 09-10-2010 13:56


How do you know what kinds of documents you’re capturing? How do you figure out what types of documents are entering your organization? These are the kinds of questions that need to be asked in today’s document-driven organizations in order to maximize productivity, increase efficiency and reduce costs.

Most people today still use the “old school” method: document separator sheets printed with patch codes and/or barcodes. Although very effective, this method comes at a cost: materials (paper and ink) and the labor of manually reviewing each physical piece of paper to determine where to insert the preprinted separator sheet.

For those of you still using this method, it’s time to embrace the 21st century and think about how you can best drive cost out of your capture operation by eliminating the need for separator sheets.

The compelling replacement for separator sheets is document classification. There are two distinct technologies used for classifying documents.

  • Image-based classification: Looks at the geometry of the image, based on image layout and patterns; can be trained by showing the classifier samples and labeling them appropriately.
  • Text-based classification: Looks at the content of the document and is based on text patterns; can be trained by showing the classifier samples and labeling them appropriately.
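
To make the "show it samples and label them" idea concrete, here is a minimal sketch of a text-based classifier. It is an illustration only, not how any particular product works: it assumes the page text is already available (for example from OCR), and it uses a simple word-count profile per label compared by cosine similarity. The function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build one word-count profile per label from (text, label) pairs."""
    profiles = defaultdict(Counter)
    for text, label in samples:
        profiles[label].update(tokenize(text))
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, profiles):
    """Return (best_label, similarity) for a piece of page text."""
    words = Counter(tokenize(text))
    scores = {label: cosine(words, profile) for label, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical labeled samples; a real system would use text from scanned pages.
SAMPLES = [
    ("invoice number total amount due payment terms", "invoice"),
    ("invoice date bill to amount due remit payment", "invoice"),
    ("dear sir thank you for your letter sincerely", "letter"),
    ("dear madam we regret to inform you sincerely yours", "letter"),
]

profiles = train(SAMPLES)
label, score = classify("Please remit the total amount due on this invoice", profiles)
```

Note that nothing here involves scripting rules or maintaining a keyword dictionary: the profiles come entirely from the labeled samples. An image-based classifier would follow the same train-then-classify pattern, with layout features in place of words.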

Classification is also a critical component of document separation. Document separation technology leverages classification techniques to determine what it’s looking at and then applies additional logic to ascertain document boundaries and logical groupings. Together, classification and document separation are a cost-effective alternative to document separator sheets.
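
The "additional logic" for finding document boundaries can be sketched as follows. This is a simplified illustration under stated assumptions: each scanned page has already been classified, and the classifier also reports whether the page looks like the first page of a document. Both the `is_first_page` flag and the `separate` function are hypothetical names chosen for this example.

```python
def separate(pages):
    """Group classified pages into documents.

    `pages` is a list of (label, is_first_page) tuples in scan order.
    A new document starts when a page is flagged as a first page or
    when its label differs from the label of the document in progress.
    """
    documents = []
    for index, (label, is_first_page) in enumerate(pages):
        starts_new = (
            not documents
            or is_first_page
            or documents[-1]["label"] != label
        )
        if starts_new:
            documents.append({"label": label, "pages": []})
        documents[-1]["pages"].append(index)
    return documents


# Hypothetical batch: a two-page invoice, a one-page invoice, a two-page letter.
scanned = [
    ("invoice", True),
    ("invoice", False),
    ("invoice", True),
    ("letter", True),
    ("letter", False),
]
documents = separate(scanned)
```

Here the batch is split into three documents without a single separator sheet: the first-page flag distinguishes two back-to-back invoices, and the label change marks where the letter begins.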

In addition, document classification can have a significant impact on other aspects of your overall capture operation.

  • Mailroom Operations: classification can be used as a tool to discern what is being captured and where it should be distributed.
  • Capture for archival purposes: classification can be used to determine the appropriate folder for the captured content.
  • Capture for the purpose of driving a business process: classification can be used to ascertain what has been captured and determine the appropriate workflow to initiate.
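
As a rough sketch of how a classification result might drive distribution or workflow selection, the example below maps a label to a destination and falls back to manual review when the label is unknown or the classifier's confidence is low. The route names, the review queue, and the 0.6 threshold are all assumptions made for illustration.

```python
# Hypothetical mapping from document type to destination or workflow.
ROUTES = {
    "invoice": "accounts-payable-workflow",
    "claim": "claims-intake-workflow",
    "letter": "correspondence-archive",
}
REVIEW_QUEUE = "manual-review"


def route(label, confidence, threshold=0.6):
    """Pick a destination for a classified document.

    Low-confidence or unrecognized documents go to a human reviewer
    instead of being routed automatically.
    """
    if confidence < threshold:
        return REVIEW_QUEUE
    return ROUTES.get(label, REVIEW_QUEUE)


destination = route("invoice", 0.92)
```

Keeping a review fallback means people only handle the exceptions, which is where the labor savings come from.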

Effective classification offerings require little to no prior setup. If you’re using a solution that requires scripting or dictionary (keyword) creation and/or maintenance, you should re-think what you’re doing. Competitive classification offerings require nothing more than showing the system a variety of samples (tens of samples vs. hundreds or thousands) and labeling them. From there, the system can and should do the rest.

If you’re managing a capture operation, you should be focused on minimizing the amount of human intervention required to maintain operations, be it scan preparation, scanning, quality control, indexing or otherwise. Classification is one of many tools that you can use to increase the overall efficiency of your day-to-day operations.

