Three significant trends we witnessed in 2010 that are changing the Document Capture landscape forever

By Kevin Neal posted 01-04-2011 18:22


The ‘No Folder Zone’


Despite tremendous improvements in document capture technology and its ease of use, the fact of the matter is that document capture is not fully automated and often involves human intervention. Therefore, carefully weighing the pros and cons of your document capture strategy is imperative. Done well, it creates better operational efficiencies within your organization; done poorly, it places an unnecessary burden on your business processes.


Technologies such as Intelligent Document Recognition (IDR) or Automatic Forms Processing, which automatically identify documents and extract information from scanned images, are fairly amazing and perform highly automated functions if the system is designed around well-known document types. In other words, information such as an invoice number appears in a fairly consistent location on the page (e.g. always in the upper right-hand corner). But as more and more document types are introduced to the capture system, the system becomes dramatically more complex, and chances are that automation accuracy will decrease.
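To make the idea of a "well-known document type" concrete, here is a minimal sketch of zonal extraction in Python. It assumes OCR has already produced a list of words with page-relative coordinates; the `Word` class, the zone boundaries and the `INV-` number format are hypothetical and not taken from any particular capture product. A fixed zone finds the invoice number on the layout it was designed for and misses it on a different vendor's layout, which is where accuracy starts to drop and a person has to step in.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word (hypothetical structure for this sketch)."""
    text: str
    x: float  # left edge, as a fraction of page width (0.0 to 1.0)
    y: float  # top edge, as a fraction of page height (0.0 to 1.0)


# Hypothetical zone: the upper-right corner of the page.
INVOICE_ZONE = (0.5, 0.0, 1.0, 0.25)  # (x_min, y_min, x_max, y_max)
# Hypothetical invoice number format, e.g. "INV-10042".
INVOICE_PATTERN = re.compile(r"^INV-\d{4,}$")


def in_zone(word, zone):
    """Return True if the word's top-left corner falls inside the zone."""
    x_min, y_min, x_max, y_max = zone
    return x_min <= word.x <= x_max and y_min <= word.y <= y_max


def extract_invoice_number(words, zone=INVOICE_ZONE):
    """Return the first word in the zone that looks like an invoice number."""
    for word in words:
        if in_zone(word, zone) and INVOICE_PATTERN.match(word.text):
            return word.text
    return None  # nothing found: the page needs human review


# Vendor A prints the invoice number in the upper-right corner; vendor B does not.
vendor_a = [Word("ACME Corp", 0.10, 0.05), Word("INV-10042", 0.80, 0.10)]
vendor_b = [Word("Globex", 0.10, 0.05), Word("INV-20077", 0.20, 0.90)]
print(extract_invoice_number(vendor_a))  # INV-10042
print(extract_invoice_number(vendor_b))  # None
```

Every new layout means another zone or rule to define and maintain, which is why adding document types makes the system harder to keep accurate.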


The truth is that these capabilities are not complete magic (yet) and require system administrators to carefully develop capture strategies that help the capture software make intelligent decisions about documents. If you are in the document capture or document scanning business, you’ll often hear something similar to, “Oh, I’ll just use my existing multifunction device to scan to a folder and let my capture software process the scanned images from the folder.” While this approach certainly works, this road to document capture is littered with potential potholes, possible dead-ends and a lot of downstream work that should be carefully considered.


The idea of scanning images into a folder and then performing data extraction on those images is certainly not new. In fact, it is probably the most commonly used method of getting images into document management systems; however, there are certain considerations to take into account when using this capture technique. Just because it is simple to configure, cost effective and functional does not necessarily mean it is the most effective approach. For the reasons I elaborate on below, 2010 saw a dramatic rise in The ‘No Folder Zone’.


A truly integrated document capture strategy has some of these qualities that scanning to folders may lack:

  • Reduces the complexity of the capture system through centralized control
  • Enforces business continuity from the repository, not the desktop
  • Eliminates the need for rescanning and ensures optimal image quality


While there are several methods of getting an image into a document management system (including scanning to a folder), what is just as important, if not more so, is getting the properly associated metadata or index values into your repository along with that image for search and retrieval purposes. Otherwise your document management system is nothing more than a glorified public shared folder on the network, where images are retrieved from memory or found by file name only. Scanning to a folder is not necessarily a bad thing, depending on your organization’s particular requirements; however, when many people contribute scanned documents to a system, honest mistakes creep in, leading to inconsistency, decreased efficiency and potential security or retention risks.


The “Twilight Zone” is defined as “the ambiguous region between 2 categories, states, or conditions (usually containing some features of both)”. This is also a good description of The ‘No Folder Zone’. While scanning to a folder and then importing might give the appearance of an integrated solution, the truth is that the connectivity (integration) between capture and the ECM repository is ambiguous. A solid document capture system will have the following qualities:


  • Changes in the Enterprise Content Management (ECM) system should immediately be reflected in your document capture solution
  • Mapping of capture software index fields to ECM index fields is dynamic
  • Allows the system to be modified, changed or enhanced easily as organizational requirements change
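As an illustration of what “dynamic” mapping can mean, here is a minimal Python sketch under stated assumptions: the capture side asks the repository for its current list of index fields at capture time instead of reading a mapping that was hard-coded when the system was installed. The `ecm_schema` callable stands in for a hypothetical query against the ECM system (for example, over its web-service interface), and the name-matching rule is an assumption made for this sketch, not how any particular product works.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a call that returns the ECM system's current index fields.
SchemaSource = Callable[[], List[str]]


def build_index_mapping(capture_fields: Dict[str, str],
                        ecm_schema: SchemaSource) -> Dict[str, str]:
    """Map captured values onto whatever index fields the ECM defines right now."""
    ecm_fields = ecm_schema()  # fetched on every call, never cached in a config file
    mapped = {}
    for field in ecm_fields:
        # Match on case-insensitive name; ECM fields with no captured value are left
        # empty so a reviewer can fill them in. Captured fields the ECM does not
        # define are dropped.
        mapped[field] = next(
            (value for name, value in capture_fields.items()
             if name.lower() == field.lower()),
            "",
        )
    return mapped


schema = ["InvoiceNumber", "Vendor"]
captured = {"invoicenumber": "INV-10042", "vendor": "ACME Corp", "pagecount": "2"}
print(build_index_mapping(captured, lambda: list(schema)))
# {'InvoiceNumber': 'INV-10042', 'Vendor': 'ACME Corp'}

schema.append("Department")  # an administrator adds a field in the ECM system
print(build_index_mapping(captured, lambda: list(schema)))
# {'InvoiceNumber': 'INV-10042', 'Vendor': 'ACME Corp', 'Department': ''}
```

The point of the design is that the new `Department` field shows up in capture without anyone editing the capture configuration; with a scan-to-folder import, that change would typically have to be made by hand on both sides.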


My main point in writing this blog post about the ‘No Folder Zone’ is not to bash all that is wrong with scanning to folders or to point out its potential pitfalls. In fact, it is a great solution if it is truly what a particular organization requires. However, far too often scanning to folders is chosen simply because it is the easiest way to offer document scanning to users, and the other issues it causes are not carefully considered. As system administrators become more aware of, and truly understand, some of the incredible advances in document capture technology, hopefully they will appreciate that a well-designed document capture system can drastically reduce labor costs, provide quicker access to information and be a strategic business advantage, as well as improve adherence to compliance or regulatory standards.




As always, I appreciate the time you’ve spent reading this posting about The ‘No Folder Zone’ and how this trend is influencing the Document Capture business. I welcome comments, feedback and/or constructive criticism. Please feel free to read ‘The SharePoint effect’ to learn about the second trend witnessed in 2010 that changed the Document Capture landscape forever.





1 comment


10-23-2012 12:44

Thanks Chris. My fear is that we are creating vast amounts of electronic junkyards even as I type this message, and the problem gets exponentially worse with every second we don't start offering solutions for it. In the time it took me to type that sentence, there were probably hundreds of millions of gigabytes of content uploaded and stored somewhere without any context. By ‘without any context’, I mean these are just images, videos or audio files that a computer has no understanding of. They are big blobs of stored ‘stuff’. In order to even begin to organize data, much less exploit this information, you must provide some way for computer systems to understand what the content is. Call it whatever you want (‘meta-data’, ‘indexes’ or ‘tags’), but these computer clues about a particular piece of content are the most rudimentary step in giving systems the ability to organize and analyze content. It’s not magic; rather, it is carefully concealed advanced technology that keeps the user experience elegant without too much intrusion, yet provides the functionality that is expected these days. User expectations are quite demanding, as they should be: we simply expect to be able to search within content systems such as Facebook, LinkedIn or our ECM systems and find the relevant content we are looking for.
As it relates specifically to Information Management, we all know mobile is a major technology trend for ‘consumption’ and will shift toward ‘contribution’ as well. This is the real danger, because I cannot envision people applying much meta-data, if any, manually on awkward touch-screen interfaces. People will simply snap a photo and upload it, so there has to be some method to extract and apply meta-data to this content, especially if it’s business related. Personal content probably matters just as much, but business content must have this index information.