The prospect of 100% accurate OCR is, most of the time, a good driver for implementing the technology. As an advocate and promoter of ways to increase OCR accuracy, I too sometimes get caught up in the quest for perfect results, but is the effort always worth it? It can be very difficult to put metrics on OCR improvement, but there is usually a point at which the cost of improving OCR results exceeds the benefit gained by the improvement, and where that point lies depends entirely on your environment.
Many environments are perfectly served by out-of-the-box accuracy of around 98% on traditional 300 DPI documents, but some organizations need more. The biggest contributing factor is not so much desire as volume. I’ve worked in environments where each iteration of improvement took a week of effort, costing as much as $10,000, and yielded a 0.2% reduction in uncertain characters. In an environment that processes 6 million pages a day (at roughly 2,000 characters per page), that 0.2% reduction means approximately 24,000,000 fewer characters to review per day, which is a real saving, and it usually produced an ROI in three months or less. An environment processing 2,000 pages a day, however, will hardly notice it.
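The volume argument can be reduced to a back-of-the-envelope formula. The sketch below is illustrative only: the per-page character count and the cost of manually reviewing an uncertain character are assumptions I've chosen so the high-volume case lands near the three-month payback described above, not figures from any real deployment.

```python
def days_to_break_even(effort_cost, pages_per_day, chars_per_page,
                       uncertain_reduction, review_cost_per_char):
    """Days until labor saved on character review repays the tuning effort."""
    chars_saved_per_day = pages_per_day * chars_per_page * uncertain_reduction
    daily_saving = chars_saved_per_day * review_cost_per_char
    return effort_cost / daily_saving

# High-volume shop: 6 million pages/day, an assumed 2,000 characters/page,
# and an assumed review cost of $5 per million uncertain characters.
high = days_to_break_even(10_000, 6_000_000, 2_000, 0.002, 0.000005)

# Low-volume shop: 2,000 pages/day, same assumptions otherwise.
low = days_to_break_even(10_000, 2_000, 2_000, 0.002, 0.000005)

print(f"High volume: {high:.0f} days; low volume: {low:.0f} days")
```

With these assumed rates, the 6-million-page shop breaks even in roughly 83 days, while the 2,000-page shop would need hundreds of years to recoup the same $10,000 effort, which is the whole point of scrutinizing volume before chasing accuracy.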
This calculation seems fairly straightforward, but in the grip of the accuracy quest it is easy to ignore how much improvement really costs. In my experience, once you hit the accuracy walls of 85, 87, 95, or 98% (common accuracy plateaus, depending on document quality), the cost of further improvement needs to be highly scrutinized. The more elaborate the processing environment, the greater the risk. For example, an accuracy improvement in a full-page OCR environment does not have the same impact it does in a data capture environment. In data capture, the more you improve, the greater the chance of hurting results: improvements made in full-page environments have a global impact, whereas improvements in a data capture environment are generally made for one-off cases. Although it is often skipped, proper fine-tuning in data capture environments should always be accompanied by regression testing, which is very costly.
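The regression-testing step can be as simple as replaying a golden set of documents through the capture pipeline before and after each one-off fix. The sketch below is a minimal illustration; `extract_fields`, the file names, and the field values are all hypothetical stand-ins, not a real capture API.

```python
def regression_report(golden_set, extract_fields):
    """Compare extracted fields against known-good values for each page.

    golden_set is a list of (page, expected_fields) pairs; extract_fields
    is whatever function fronts the capture pipeline (hypothetical here).
    Returns a list of (page, field, expected, actual) mismatches.
    """
    failures = []
    for page, expected in golden_set:
        actual = extract_fields(page)
        for field, value in expected.items():
            if actual.get(field) != value:
                failures.append((page, field, value, actual.get(field)))
    return failures

# Illustrative golden set and a fake extractor standing in for the pipeline.
golden = [
    ("invoice_001.tif", {"total": "118.00", "date": "2023-04-01"}),
    ("invoice_002.tif", {"total": "42.50", "date": "2023-04-02"}),
]

def fake_extractor(page):
    # Simulates a tuning change that fixed one case but regressed a date.
    return {"invoice_001.tif": {"total": "118.00", "date": "2023-04-01"},
            "invoice_002.tif": {"total": "42.50", "date": "2023-04-03"}}[page]

print(regression_report(golden, fake_extractor))
```

Run against the same golden set before and after a tuning change, any newly appearing failure means the one-off fix broke a document class that was previously working, which is exactly the risk described above.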
So the bottom line: determining when enough is enough is crucial. Because even that can be a daunting task, especially in immature environments where the metrics simply do not exist, it may come down to a business decision: when is the organization satisfied with accuracy? Usually this is a measure of how much manual labor the organization is willing to apply to its imaging processes. Once an environment has matured and statistics have been gathered, that number can be revisited and treated as a cost-reduction goal against what is already in place. In either case, the process of establishing these criteria is important; otherwise, 100% accuracy becomes a target you are simply throwing money at.
#ScanningandCapture #ROI #OCR #accuracy