Oh yeah, there are two types of OCR

By Chris Riley, ECMp, IOAp posted 08-24-2010 09:24


I forget, but only until someone reminds me. It usually goes like this: I discuss OCR on some online forum, or someone searching for OCR topics finds an article of mine and comments. After a little back and forth, we find out that we are not talking about the same thing at all. The topic is OCR; how far off could we be?

What I’m neglecting is that there are actually two distinctly different types of OCR: software-based (PC-based) and machine-based (or in-line). While the core algorithms share a lot, the similarity stops there. The two technologies are used on very different types of text and are tuned in very different ways. If you want to get technical, both are really software-based OCR; the difference comes down to WHEN the OCR happens.

In-line OCR is done at scan time, and very often not on documents but on objects moving down an assembly line. For documents, it is used primarily for mail-room processing on high-speed, high-volume scanners. Both scenarios need data from the input asset quickly. The benefit of in-line OCR is that it is the fastest OCR around. Usually the OCR is part of the firmware and optimized for speed. If you imagine an assembly line of bottles, the bottles pass the camera within milliseconds of each other. Waiting for OCR would be a huge bottleneck in the quality control and inventory process. The downside to in-line OCR is accuracy. On the assembly line this is usually not a problem, because the engine has been tuned so tightly that it is extremely accurate for a single image type. Where accuracy suffers is document scanning in the digital mail-room. To be as fast as it is, the in-line engine there must be reduced in complexity, namely by removing document analysis and the reading of complex fonts. Because of this, accuracy on scanned documents cannot compare to that of PC-based OCR.

PC-based OCR has the benefit of scalability. It can work on the widest range of document types. Furthermore, because it runs on the PC, it can use the latest and greatest technologies for degraded and complex documents. The downside of PC-based OCR is that it is not as fast as in-line. 99.9% of the time, though, it is fast enough. PC-based OCR is often run at the document scanner's rated speed of 60 pages a minute. This is plenty fast for those whose primary concern is quality. It is not fast enough for machine-to-machine hand-offs, but that is not its primary use.
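To put rough numbers on the speed difference, here is a minimal Python sketch of the time budget each scenario leaves for recognizing one item. Only the 60 pages a minute figure comes from the discussion above. The bottle rate and the per-page OCR time are hypothetical values picked purely for illustration, and the function names are my own, not part of any OCR product.

```python
def time_budget_seconds(items_per_minute: float) -> float:
    """Time available to recognize one item before the next one arrives."""
    if items_per_minute <= 0:
        raise ValueError("items_per_minute must be positive")
    return 60.0 / items_per_minute


def keeps_up(ocr_seconds_per_item: float, items_per_minute: float) -> bool:
    """True if OCR finishes each item within its time budget."""
    return ocr_seconds_per_item <= time_budget_seconds(items_per_minute)


SCANNER_PAGES_PER_MINUTE = 60    # scanner rated speed mentioned above
LINE_BOTTLES_PER_MINUTE = 1200   # hypothetical assembly-line rate
ASSUMED_OCR_SECONDS = 0.5        # hypothetical PC-based OCR time per item

page_budget = time_budget_seconds(SCANNER_PAGES_PER_MINUTE)
bottle_budget = time_budget_seconds(LINE_BOTTLES_PER_MINUTE)

print(f"Budget per page:   {page_budget:.3f} s")
print(f"Budget per bottle: {bottle_budget:.3f} s")
print("Keeps up with scanner:", keeps_up(ASSUMED_OCR_SECONDS, SCANNER_PAGES_PER_MINUTE))
print("Keeps up with line:   ", keeps_up(ASSUMED_OCR_SECONDS, LINE_BOTTLES_PER_MINUTE))
```

Under these assumed figures, an engine that needs half a second per item comfortably keeps pace with the scanner but falls well behind the bottle line, which is the gap in-line OCR exists to close.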

Now, you may never encounter in-line OCR, but knowing about it will help you understand the world of recognition and how the technology is applied.

#ScanningandCapture #OCR #inline