Hand-print or Handwriting, HUGE difference!

By Chris Riley, ECMp, IOAp posted 02-04-2011 13:32


Here is another roll-up-your-sleeves, down-and-dirty post. When it comes to forms processing and data capture, there is a huge difference between working with hand-printed documents and working with handwritten ones. Hand-printed documents can be processed with a high degree of automation, and the technology can benefit any organization that encounters them, while handwritten documents need special consideration. A common misconception in the user community is that the two are the same. So how do you tell if your form is hand-print, handwriting, or better yet, both?

ICR (Intelligent Character Recognition) is the algorithm used in place of OCR for characters produced by a human hand. The algorithm is more dynamic, because a person's hand-print changes slightly from minute to minute. Hand-print forms can be processed very accurately when the form is designed correctly. This type of forms processing will always need quality assurance steps, but you can get close to the accuracy of an OCR process. Forms that were not created with data capture or automated extraction in mind will very often contain handwriting, because hand-print is usually guided by the form itself. Forms without hand-print cannot be expected to process at high accuracy. Below are the defining characteristics of hand-print.

Mono-spaced text: This means that each character, as it's filled out, is the same distance from its neighbors as every other character. In handwriting, characters very often connect; in the extreme form this is cursive. When characters touch or are not spaced equally, you get improper segmentation: characters are clumped together as one or split in half during recognition. Mono-spaced text is usually achieved by printing boxes on the form that guide the user to write within the boxes.
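
To see why touching characters hurt segmentation, here is a minimal Python sketch of one simple approach: splitting a line wherever a column of pixels contains no ink. It is an illustration only, not how any particular ICR engine works; the names `segment_columns` and `rows` and the tiny bitmaps are made up for the example.

```python
def segment_columns(bitmap):
    """Split a binary bitmap (rows of 0/1) into character spans.

    A column with no ink is treated as a gap between characters.
    Returns a list of (first_column, last_column) tuples.
    """
    if not bitmap:
        return []
    width = len(bitmap[0])
    # True for every column that contains at least one ink pixel.
    ink = [any(row[x] for row in bitmap) for x in range(width)]

    spans = []
    start = None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x - 1))   # a character ends
            start = None
    if start is not None:                  # character runs to the right edge
        spans.append((start, width - 1))
    return spans


def rows(*lines):
    """Helper for the example: '#' is ink, anything else is blank."""
    return [[1 if ch == "#" else 0 for ch in line] for line in lines]


# Three well-spaced characters, as boxes on a form would encourage.
spaced = rows(
    "##..##..##",
    "##..##..##",
)

# The second and third characters touch, as often happens in handwriting.
touching = rows(
    "##..######",
    "##..##..##",
)

print(segment_columns(spaced))    # three spans
print(segment_columns(touching))  # two spans: the touching pair is merged
```

In the second bitmap a single stroke bridging two characters is enough to make them come out as one clump, which is the failure described above.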

Uniform Height and Width: Similar to mono-spaced text, the text as it is filled in should have a more or less uniform height and width. This keeps the completer from introducing as many variable elements as they would in straight handwriting, and it increases accuracy. This is also accomplished using boxes that limit a completer’s variation character by character.

Uniform Base-Line: This aspect of hand-print is thought about less often but is very important. Text must always sit on the same horizontal baseline. What typically happens in handwriting is that the writer drifts up and down around an invisible baseline. You may have noticed when you write that the end of a line is sometimes lower than the beginning. Baselines are important for OCR and ICR to get proper character segmentation and to recognize a few key “tail” characters such as “q” and “p”.
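
As a rough illustration of baseline drift, the sketch below fits a straight line through the bottoms of the characters on one line of text and reports its slope. It assumes you already have a bounding box per character from some earlier step; the function name `baseline_slope` and the sample coordinates are hypothetical, and tail characters would need to be excluded or handled separately in practice.

```python
def baseline_slope(char_bottoms):
    """Least-squares slope of character bottoms against horizontal position.

    char_bottoms is a list of (x, bottom_y) pairs in image coordinates,
    where y grows downward. A slope near 0 means a level baseline; a
    positive slope means the line of text sinks toward the right.
    """
    n = len(char_bottoms)
    if n < 2:
        return 0.0
    xs = [x for x, _ in char_bottoms]
    ys = [y for _, y in char_bottoms]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # all characters at the same x: slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Characters written in boxes: the bottoms all sit at the same height.
boxed = [(0, 20), (10, 20), (20, 20)]

# Free handwriting: each character sits a little lower than the last.
drifting = [(0, 20), (10, 22), (20, 24)]

print(baseline_slope(boxed))
print(baseline_slope(drifting))
```

A form designer could use a measurement like this on sample fills to check whether the boxes are actually keeping writers on a level line.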

Sans-serif: The last element is keeping characters sans-serif. The extra tails on serif characters can cause confusion between certain characters, like “o” vs. “q” and “c” vs. “e”. The way to achieve this is less obvious: put a guide at the top of the form that shows a good character and a bad character.

Today, automation of handwriting and cursive is not complete and is usually only successful when augmented with other technologies such as database lookup and CAR and LAR (courtesy amount recognition and legal amount recognition, used for checks). There are several large processing environments using Intelligent Word Recognition, which leverages the same adaptive techniques as ICR and combines them with very robust dictionary lookups to read handwriting. The greatest success in such projects comes where the type of input is very clearly defined.
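
The dictionary-lookup idea can be sketched in a few lines of Python using the standard library's `difflib`. This is only a toy illustration of why a clearly defined input type helps, not how any Intelligent Word Recognition product works; the function name `correct_with_dictionary`, the city list, and the 0.6 cutoff are assumptions made for the example.

```python
import difflib


def correct_with_dictionary(raw_word, dictionary, cutoff=0.6):
    """Return the dictionary entry closest to raw_word, or None.

    raw_word is whatever the recognizer produced for a field. If no
    entry is similar enough (per difflib's ratio), return None so the
    field can be routed to a human for quality assurance.
    """
    matches = difflib.get_close_matches(raw_word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# The field is known to hold a city, so the dictionary can be small.
cities = ["Boston", "Austin", "Denver"]

print(correct_with_dictionary("Bostan", cities))  # one misread character
print(correct_with_dictionary("Xqzzy", cities))   # nothing close enough
```

The narrower the set of valid values for a field, the more a lookup like this can recover from recognition errors, which is consistent with the point above about clearly defined input.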

#handwriting #OCR #ICR #Hand-print #ScanningandCapture