I get asked a lot on Twitter, “What is the best OCR for Linux?”, usually with “free” in there somewhere as well. The fact is, although 3 of the top 4 commercial OCR engines have Linux versions, their accuracy is substantially lower than that of their Windows counterparts. However, there is a solution.
There is a very logical reason for the lack of non-Windows OCR engines. When these engines are developed, they are developed using Windows development environments. I’m not assuming you are a developer, but if you are and have experience “porting” from one platform to another, you know it’s slow and not fun. Due to this complexity, and the overall lack of demand, it simply does not make sense for vendors to keep non-Windows versions of the product as current as Windows versions. On average, a Windows equivalent will be 3 or more versions ahead of any Mac or Linux counterpart. This means big differences in accuracy, stability, and core functionality.
Therefore the latest and greatest, when it comes to document imaging, is available only on Windows. But that does not lock the rest of the world out! The solution is simple: bring in an unmanned, set-it-and-forget-it Windows machine to do the OCR dirty work. I find that it often comes down to simple pride, or very strict security rules, when an organization is unwilling to have a Windows box for the sole purpose of OCR. If you can get past the security issues, and there should always be a way, swallow your penguin pride; it’s a great solution.
So what happens in an environment that demands accurate document conversion but is not Windows-based? Not all is lost. While in a perfect world all the latest technology would be on your platform of choice, sometimes you have to make exceptions, and this is not a big one to make. Because document conversion and compression products are all designed to have a mode where they run unmanned, it is possible to run the technology on a Windows machine but drive it from ANY other platform. Once configured properly, a dedicated document conversion machine is very stable; it requires little maintenance and very little interaction. Simply by sharing folders over the network for all other machines to see, no matter the platform, you can transfer images to your document conversion machine from any network device and download the results.
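To make the shared-folder idea concrete, here is a minimal sketch in Python of what the Linux (or Mac) side could look like. It rests on assumptions that are mine, not any particular product's: the Windows machine's input and output folders are already mounted locally as network shares, the OCR software watches the input folder, skips files ending in ".part", and writes each result to the output folder under the same base name as the image. The function names `submit_image` and `wait_for_result` are hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def submit_image(image: Path, inbox: Path) -> Path:
    """Copy an image into the shared input folder watched by the OCR machine."""
    inbox.mkdir(parents=True, exist_ok=True)
    # Copy under a temporary name, then rename, so a complete file appears
    # in one step. Assumption: the OCR software ignores "*.part" files.
    tmp = inbox / (image.name + ".part")
    shutil.copyfile(image, tmp)
    final = inbox / image.name
    tmp.replace(final)
    return final


def wait_for_result(outbox: Path, stem: str, timeout: float = 600.0,
                    poll_interval: float = 5.0) -> Optional[Path]:
    """Poll the shared output folder until a file named <stem>.* appears.

    Returns the first matching path, or None if the timeout expires.
    Assumption: results keep the base name of the submitted image.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = sorted(outbox.glob(stem + ".*")) if outbox.is_dir() else []
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

With the shares mounted at, say, /mnt/ocr/in and /mnt/ocr/out (made-up paths), you would call `submit_image(Path("invoice.tif"), Path("/mnt/ocr/in"))` and then `wait_for_result(Path("/mnt/ocr/out"), "invoice")`. Polling is crude, but it works the same from any platform that can see the folders, which is the whole point.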
To build an OCR engine to today’s standards takes about 50 man-years. Recent OCR developments outside the top four engines build on even older engines just to avoid this barrier to entry. Therefore, I don't foresee technology on other platforms reaching the level of Windows machines any time soon. However, what I do know is that there is no reason NOT to leverage the most advanced technology through set-it-and-forget-it automated document conversion machines. A properly designed cross-platform system works wonders!
#OCR #platform #Windows #ScanningandCapture