PDF tips

Best OCR Software

Many commercial OCR software products don't offer a free trial. We only recommend software that we have been able to test ourselves.

Best Choice: tesseract-ocr

tesseract-ocr is probably the most accurate open-source OCR engine available. Combined with the Leptonica Image Processing Library, it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top three engines in the 1995 UNLV Accuracy test. Little work was done on it between 1995 and 2006, but since then Google has improved it extensively. It is released under the Apache License 2.0.

The figure below shows the original image and the text that tesseract-ocr exported. The conversion result was poor in this test.


tesseract-ocr can be downloaded at https://code.google.com/p/tesseract-ocr/
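If you want to script tesseract rather than run it by hand, the command-line tool can be called from Python. This is a minimal sketch, assuming the `tesseract` executable is installed and on your PATH along with the English language data. The helper names `build_tesseract_command` and `ocr_image` are hypothetical. Passing `stdout` as the output name asks recent tesseract versions to print the recognized text instead of writing a `.txt` file.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract command-line tool.

    Hypothetical helper. "stdout" as the output base makes tesseract
    print the recognized text rather than write it to a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on a single image and return the recognized text.

    Hypothetical helper; assumes tesseract and the requested language
    data are installed.
    """
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `print(ocr_image("scan.png"))` prints the text found in `scan.png`. Shelling out to the CLI keeps the sketch free of third-party Python packages. To recognize another language, pass its tesseract language code (for example `"deu"`), provided that language's data is installed.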

Alternative: Microsoft OneNote

I personally like to use Microsoft OneNote's "Copy Text from Picture" function.

First insert a picture into OneNote, then right-click the inserted picture and select the "Copy Text from Picture" menu item. The recognized text is copied to the clipboard, and you can then paste it into any program, such as Word or Notepad.

The figure below shows the original image and the text that Microsoft OneNote exported. In this test, the conversion result was better than that of most professional OCR software.

Microsoft OneNote