Comparison of optical character recognition software

Note: This article is not the author's original work; it is reposted from: http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software


This comparison of optical character recognition software includes:

  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output formats | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesseract | 1985 | 3.02 | Oct 2012 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 35+[1] | ? | Text, hOCR,[2] and others via different user interfaces[3] or the API | Created by Hewlett-Packard; under further development by Google.[4] It was one of the top 3 engines in the 1995 UNLV Accuracy test. |
| ExperVision[5] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 | | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[6][citation needed] PC Magazine: "The speed of ExperVision's OpenRTK is four to eight times faster than competition."[7] PC Magazine also noted: "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats."[8] |
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[9] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[10] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License editions for Windows; Express Edition for Mac.[11] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
| Aquaforest OCR SDK | 2001 | 1.41 | 2013 | Proprietary | Yes[12] | Yes | No | No | No | C#, VB.NET, ASP.NET | Yes | 23 | OmniFont (Extended Module available, including support for over 100 languages)[13] | PDF, PDF/A, RTF, TXT | Aquaforest's[14] OCR SDK for .NET[15] enables developers to use the Aquaforest OCR engine directly in their own applications and to create searchable PDFs, RTF or text files from TIFFs, bitmaps and image-only PDFs. |
| LEADTOOLS[16] | 1990[17] | 18.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[18] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[19] | Supports Latin, Asian, Arabic, and MICR character sets.[16] For full-page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[20] ICR (handwritten text recognition) is supported.[21] |
| CuneiForm/OpenOCR | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[22] | Enterprise-class system; can save text formatting and recognizes complicated tables of any structure. |
| Transym OCR | 2000 | 3.3 | 2011 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | | |
| Image to OCR Converter | 2010[23] | 1.2[24] | 2012 | Proprietary | No | Yes | No | No | No | C/C++, VB and .NET | Command line | 40 | ? | Searchable PDF, Text-only PDF, Word, HTML, Text[25] | Can read most image formats and PDF files, and can scan images from a scanner or camera.[26][27] |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | |
| Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[28] | ? | PDF, TXT | Dynamsoft describes itself as a leading provider of image capture SDKs and version control tools. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | Yes | No | C/C++, C#[29] | Yes | ? | ? | | Product of Nuance Communications. |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | |
| FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | [30] |
| GOCR | ? | 0.49 | 2010 | GPL | Yes[31] | Yes | Yes | Yes | Yes | C | ? | ? | ? | | |
| Ocrad | ? | 0.21[32] | 2011 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | | Command line |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | | For musical scores |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Uses OmniPage[citation needed] |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications. |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes. |
| Scantron Cognition | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Working with localized interfaces requires the corresponding language support. |
| OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | | Features a full user interface and a command-line tool for automated operations. Has its own segmentation algorithm but uses system-wide OCR engines such as Tesseract or Ocrad. |
| OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? | hOCR, HTML, TXT[33] | Pluggable framework under active development; used for Google Books. |
