open-source character recognition


This page is still under construction!

This is an overview, made mainly for developers. It contains typical example files, some made by me and some sent in by others. You can pick an example and try to improve gocr on it. Note that the other examples should still be recognized at least as well as before your changes, or better.
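The rule that a change must not make any other example worse is a simple regression check. The following Python sketch compares two tables of per-file error counts and reports every file that got worse or has no result. The function name `find_regressions` and the "candidate" counts are hypothetical, and how the counts are obtained (by hand, as on this page, or by some script) is left open.

```python
from __future__ import annotations


def find_regressions(baseline: dict[str, int],
                     candidate: dict[str, int]) -> dict[str, tuple[int, int | None]]:
    """Return files whose error count increased (or is missing) in candidate.

    Maps file name -> (baseline_errors, candidate_errors); a file with no
    candidate result is reported with None as its new count.
    """
    regressions: dict[str, tuple[int, int | None]] = {}
    for name, old_errors in baseline.items():
        new_errors = candidate.get(name)
        if new_errors is None or new_errors > old_errors:
            regressions[name] = (old_errors, new_errors)
    return regressions


if __name__ == "__main__":
    # Baseline: error counts from the "excellent examples" table (column g027).
    baseline = {"font1.pbm.gz": 7, "font2.pbm.gz": 14}
    # Hypothetical counts for a modified gocr build, for illustration only.
    candidate = {"font1.pbm.gz": 5, "font2.pbm.gz": 15}
    for name, (old, new) in sorted(find_regressions(baseline, candidate).items()):
        print(f"{name}: {old} -> {new} errors (regression)")
```

A missing result is treated as a regression on purpose, so that a file which suddenly fails to process is not silently skipped.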

Users can get a first impression here of how well gocr works.

  1. excellent examples
  2. good examples
  3. unsorted examples (from other people)
Only errors where clearly readable characters are not recognized, or are recognized as something completely different, are counted. Errors where a "0" (zero) is output as "O" (the letter O) are not counted, and missing accents are sometimes not counted either. The errors were counted by hand, so the numbers are not very exact; take them as a rough overview.
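The counting rules above can be approximated automatically. The sketch below is an assumption, not how the counts on this page were made (they were counted by hand): it normalizes both the reference text and the gocr output by treating "0" and "O" as the same character and, optionally, dropping accents, then counts character-level edit operations (Levenshtein distance). The names `normalize` and `count_errors` are hypothetical.

```python
import unicodedata


def normalize(text: str, ignore_accents: bool = True) -> str:
    """Apply the page's leniency rules before comparing texts."""
    # Zero printed as the letter O is not counted as an error: fold both together.
    text = text.replace("0", "O")
    if ignore_accents:
        # Decompose accented letters (e.g. "é" -> "e" + combining acute) and drop the marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text


def count_errors(reference: str, ocr_output: str, ignore_accents: bool = True) -> int:
    """Character-level Levenshtein distance between normalized texts."""
    a = normalize(reference, ignore_accents)
    b = normalize(ocr_output, ignore_accents)
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # reference char missing in output
                           cur[j - 1] + 1,            # extra char in output
                           prev[j - 1] + (ca != cb)))  # wrong char (0 if identical)
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(count_errors("R0ute café", "ROute cafe"))  # zero/O and accent are ignored
    print(count_errors("office", "oflice"))          # one misread character
```

Edit distance is stricter than counting by eye in one respect: a connected pair such as "fi" read as a single wrong character counts as two operations (one substitution plus one deletion).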

excellent examples

pbm=black/white, clean scans

testfile         size  width height  num_c  num_errors  remarks
                 [kB]   [px]   [px]   (wc)      (g027)
--------         ----  ----- ------  -----  ----------  -------
font1.pbm.gz       40   1623    953    905           7  built with LaTeX+gs, 300 dpi, 12pt, mixed fonts
font2.pbm.gz       24   1083    637    905          14  built with LaTeX+gs, 200 dpi, 12pt, mixed fonts

old overview:

testfile     width height  num_c  quality time  num_errors
              [px]   [px]                 (p1)    p1   p2   p3
--------     ----- ------  -----  ------- ----  --------------
g300a1.pbm     703    580    469  +         2s     4    -    0
g300a2.pbm     724   1252   1021  +         5s     2    0    0
g300b1.pbm    1564    277     55  +         4s     0    1    1
g300b2.pbm     599   1319    860  snowy     9s    76    -   40
g300b3.pbm     592   1324    934  snowy     7s    36    2   15
g300c1.pbm     750   2771   2182  +        15s    35    1   14
liebfrau1     2289   3200   1927  +        19s    13    0    8
meraji1       1912   1355   1246  thin     15s    65    4   40
paraguay1     2617   1375   3280  frame    78s  1000    1   55

p1 = gocr 0.2.4a3 on P400
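Because the files differ a lot in length, the p1 error counts are easier to compare as rates. The sketch below assumes that num_c is the number of characters on the page; it computes a per-file error rate and a pooled rate over all files from the old overview table. The names `P1_RESULTS`, `error_rates` and `overall_rate` are hypothetical. Note that paraguay1 alone contributes 1000 of the 1231 p1 errors, so the pooled rate mostly reflects that one file.

```python
# Old overview table, column p1 (gocr 0.2.4a3): file -> (num_c, num_errors).
# Assumption: num_c is the number of characters on the page.
P1_RESULTS = {
    "g300a1.pbm": (469, 4),
    "g300a2.pbm": (1021, 2),
    "g300b1.pbm": (55, 0),
    "g300b2.pbm": (860, 76),
    "g300b3.pbm": (934, 36),
    "g300c1.pbm": (2182, 35),
    "liebfrau1": (1927, 13),
    "meraji1": (1246, 65),
    "paraguay1": (3280, 1000),
}


def error_rates(results: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Errors per character for each file."""
    return {name: errors / num_c for name, (num_c, errors) in results.items()}


def overall_rate(results: dict[str, tuple[int, int]]) -> float:
    """Pooled rate: total errors divided by total characters over all files."""
    total_chars = sum(num_c for num_c, _ in results.values())
    total_errors = sum(errors for _, errors in results.values())
    return total_errors / total_chars


if __name__ == "__main__":
    # Print files from best to worst, then the pooled rate.
    for name, rate in sorted(error_rates(P1_RESULTS).items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {100 * rate:6.2f} %")
    print(f"{'overall':12s} {100 * overall_rate(P1_RESULTS):6.2f} %")
```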

Most errors: connected characters (such as the ligatures fi and ff) and italic fonts.

unsorted examples
