Image optimization before OCR
Project: Linux software
Component: Documentation
Category: support request
Priority: normal
Assigned: Unassigned
Status: closed
Related pages:
#10: OCR - optical character recognition
#17: Which scanner for Ubuntu?
#21: Most Linux friendly scanner manufacturers?
#13: mass processing TIFF images: GIMP scripts
Description
A friend replicated a test OCR from a test scan I did earlier. The text is in French (with accented letters...)
His result is much better than mine: see the attached image (yellow: his test, which is better than my test, in green).
He said he increased the contrast.
It shows that it pays to optimize the image before trying to OCR it. This comes back to this ticket:
#13: mass processing TIFF images: GIMP scripts
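For reference, here is a minimal sketch of what "increasing the contrast" can mean in practice. The ticket does not say which tool or method was actually used, so this is only an assumption: a simple linear contrast stretch on 8-bit grayscale values, written in plain Python. The function name `stretch_contrast` and the percentile parameters are hypothetical.

```python
def stretch_contrast(pixels, low_pct=2.0, high_pct=98.0):
    """Linearly stretch 8-bit grayscale values.

    The values found at the low/high percentiles are mapped to 0 and 255;
    anything outside that range is clamped. This is an illustrative
    assumption, not the method used in the ticket.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100)]
    hi = ordered[int((n - 1) * high_pct / 100)]
    if hi == lo:
        # Flat image: nothing to stretch.
        return list(pixels)
    span = hi - lo
    # Integer arithmetic keeps the result deterministic.
    return [max(0, min(255, (p - lo) * 255 // span)) for p in pixels]


if __name__ == "__main__":
    # A dull scan whose values only span 100..140.
    print(stretch_contrast([100, 110, 120, 130, 140], 0, 100))
```

On a real scan the same mapping would be applied to every pixel of the page; an image editor or batch script (see #13) would be the practical way to do it on many TIFF files.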
Attachment | Size |
---|---|
comparaison-ocr.png | 104.87 KB |
Comments
#1
Same image, but more compact.
#2
wiki.
#3
Actually the image in #1 is misleading.
The text is in French and I forgot to add the -l fra option.
Attached is a comparison between the OCR of the same page: in yellow, the original scan (without the French language setting), and in green, the optimized result with the proper language setting.
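To make the language setting harder to forget, the OCR call can be wrapped in a small script. This is only a sketch and rests on an assumption: the ticket does not name the OCR engine, and the code below assumes the Tesseract command line, where `-l fra` selects French. The helper names `build_ocr_command` and `run_ocr` and the file names are hypothetical.

```python
import subprocess


def build_ocr_command(image_path, output_base, lang="fra"):
    """Build the argument list for an OCR run with an explicit language.

    Assumes the Tesseract CLI: tesseract <image> <output base> -l <lang>.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="fra"):
    """Run the OCR command; raises CalledProcessError if it fails."""
    subprocess.run(build_ocr_command(image_path, output_base, lang), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(" ".join(build_ocr_command("page-001.tif", "page-001")))
```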
#4
Attached is the actual comparison between the tif file I originally used (yellow), and mose's optimized tif file (green).
#5
I finished scanning the book. Overall, what made the most difference was the dpi setting (3 and not more!) and the language setting (don't forget -l fra).
Anyhow, I documented what I could. I finished scanning and OCR'ing what I had.
#6
Automatically closed -- issue fixed for 2 weeks with no activity.