Image optimization before OCR

Tue, 04/20/2010 - 01:22 - augustin

Attachment	Size
comparaison-ocr.png	104.87 KB

Comments

#1

augustin - 04/20/2010 - 01:29

Same image, but more compact.

Attachment	Size
comparaison_ocr.png	128.59 KB

#2

augustin - 04/20/2010 - 01:37

Related pages:	+10: OCR - optical character recognition wiki.

#3

augustin - 04/29/2010 - 09:30

Actually the image in #1 is misleading.
The text is in French and I forgot to add the -l fra option.
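
For reference, here is a minimal sketch of how the language option could be passed when driving tesseract from a script. The helper names (build_tesseract_command, run_ocr) and the file names are made up for illustration; the only assumption about tesseract itself is the usual "tesseract imagename outputbase -l lang" command form, with the French language data installed.

```python
import shlex
import subprocess
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str,
                            lang: Optional[str] = "fra") -> List[str]:
    """Build the argument list for one tesseract run.

    Hypothetical helper: tesseract is expected to write its result to
    <output_base>.txt. Passing lang=None reproduces the mistake of
    leaving the language option out.
    """
    cmd = ["tesseract", image, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def run_ocr(image: str, output_base: str, lang: Optional[str] = "fra") -> None:
    """Run tesseract; raises CalledProcessError if the command fails."""
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_ocr() to actually execute it.
    print(shlex.join(build_tesseract_command("page.tif", "page")))
```

Keeping the command construction separate from the call makes it easy to check that -l fra is really there before processing a whole batch of pages.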

Attached is a comparison of the OCR output for the same page: in yellow, the original scan (without lang fr), and in green, the optimized result with the proper language setting.
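
A comparison like this one can also be produced as plain text instead of a screenshot. The sketch below uses Python's standard difflib module; the function names and the sample strings are invented for illustration and are not taken from the actual scan.

```python
import difflib
from typing import List


def diff_ocr(before: str, after: str) -> List[str]:
    """Return a unified diff between two OCR outputs, line by line."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="without -l fra",
        tofile="with -l fra",
        lineterm="",
    ))


def similarity(before: str, after: str) -> float:
    """Rough character-level similarity between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, before, after).ratio()


if __name__ == "__main__":
    # Invented sample text: accents are a typical casualty of a wrong language setting.
    without_lang = "L'ete a Paris\nLe cafe est ferme\n"
    with_lang = "L'été à Paris\nLe café est fermé\n"
    print("\n".join(diff_ocr(without_lang, with_lang)))
    print(round(similarity(without_lang, with_lang), 2))
```

The similarity ratio is only a rough indicator. The diff itself shows which words changed between the two runs.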

Attachment	Size
tesserract_diff_without_and_with_lang_fr.png	111.21 KB

#4

augustin - 04/29/2010 - 09:33

Attached is the actual comparison between the tif file I originally used (yellow), and mose's optimized tif file (green).

Attachment	Size
tesserract_diff_from_mo_au_source.png	65.13 KB

#5

augustin - 05/31/2010 - 13:25

Status:	active	» fixed

I finished scanning the book. Overall, what made the most difference was the dpi setting (3 and not more!) and the language setting (don't forget -l fra).

Anyhow, I documented what I could. I finished scanning and OCR'ing what I had.

#6

robot - 06/14/2010 - 17:10

Status:	fixed	» closed
Related pages:	-10: OCR - optical character recognition

Automatically closed -- issue fixed for 2 weeks with no activity.

Project:	Linux software
Component:	Documentation
Category:	support request
Priority:	normal
Assigned:	Unassigned
Status:	closed
Related pages:	#10: OCR - optical character recognition
