
Scanning a book: hardware and software problems

As I launch this site, one of the most urgent things I have to do is scan a whole book and run it through OCR software so that I can publish this rare book on the internet.

That is why the first few issues on this site are all related to scanning and OCR:
#11: OCR with tesseract: garbage output
#12: tesseract language setting
#13: mass processing TIFF images: GIMP scripts

I managed to make tesseract work.

Now I need to buy a scanner so that I can scan the whole book. It will also be handy for scanning miscellaneous things now and then.
#17: Which scanner for Ubuntu?

So not only the first few issues but also the first few wiki pages created on this site are related to scanning and OCR:
http://linux.overshoot.tv/wiki/ocr_optical_character_recognition
http://linux.overshoot.tv/wiki/scanners

The book is in French and will be published at:
http://3enjeux.overshoot.tv/
