Posted on February 3, 2009 | Category: Intrepid Ibex, ocr
A minor disaster the other day: my trusty Acer notebook died. I say minor disaster as my HP Pavilion is as happy as a pig in flight with Intrepid Ibex. However, the Acer still had XP on a partition, and on that XP install was Optical Character Recognition software, which was ancient but still did the trick. I need this primarily for the website Patrick Chapman and I founded a while back, Irish Literary Revival. In truth the site has been neglected for a while, but Patrick and I had discussed additions and I was halfway through scanning Heather Brett’s first book, Abigail Brown.
Anyway, it’s forced me to look at Linux solutions for OCR, and the only real runner that I know of is ocropus, the Google-sponsored open source document analysis and OCR system. I’ve downloaded ocropus-0.3.1.tar.gz, but the Google wiki documentation for installation on Ubuntu is for 0.5, and looks very complicated. So I’m going to bookmark nubae’s Habari | Linux and Education piece on ocropus: not only does it look simpler, but it details a bug with regard to Intrepid Ibex:
Tesseract source has a bug that doesn’t allow it to compile with gcc 4.3 (Intrepid Ibex comes with this default)
I haven’t time to play with it for the next while, but I’ll document my adventures here when I do.
June 17th, 2009 at 8:40 pm
There are some good reference articles on DocumentLab.