First impressions of OCRopus

OCRopus is an OCR system. I initially wanted to see how well it handles handwriting, so I gave it a try by installing it on Ubuntu 8.04. To get started, I used Synaptic to install the following required software:

  • jam
  • libpng12-dev
  • libjpeg-dev
  • libtiff-dev

I also installed one of the optional packages, libaspell-dev, plus build-essential for the compilers needed to build from source.
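For anyone who prefers the terminal to Synaptic, here is a rough command-line equivalent. It is only a sketch: the package names are the ones listed in this post (the Ubuntu compiler meta-package is named build-essential, singular), the variable names are my own, and the script just prints the apt-get command so it can be reviewed before being run.

```shell
# Package names are the ones listed in this post; the variable names are illustrative.
REQUIRED="jam libpng12-dev libjpeg-dev libtiff-dev"
OPTIONAL="libaspell-dev"
TOOLCHAIN="build-essential"

# Build the install command and print it for review instead of running it.
# Paste the printed line into a terminal to actually install the packages.
INSTALL_CMD="sudo apt-get install $REQUIRED $OPTIONAL $TOOLCHAIN"
echo "$INSTALL_CMD"
```

On other Ubuntu releases some of these package names may differ, so check them against the package index first.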

Next, I checked out tesseract-ocr from Google Code:

svn co http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr

I had to use the patch from this forum thread. To apply the patch, I used this command:

patch java/makefile < java

Note that java is the patch file and java/makefile is the makefile in the tesseract-ocr/java directory. After applying the patch, I continued building tesseract-ocr:

./configure
make
sudo make install

With all the required software in place, I was ready to install ocropus:

svn co http://ocropus.googlecode.com/svn/trunk/
cd trunk
./configure
jam
sudo jam install

At this point, the basic ocropus install is done.

After the initial install, I noticed that I needed to create the /usr/local/ocroscript directory and add the following two soft links inside it:

ocroscript -> ../bin/ocroscript
scripts -> ../share/ocropus/script
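The directory and the two links can be created in one go with something like the sketch below. The function name make_ocroscript_links is a hypothetical helper of mine, not part of OCRopus; it takes the install prefix as its argument, and the link targets are copied from the listing above.

```shell
# Hypothetical helper (not part of OCRopus): create the ocroscript directory
# and the two relative soft links under the given install prefix.
make_ocroscript_links() {
    prefix="$1"
    mkdir -p "$prefix/ocroscript" || return 1
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not follow an existing link that points to a directory
    ln -sfn ../bin/ocroscript "$prefix/ocroscript/ocroscript" &&
    ln -sfn ../share/ocropus/script "$prefix/ocroscript/scripts"
}

# Example (needs a root shell for /usr/local):
#   make_ocroscript_links /usr/local
```

Because the targets are relative, the links resolve against /usr/local/ocroscript itself, so they keep working as long as bin and share stay next to that directory.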

To test the software, I used the sample image that came with it:

/usr/local/bin/ocrocmd /data/pages/alice_1.png |less

The default test case above worked for me. Next, I took out my camera and took a picture of my handwriting. I uploaded the image and ran it through the OCR software. I was disappointed to find that ocropus couldn’t recognize my handwriting very well. Is there something that can do better?
