OCRopus is an open-source OCR system. I wanted to see how well it handles handwriting, so I gave it a try by installing it on Ubuntu 8.04. To get started, I used Synaptic to install the following required packages:
- jam
- libpng12-dev
- libjpeg-dev
- libtiff-dev
I also installed one of the optional packages, libaspell-dev, plus build-essential, which provides the compilers and make needed to build from source.
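If you prefer the command line to Synaptic, `sudo apt-get install jam libpng12-dev libjpeg-dev libtiff-dev libaspell-dev build-essential` should install roughly the same set, assuming those package names resolve on your release. The later steps also call svn, patch, make, and jam, so a quick check that they are all on the PATH can save a failed build. This is a minimal sketch: check_build_tools is a hypothetical helper, and the tool list in the comment is an assumption drawn from the commands used below.

```shell
#!/bin/sh
# check_build_tools is a hypothetical helper, not part of OCRopus or Ubuntu.
# It reports every command named in its arguments that is not on PATH
# and returns non-zero if any are missing.
check_build_tools() {
    missing=""
    for tool in "$@"; do
        # command -v prints nothing and fails when the tool is not installed
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all tools found"
}

# Tools used by the checkout, patch, and build steps (assumed list):
# check_build_tools svn patch gcc g++ make jam
```

Any name printed after "missing:" points to a package that still needs installing (svn, for instance, comes from the subversion package).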
Next, I checked out tesseract-ocr from its Google Code Subversion repository:
svn co http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
The build needed the patch posted in this forum thread. I applied it with this command:
patch java/makefile < java
Here, java is the patch file and java/makefile is the makefile in the tesseract-ocr/java directory. With the patch applied, I built and installed tesseract-ocr:
./configure
make
sudo make install
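If the patch from the forum thread does not match your checkout of the makefile, applying it blindly can leave rejected hunks behind. Here is a minimal sketch that runs `patch --dry-run` first and only applies the patch when every hunk fits. apply_patch_checked is a hypothetical helper, and -N (--forward) is added as an assumption so that an already-applied patch is skipped rather than reversed.

```shell
#!/bin/sh
# apply_patch_checked is a hypothetical helper, not part of tesseract-ocr.
# It does a dry run first, so a patch that does not fit the current
# file leaves that file untouched instead of half-applied.
apply_patch_checked() {
    target=$1     # file to patch, e.g. java/makefile
    patchfile=$2  # the patch saved from the forum thread
    # -N: do not try to reverse a patch that looks already applied
    if patch -N --dry-run "$target" < "$patchfile" >/dev/null 2>&1; then
        patch -N "$target" < "$patchfile"
    else
        echo "patch does not apply cleanly to $target" >&2
        return 1
    fi
}
```

For example, from the tesseract-ocr directory: `apply_patch_checked java/makefile /path/to/patchfile`, where the second argument is a placeholder for wherever the forum patch was saved.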
With all the required software in place, I was ready to install OCRopus:
svn co http://ocropus.googlecode.com/svn/trunk/
cd trunk
./configure
jam
sudo jam install
At this point, the basic OCRopus installation is complete.
After the initial install, I found that I also needed to create a /usr/local/ocroscript directory containing the following two symbolic links:
ocroscript -> ../bin/ocroscript
scripts -> ../share/ocropus/script
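The directory and the two links can be created in one step. This is a minimal sketch: make_ocroscript_links is a hypothetical helper, and its optional prefix argument is an assumption added so the links can be tried in a scratch directory first; it defaults to /usr/local.

```shell
#!/bin/sh
# make_ocroscript_links is a hypothetical helper that recreates the two
# symlinks listed above under $prefix/ocroscript (default /usr/local).
make_ocroscript_links() {
    prefix=${1:-/usr/local}
    dir="$prefix/ocroscript"
    mkdir -p "$dir" || return 1
    # Relative targets resolve against $dir, so ../bin/ocroscript
    # points at $prefix/bin/ocroscript.
    # -f replaces an existing link; -n treats an existing link to a
    # directory as a file instead of creating the new link inside it.
    ln -sfn ../bin/ocroscript "$dir/ocroscript" || return 1
    ln -sfn ../share/ocropus/script "$dir/scripts" || return 1
}
```

Because /usr/local is normally owned by root, call `make_ocroscript_links` with no argument from a root shell (for example after `sudo -s`). Running it twice is harmless, since -f replaces the existing links.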
To test the installation, I ran the sample image that ships with OCRopus:
/usr/local/bin/ocrocmd /data/pages/alice_1.png |less
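To compare several images in one go, for instance the sample page and a photo of handwriting, the same command can be wrapped in a loop that saves each result to a text file. This is a minimal sketch, assuming ocrocmd takes one image path and writes the recognized text to stdout, as in the command above. ocr_images and the OCROCMD override variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# ocr_images is a hypothetical helper. It runs the OCR command on each
# image given and saves the text next to the image with a .txt extension.
# OCROCMD is a hypothetical override; by default the installed binary is used.
ocr_images() {
    cmd=${OCROCMD:-/usr/local/bin/ocrocmd}
    status=0
    for img in "$@"; do
        out="${img%.*}.txt"      # alice_1.png -> alice_1.txt
        if "$cmd" "$img" > "$out"; then
            echo "$img -> $out"
        else
            echo "failed: $img" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `ocr_images /data/pages/alice_1.png my_handwriting.png`, where my_handwriting.png stands in for your own photo. The function keeps going after a failure and returns non-zero at the end, so one bad image does not hide the results for the others.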
The default test case worked for me. Next, I took a picture of my handwriting with my camera, uploaded the image, and ran it through OCRopus. I was disappointed to find that it could not recognize my handwriting very well. Is there something that can do better?
