Someone gave me a 16-page printed specification document for the format of a data file. I'm too slow and lazy to type all that in manually. I tried a few free OCR (optical character recognition) programs (Gocr and SimpleOCR), but the results were absolutely terrible: the error rates were high and the formatting was all wrong. Commercial OCR programs are too expensive for occasional use, and Microsoft removed their OCR program from Office.

So I scanned the pages in as JPEGs and uploaded them to Amazon's S3. I created a job on Mechanical Turk to have each page typed in for 50 cents. I submitted the jobs at night, and when I woke up the next morning they had all been completed. The quality is quite good, though I've found a few missing rows. I'll definitely use Turk for large tedious jobs in the future.

However, this morning I tried an online OCR service called ocrNow! My one sample document was recognized perfectly, and the result came back as a perfectly formatted Excel document. It costs 2 GBP ($3) for 20 documents, which is 15 cents per document, and the results are delivered in seconds. Unfortunately, the machines have won another round against humans. Stay in school, kids.
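For anyone who wants to check the arithmetic, here is a rough sketch of the cost comparison using only the prices quoted above. It makes several assumptions of my own: each scanned page counts as one "document" for the OCR service, the OCR cost is pro-rated per document (in practice you would pay for the whole 20-document bundle), and any Mechanical Turk fees on top of the 50-cent reward are ignored. The function names are just for illustration.

```python
# Per-page cost comparison using the figures quoted in the post.
# Assumptions (not from the post): one page == one OCR "document",
# OCR cost is pro-rated per document, and Turk fees are ignored.

PAGES = 16                   # length of the printed specification

TURK_CENTS_PER_PAGE = 50     # Mechanical Turk reward per typed page
OCR_BUNDLE_CENTS = 300       # roughly $3 (2 GBP) per bundle
OCR_BUNDLE_DOCS = 20         # documents included in one bundle


def ocr_cents_per_doc() -> float:
    """Pro-rated OCR price for a single document, in cents."""
    return OCR_BUNDLE_CENTS / OCR_BUNDLE_DOCS


def turk_total_cents(pages: int) -> int:
    """Total Turk reward for typing `pages` pages, in cents."""
    return pages * TURK_CENTS_PER_PAGE


def ocr_total_cents(pages: int) -> float:
    """Pro-rated OCR cost for `pages` documents, in cents."""
    return pages * ocr_cents_per_doc()


if __name__ == "__main__":
    print(f"OCR per document: {ocr_cents_per_doc():.0f} cents")
    print(f"Turk total:       ${turk_total_cents(PAGES) / 100:.2f}")
    print(f"OCR total:        ${ocr_total_cents(PAGES) / 100:.2f}")
```

Under those assumptions, the 16 pages come to $8.00 on Turk versus $2.40 pro-rated with the OCR service, or $3 if you count the whole bundle.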


