tips

Cleaning up OCR'd PDFs fix

I recently had a problem with a set of PDF'd text that I needed a fix for, and I thought I'd share the problem along with the solution. When PDFs get digitized, they often retain elements of their paragraph formatting. Each line ends at the width of the original paragraph, and that line ending is converted into a paragraph break when in fact it is not a real paragraph break. This does not appear to cause any problems with text-to-speech; however, it would cause a problem when viewing the title as an eBook, especially an eBook with enlarged text.
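To make the problem concrete, here is a minimal Python sketch of one way such false breaks could be merged. It is not necessarily the fix used here: the function name `merge_false_breaks` and the rule "a break is real only if the line ends in sentence-closing punctuation" are assumptions chosen for illustration.

```python
# Characters that plausibly end a real paragraph (an assumption, not a rule).
SENTENCE_END = (".", "!", "?", ":", '"', "\u201d")


def merge_false_breaks(text: str) -> str:
    """Join lines whose break looks like a wrapped line, not a paragraph end."""
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information in this heuristic
        current.append(line)
        # Treat the break as real only when the line closes a sentence.
        if line.endswith(SENTENCE_END):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that never reached closing punctuation.
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


sample = "The quick brown fox\njumps over the\nlazy dog.\nA new paragraph\nstarts here."
cleaned = merge_false_breaks(sample)
print(cleaned)
```

The heuristic will still split a paragraph wherever a sentence happens to end exactly at the line width, so the output is worth a quick skim before building the eBook.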
