[D] Text detection – recognition – extraction
For a project, I need to get all the text off an image, in a structured format (sentences, paragraphs, etc.), and have it be accurate.
Most of my experiments have dealt with scene text detection, which usually just detects that text is present, without any structure. Out-of-the-box OCR engines don’t seem accurate enough for my needs, since I’m hoping to run some NLP on top of the extracted text.
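If an OCR engine already returns word-level layout information, the structure (paragraphs, lines) can often be rebuilt from it instead of from raw detections. The sketch below assumes the column-oriented shape produced by `pytesseract.image_to_data(..., output_type=Output.DICT)`, i.e. parallel lists under `block_num`, `par_num`, `line_num`, `conf` and `text`. The function name `group_into_paragraphs` and the `min_conf` threshold are illustrative assumptions, not part of any library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_into_paragraphs(data: Dict[str, list], min_conf: float = 0.0) -> List[str]:
    """Rebuild paragraph strings from word-level OCR output.

    Assumes `data` holds parallel lists like Tesseract's TSV / image_to_data
    output: 'block_num', 'par_num', 'line_num', 'conf' and 'text'.
    """
    # (block, paragraph) -> line number -> words, kept in reading order
    paragraphs: Dict[Tuple[int, int], Dict[int, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    order: List[Tuple[int, int]] = []  # first-seen order of paragraphs

    for i, raw in enumerate(data["text"]):
        word = str(raw).strip()
        # Non-word rows (page/block/line levels) have empty text and conf -1;
        # optionally drop low-confidence words as well.
        if not word or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i])
        if key not in paragraphs:  # membership test does not create an entry
            order.append(key)
        paragraphs[key][data["line_num"][i]].append(word)

    result = []
    for key in order:
        lines = paragraphs[key]
        # Join words within a line, then lines within the paragraph
        result.append(" ".join(" ".join(lines[n]) for n in sorted(lines)))
    return result
```

Each returned string is one paragraph, which can then be split into sentences by whatever NLP tooling runs downstream. Filtering on `conf` is a crude accuracy knob; it removes noise but can also drop real words.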
One idea I had was to detect sentences and paragraphs of text, then crop and OCR each region until there is no more text on the page, but I found that text recognition isn’t that far along yet.
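The detect, crop, OCR, repeat idea can be sketched as a loop that blanks out each region after reading it, so the detector eventually finds nothing and the loop ends. This is a minimal sketch under stated assumptions: `detect_region` and `recognize` are hypothetical callables standing in for a real text detector and OCR engine, the image is a plain grayscale list of rows, and `toy_line_detector` is a toy stand-in for illustration only.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Image = List[List[int]]          # grayscale rows; 255 = white background


def crop(img: Image, box: Box) -> Image:
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def blank_out(img: Image, box: Box, fill: int = 255) -> None:
    # Paint the region with background so it is not detected again
    top, left, bottom, right = box
    for r in range(top, bottom):
        for c in range(left, right):
            img[r][c] = fill


def extract_iteratively(
    img: Image,
    detect_region: Callable[[Image], Optional[Box]],  # hypothetical detector
    recognize: Callable[[Image], str],                # hypothetical OCR call
    max_iters: int = 100,
) -> List[Tuple[Box, str]]:
    page = [row[:] for row in img]  # work on a copy; caller's image is untouched
    results: List[Tuple[Box, str]] = []
    # max_iters guards against a detector that keeps firing on the same spot
    for _ in range(max_iters):
        box = detect_region(page)
        if box is None:  # no more text on the page
            break
        results.append((box, recognize(crop(page, box))))
        blank_out(page, box)
    return results


def toy_line_detector(page: Image) -> Optional[Box]:
    """Toy detector: first band of consecutive rows with dark pixels (< 128)."""
    dark_rows = [r for r, row in enumerate(page) if any(p < 128 for p in row)]
    if not dark_rows:
        return None
    top = bottom = dark_rows[0]
    while bottom + 1 < len(page) and any(p < 128 for p in page[bottom + 1]):
        bottom += 1
    cols = [c for r in range(top, bottom + 1) for c, p in enumerate(page[r]) if p < 128]
    return (top, min(cols), bottom + 1, max(cols) + 1)
```

Because each region is recognized in isolation, the order of the returned `(box, text)` pairs is only as good as the detector's ordering; a real pipeline would likely sort boxes into reading order before joining the text.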
I’m looking for any help going forward, and I hope to come up with an end-to-end solution for this.
submitted by /u/cashshots