[P] Given a predefined questionnaire, how do I write a system to extract the data?

There is an extremely inefficient process in my city office. Every year the office collects data from citizens through an online form and an offline paper form. The paper form is the problem:

  1. The forms are handed out to people.
  2. People fill in the forms by hand and return them to the office.
  3. The clerks have about 2-4 weeks to type the forms into the system.
  4. Each form contains control data; if it is incorrect, the form is ignored in further processing.

There are about 15-25K paper forms each year, and the graphics and content change yearly.

I have a template of this year's form. It's one A4 page. There are two types of information we want to extract: small boxes that each hold a single digit, and free-text boxes (which can contain any text). I don't have data samples, but I can generate a few.
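To make the template idea concrete, here is a minimal sketch of how the field positions could be described once per year and converted to pixel boxes on a scan. It assumes the scans are already aligned to the template and taken at a known resolution; the field names, positions, and the 300 DPI default are all made up for illustration and are not taken from the real form.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Field:
    """One region of the form template, measured in millimetres from the top-left corner."""
    name: str    # hypothetical field name
    kind: str    # "digit" for single-digit boxes, "text" for free-text boxes
    x_mm: float
    y_mm: float
    w_mm: float
    h_mm: float


def mm_to_px(mm: float, dpi: int) -> int:
    """Convert a length in millimetres to pixels at the given scan resolution."""
    return round(mm / MM_PER_INCH * dpi)


def field_to_pixel_box(field: Field, dpi: int = 300) -> tuple:
    """Return (left, top, right, bottom) in pixels for a template field."""
    left = mm_to_px(field.x_mm, dpi)
    top = mm_to_px(field.y_mm, dpi)
    right = mm_to_px(field.x_mm + field.w_mm, dpi)
    bottom = mm_to_px(field.y_mm + field.h_mm, dpi)
    return (left, top, right, bottom)


# Hypothetical template: positions are invented, not measured from the real form.
TEMPLATE = [
    Field("digit_1", "digit", 20.0, 40.0, 5.0, 7.0),
    Field("comment", "text", 20.0, 60.0, 170.0, 30.0),
]

if __name__ == "__main__":
    for f in TEMPLATE:
        print(f.name, f.kind, field_to_pixel_box(f))
```

Because the layout changes yearly, keeping the regions in a small table like this would mean only the table has to be redone each year, not the rest of the pipeline.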

The forms contain sensitive data and cannot be processed outside the internal network. How would you approach such a problem? I would appreciate any help.

Usually I would just use the Google Vision API for text extraction and then write a decision tree to classify the bounding boxes as pieces of information, but in this case I cannot use external services.
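A rough sketch of the bounding-box step, independent of whichever local OCR engine ends up producing the boxes. Instead of a decision tree it uses a simpler overlap rule: each detected box is assigned to the template field that covers most of it. The field names, pixel coordinates, and the 0.5 coverage threshold are assumptions for illustration only.

```python
def intersection_area(a, b):
    """Area of overlap between two (left, top, right, bottom) boxes, 0 if disjoint."""
    left = max(a[0], b[0])
    top = max(a[1], b[1])
    right = min(a[2], b[2])
    bottom = min(a[3], b[3])
    if right <= left or bottom <= top:
        return 0
    return (right - left) * (bottom - top)


def assign_to_field(box, fields, min_coverage=0.5):
    """Return the name of the field covering the largest share of `box`.

    `fields` maps a field name to its (left, top, right, bottom) region.
    Returns None when no field covers at least `min_coverage` of the box.
    """
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    if box_area <= 0:
        return None
    best_name, best_coverage = None, 0.0
    for name, region in fields.items():
        coverage = intersection_area(box, region) / box_area
        if coverage > best_coverage:
            best_name, best_coverage = name, coverage
    return best_name if best_coverage >= min_coverage else None


# Hypothetical field regions in pixels, not measured from the real form.
FIELDS = {
    "digit_1": (236, 472, 295, 555),
    "comment": (236, 709, 2244, 1063),
}

if __name__ == "__main__":
    detected = [(240, 480, 290, 550), (300, 720, 900, 780), (0, 0, 10, 10)]
    for box in detected:
        print(box, "->", assign_to_field(box, FIELDS))
```

Boxes that match no field come back as None, so they could be routed to a clerk for manual review rather than silently dropped.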

This is a non-profit project. If I cannot solve it, they will just keep typing the forms in by hand.

submitted by /u/janiedebica