
[R] Active Annotation — Efficient human-in-the-loop annotation methodology

Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning

We present a data annotation paradigm (Active Annotation), which is designed to aid human annotators by means of unsupervised learning. The idea is to set up an iterative process in which instances to be human-labelled are first selected, clustered and automatically labelled, and then passed to the annotator for the final validation of the proposed label or the assignment of a new label. The approach is integrated in a Web tool providing a user interface designed to be easy to use to maximize annotators’ productivity. The approach is evaluated in a natural language understanding scenario, in which annotators had to label with intent information a dataset of booking conversations. In this scenario, active annotation is compared against a baseline approach in which data are annotated instance-by-instance with a “human-only driven” method (in which annotators have to decide, sentence by sentence, whether to validate, replace or skip an automatically produced label). The reported results indicate the effectiveness of active annotation. First, in separate sessions with the same duration, humans were able to annotate a much larger set of instances compared to the baseline approach. Second, systems trained with data annotated with the proposed active annotation paradigm achieve better performance compared to systems trained with data annotated with the baseline approach.

— Abstract —

Natural Language Understanding (NLU) models are typically trained in a supervised learning framework. In the case of intent classification, the predicted labels are predefined and based on the designed annotation schema while the labeling process is based on a laborious task where annotators manually inspect each utterance and assign the corresponding label. We propose an Active Annotation (AA) approach where we combine an unsupervised learning method in the embedding space, a human-in-the-loop verification process, and linguistic insights to create lexicons that can be open categories and adapted over time. In particular, annotators define the y-label space on-the-fly during the annotation using an iterative process and without the need for prior knowledge about the input data. We evaluate the proposed annotation paradigm in a real use-case NLU scenario. Results show that our Active Annotation paradigm achieves accurate and higher quality training data, with an annotation speed of an order of magnitude higher with respect to the traditional human-only driven baseline annotation methodology.
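For readers who want a feel for the loop described in the abstract, here is a rough sketch of one annotation round: cluster the utterances, propose a label from the current lexicon, let a human validate or replace it, then grow the lexicon. This is not the paper's implementation. The bag-of-words `embed`, the greedy threshold clustering, the keyword-overlap `propose_label`, and every function name are assumptions made for illustration; a real setup would presumably use a sentence encoder and a proper clustering algorithm.

```python
from collections import Counter
import math


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(utterances, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, members)
    for utterance in utterances:
        vector = embed(utterance)
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(utterance)
                break
        else:
            clusters.append((vector, [utterance]))
    return [members for _, members in clusters]


def propose_label(members, lexicon):
    """Return the lexicon label whose keywords overlap the cluster most, or None."""
    tokens = Counter(t for u in members for t in u.lower().split())
    best, best_score = None, 0
    for label, keywords in lexicon.items():
        score = sum(tokens[k] for k in keywords)
        if score > best_score:
            best, best_score = label, score
    return best


def annotation_round(utterances, lexicon, annotator):
    """One round: cluster, auto-label, human validation, lexicon update (mutates lexicon)."""
    labelled = {}
    for members in cluster(utterances):
        proposed = propose_label(members, lexicon)
        final = annotator(members, proposed)  # human validates, replaces, or skips (None)
        if final is None:
            continue
        # Hypothetical lexicon update: keep the tokens shared by every cluster member.
        shared = set.intersection(*(set(u.lower().split()) for u in members))
        lexicon.setdefault(final, set()).update(shared)
        for utterance in members:
            labelled[utterance] = final
    return labelled


def scripted_annotator(members, proposed):
    """Stand-in for the human: accept a proposal, otherwise name the intent by hand."""
    if proposed is not None:
        return proposed
    return "cancel" if any("cancel" in u for u in members) else "book"


lexicon = {}  # the label space starts empty and is defined on the fly
labelled = annotation_round(
    [
        "book a table for two",
        "book a table for tonight",
        "cancel my reservation",
        "please cancel my reservation now",
    ],
    lexicon,
    scripted_annotator,
)
for utterance, label in labelled.items():
    print(label, "<-", utterance)
```

The point of the sketch is that the annotator decides once per cluster rather than once per sentence, and that the lexicon built in one round feeds the automatic proposals in the next.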

— Paper Link —

Feel free to ask questions.

submitted by /u/feedmari