[R] Active Annotation — Efficient human-in-the-loop annotation methodology
Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning
We present a data annotation paradigm (Active Annotation) designed to aid human annotators by means of unsupervised learning. The idea is to set up an iterative process in which instances to be human-labelled are first selected, clustered and automatically labelled, and then passed to the annotator, who either validates the proposed label or assigns a new one. The approach is integrated in a Web tool with a user interface designed for ease of use, maximizing annotators' productivity. The approach is evaluated in a natural language understanding scenario, in which annotators had to annotate a dataset of booking conversations with intent labels. In this scenario, Active Annotation is compared against a baseline in which data are annotated instance by instance with a "human-only driven" method, where annotators decide, sentence by sentence, whether to validate, replace or skip an automatically produced label. The reported results indicate the effectiveness of Active Annotation. First, in separate sessions of the same duration, humans were able to annotate a much larger set of instances than with the baseline approach. Second, systems trained on data annotated with the proposed Active Annotation paradigm achieve better performance than systems trained on data annotated with the baseline approach.
— Abstract —
Natural Language Understanding (NLU) models are typically trained in a supervised learning framework. In the case of intent classification, the predicted labels are predefined and based on the designed annotation schema, while the labeling process is a laborious task in which annotators manually inspect each utterance and assign the corresponding label. We propose an Active Annotation (AA) approach that combines an unsupervised learning method in the embedding space, a human-in-the-loop verification process, and linguistic insights to create lexicons that can be open categories and adapted over time. In particular, annotators define the y-label space on the fly during annotation using an iterative process and without the need for prior knowledge about the input data. We evaluate the proposed annotation paradigm in a real use-case NLU scenario. Results show that our Active Annotation paradigm achieves accurate and higher-quality training data, with an annotation speed an order of magnitude higher than the traditional human-only driven baseline annotation methodology.
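The loop described in the abstract — cluster utterance embeddings, propose a label per cluster, let the annotator validate or replace it, and grow the y-label space on the fly — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy 2-D embeddings, the `validate` callback standing in for the human annotator, and the farthest-point k-means initialisation are all my own assumptions for the sake of a runnable example.

```python
def _dist2(a, b):
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(p, centers):
    # Index of the center closest to point p.
    return min(range(len(centers)), key=lambda c: _dist2(p, centers[c]))

def kmeans(points, k, iters=10):
    # Plain k-means; farthest-point initialisation keeps the sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[_nearest(p, centers)].append(p)
        centers = [
            tuple(sum(dim) / len(b) for dim in zip(*b)) if b else centers[i]
            for i, b in enumerate(buckets)
        ]
    return centers

def active_annotation(embeddings, k, validate):
    """One Active Annotation round: cluster, auto-label, human-validate.

    `validate(idx, proposed)` plays the annotator: it receives the cluster's
    currently proposed label (None if the cluster has no label yet) and
    returns the final label, possibly a brand-new one.
    """
    centers = kmeans(embeddings, k)
    cluster_label = {}   # proposed label per cluster, set by the first validation
    label_space = []     # y-label space, grown on the fly during annotation
    annotations = []
    for idx, emb in enumerate(embeddings):
        c = _nearest(emb, centers)
        final = validate(idx, cluster_label.get(c))  # confirm or replace
        cluster_label[c] = final                     # propagate within cluster
        if final not in label_space:
            label_space.append(final)
        annotations.append(final)
    return annotations, label_space

# Toy example: two well-separated groups of utterance embeddings.
embs = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
        (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]

def annotator(idx, proposed):
    # Simulated human: accept the cluster's proposal; otherwise coin a label.
    if proposed is not None:
        return proposed
    return "book_hotel" if embs[idx][0] < 1 else "book_flight"

labels, space = active_annotation(embs, 2, annotator)
```

Because each cluster only needs one human decision and the rest of its members arrive pre-labelled for quick confirmation, the annotator's per-instance effort drops sharply — which is the mechanism behind the order-of-magnitude speed-up reported above.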
— Paper Link —
Feel free to ask