[D] Current state of the Topic Segmentation problem
I recently searched the literature for “Topic segmentation”, since “Text segmentation” seems to refer more often to identifying text in images. From the results, the most recent survey appears to be from 2011 [1], while the most recent papers at major conferences date from 2008 to 2013 [2, 3, 4].
Is this the current state of the problem, or are there more recent and relevant works?
It’s also possible that I’m using the wrong terms. To clarify, I’m mostly interested in segmenting a collection of documents into a small number of sections / topics that is known in advance.
[1] Purver, Matthew. “Topic segmentation.” In Spoken language understanding: systems for extracting semantic information from speech (2011)
[2] Eisenstein, Jacob, and Regina Barzilay. “Bayesian unsupervised topic segmentation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (2008)
[3] Riedl, Martin, and Chris Biemann. “TopicTiling: a text segmentation algorithm based on LDA.” In Proceedings of ACL 2012 Student Research Workshop (2012)
[4] Du, Lan, Wray Buntine, and Mark Johnson. “Topic segmentation with a structured topic model.” In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics (2013)
submitted by /u/Daango_