Interactive Concept Learning for Uncovering Latent Themes in Large Text Collections

Maria Leonor Pacheco, Tunazzina Islam, Lyle Ungar, Ming Yin, Dan Goldwasser


Abstract
Experts across diverse disciplines are often interested in making sense of large text collections. Traditionally, this challenge is approached either by noisy unsupervised techniques such as topic models, or by following a manual theme discovery process. In this paper, we expand the definition of a theme to account for more than just a word distribution, and include generalized concepts deemed relevant by domain experts. Then, we propose an interactive framework that receives and encodes expert feedback at different levels of abstraction. Our framework strikes a balance between automation and manual coding, allowing experts to maintain control of their study while reducing the manual effort required.
Anthology ID:
2023.findings-acl.313
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5059–5080
URL:
https://aclanthology.org/2023.findings-acl.313
DOI:
10.18653/v1/2023.findings-acl.313
Cite (ACL):
Maria Leonor Pacheco, Tunazzina Islam, Lyle Ungar, Ming Yin, and Dan Goldwasser. 2023. Interactive Concept Learning for Uncovering Latent Themes in Large Text Collections. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5059–5080, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Interactive Concept Learning for Uncovering Latent Themes in Large Text Collections (Pacheco et al., Findings 2023)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/2023.findings-acl.313.pdf
Video:
https://preview.aclanthology.org/ingest-bitext-workshop/2023.findings-acl.313.mp4