Equipping Educational Applications with Domain Knowledge
Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, Suma Bhat
Abstract
One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science). To address this challenge, we propose a tool, Dexter, that extracts a subject-specific corpus from a heterogeneous corpus, such as Wikipedia, by relying on a small seed corpus and distributed document representations. We empirically show the impact of the generated corpus on language modeling, estimating word embeddings, and consequently, distractor generation, resulting in better performance than when using a general-domain corpus, a heuristically constructed domain-specific corpus, or a corpus generated by a popular system, BootCaT.
- Anthology ID:
- W19-4448
- Volume:
- Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Helen Yannakoudakis, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Torsten Zesch
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Pages:
- 472–477
- URL:
- https://aclanthology.org/W19-4448
- DOI:
- 10.18653/v1/W19-4448
- Cite (ACL):
- Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, and Suma Bhat. 2019. Equipping Educational Applications with Domain Knowledge. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 472–477, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Equipping Educational Applications with Domain Knowledge (Sakakini et al., BEA 2019)
- PDF:
- https://preview.aclanthology.org/landing_page/W19-4448.pdf