Equipping Educational Applications with Domain Knowledge

Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, Suma Bhat


Abstract
One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science). To address this challenge, we propose a tool, Dexter, that extracts a subject-specific corpus from a heterogeneous corpus, such as Wikipedia, by relying on a small seed corpus and distributed document representations. We empirically show the impact of the generated corpus on language modeling, word embedding estimation, and, consequently, distractor generation: it yields better performance than a general-domain corpus, a heuristically constructed domain-specific corpus, and a corpus generated by a popular system, BootCaT.
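The extraction idea in the abstract (a small seed corpus used to pick matching documents out of a heterogeneous pool) can be illustrated with a minimal sketch. This is not Dexter's implementation: the function names `embed`, `cosine`, and `extract_domain_corpus`, the `threshold` parameter, and the tiny stopword list are all hypothetical, and plain term counts stand in for the learned distributed document representations the abstract refers to.

```python
import math
from collections import Counter

# Hypothetical stopword list, only so that function words do not dominate similarity.
STOPWORDS = {"the", "in", "a", "of", "and"}

def embed(text):
    """Toy document vector: term counts (a stand-in for a learned document embedding)."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def extract_domain_corpus(seed_docs, pool_docs, threshold=0.2):
    """Keep pool documents whose vector is close to the summed seed vector.

    The threshold is an assumed, untuned value chosen for illustration.
    """
    seed_vec = Counter()
    for doc in seed_docs:
        seed_vec.update(embed(doc))  # unnormalized sum; cosine ignores scale
    return [d for d in pool_docs if cosine(seed_vec, embed(d)) >= threshold]

seed = ["the cell membrane surrounds the cell",
        "photosynthesis occurs in plant cells"]
pool = ["the cell nucleus stores dna in the cell",
        "the treaty ended the war in europe"]
selected = extract_domain_corpus(seed, pool)
```

In this toy setup, a pool document is selected only if it shares enough vocabulary with the seed corpus; a system based on dense document embeddings could also match documents that are topically related without sharing exact words.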
Anthology ID:
W19-4448
Volume:
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Helen Yannakoudakis, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
472–477
URL:
https://aclanthology.org/W19-4448
DOI:
10.18653/v1/W19-4448
Cite (ACL):
Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, and Suma Bhat. 2019. Equipping Educational Applications with Domain Knowledge. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 472–477, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Equipping Educational Applications with Domain Knowledge (Sakakini et al., BEA 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/W19-4448.pdf