Automatic concept extraction for learning domain modeling: A weakly supervised approach using contextualized word embeddings

Kordula De Kuthy, Leander Girrbach, Detmar Meurers


Abstract
Heterogeneity in student populations poses a challenge in formal education, with adaptive textbooks offering a potential solution by tailoring content based on individual learner models. However, creating domain models for textbooks typically demands significant manual effort. Recent work by Chau et al. (2021) demonstrated automated concept extraction from digital textbooks, but relied on costly domain-specific manual annotations. This paper introduces a novel, scalable method that minimizes manual effort by combining contextualized word embeddings with weakly supervised machine learning. Our approach clusters word embeddings from textbooks and identifies domain-specific concepts using a machine learner trained on concept seeds automatically extracted from Wikipedia. We evaluate this method using 28 economics textbooks, comparing its performance against a tf-idf baseline, a supervised machine learning baseline, the RAKE keyword extraction method, and human domain experts. Results demonstrate that our weakly supervised method effectively balances accuracy with reduced annotation effort, offering a practical solution for automated concept extraction in adaptive learning environments.
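
The abstract describes a pipeline that clusters contextualized embeddings of candidate terms and trains a classifier on weak labels derived from Wikipedia concept seeds. The sketch below is only an illustration of that general idea under stated assumptions: the model (bert-base-uncased via Hugging Face), the clustering algorithm (KMeans), the classifier (logistic regression), and all example terms and seeds are hypothetical choices, not the authors' implementation.

```python
# Illustrative sketch of a weakly supervised concept-extraction pipeline,
# loosely following the abstract. Model and library choices are assumptions
# for demonstration, not the method reported in the paper.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_terms(terms):
    """Mean-pooled contextualized embeddings for candidate terms."""
    vectors = []
    for term in terms:
        inputs = tokenizer(term, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        vectors.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(vectors)

# Candidate terms extracted from textbook text (hypothetical examples).
candidates = ["inflation", "opportunity cost", "supply curve",
              "chapter summary", "figure caption", "marginal utility"]
# Concept seeds harvested from Wikipedia (hypothetical examples);
# they provide the weak supervision signal.
seeds = {"inflation", "marginal utility", "supply curve"}

X = embed_terms(candidates)

# Cluster the embeddings to group candidate terms by semantic similarity.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Weak labels: 1 if a candidate matches a Wikipedia seed, else 0.
y = np.array([1 if term in seeds else 0 for term in candidates])

# Train a simple classifier on the weakly labeled embeddings and use it
# to score every candidate as a domain concept vs. a non-concept.
clf = LogisticRegression(max_iter=1000).fit(X, y)
for term, cluster, score in zip(candidates, clusters, clf.predict_proba(X)[:, 1]):
    print(f"{term:20s} cluster={cluster} concept_score={score:.2f}")
```

In practice the seed matching, clustering granularity, and classifier would be tuned to the textbook corpus; the point of the sketch is only to show how weak Wikipedia-based labels can replace manual concept annotation.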
Anthology ID:
2025.bea-1.13
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
175–185
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.13/
Cite (ACL):
Kordula De Kuthy, Leander Girrbach, and Detmar Meurers. 2025. Automatic concept extraction for learning domain modeling: A weakly supervised approach using contextualized word embeddings. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 175–185, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Automatic concept extraction for learning domain modeling: A weakly supervised approach using contextualized word embeddings (De Kuthy et al., BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.13.pdf