Abstract
The aim of vocabulary inventory prediction is to predict a learner’s whole vocabulary based on a limited sample of query words. This paper approaches the problem starting from the 2-parameter Item Response Theory (IRT) model, giving each word in the vocabulary a difficulty parameter and a discrimination parameter. The discrimination parameter is evaluated on the sub-problem of question item selection, familiar from the fields of Computerised Adaptive Testing (CAT) and active learning. Next, the effect of the discrimination parameter on prediction performance is examined, both in a binary classification setting and in an information retrieval setting. Performance is compared with baselines based on word frequency. A number of different generalisation scenarios are examined, including generalising word difficulty and discrimination using word embeddings with a predictor network, and testing on out-of-dataset data.
- Anthology ID:
- 2021.ranlp-1.134
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Held Online
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 1188–1195
- URL:
- https://aclanthology.org/2021.ranlp-1.134
- Cite (ACL):
- Frankie Robertson. 2021. Word Discriminations for Vocabulary Inventory Prediction. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1188–1195, Held Online. INCOMA Ltd.
- Cite (Informal):
- Word Discriminations for Vocabulary Inventory Prediction (Robertson, RANLP 2021)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2021.ranlp-1.134.pdf
- Code:
- frankier/vocabirt
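
For readers unfamiliar with the 2-parameter IRT model named in the abstract, the following is a minimal illustrative sketch of its standard textbook form, together with maximum Fisher information item selection as commonly used in CAT. It is not taken from the paper or from the linked repository: the function names, the example words, and their parameter values are all hypothetical, and the paper's exact parameterisation and selection criterion may differ.

```python
import math

def p_known(theta, difficulty, discrimination):
    """Standard 2PL IRT: probability that a learner with ability theta knows a word."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def fisher_information(theta, difficulty, discrimination):
    """Fisher information a 2PL item provides about ability at theta: a^2 * p * (1 - p)."""
    p = p_known(theta, difficulty, discrimination)
    return discrimination ** 2 * p * (1.0 - p)

def select_next_word(theta, words, asked):
    """Pick the not-yet-asked word that is most informative at the current ability estimate."""
    candidates = [w for w in words if w not in asked]
    return max(candidates, key=lambda w: fisher_information(theta, *words[w]))

# Hypothetical word bank: word -> (difficulty, discrimination). Values are made up.
words = {
    "cat": (-2.0, 1.0),
    "ledger": (0.5, 1.8),
    "obsequious": (2.5, 1.2),
}

next_word = select_next_word(0.0, words, asked=set())
```

Under this formulation, a word with a high discrimination and a difficulty close to the learner's estimated ability is the most informative one to query, which is why the discrimination parameter matters for question item selection.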