Johannes S. Sax


2026

Dictionaries must be updated regularly. Some dictionary-makers gather proposals for updates to sense entries in internal databases. We automate the verification and prioritization of such sense proposals, and facilitate their addition to a dictionary, by building a processing pipeline that relies on state-of-the-art language models. Our pipeline is the first systematic, large-scale, and comprehensive solution for processing candidates for inclusion in a dictionary, and we test it in an industry-relevant context. We conduct several experiments to evaluate the pipeline and provide an annotated dataset for future work. Model performance is acceptable for words that are not yet in the dictionary, but low for in-dictionary words. Through an error analysis and an ablation of model components, we gain further insight into directions for future model improvements.