José María Hoya Quecedo

Also published as: José María Hoya Quecedo

2020

pdf abs
Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages
José María Hoya Quecedo | Koppatz Maximilian | Roman Yangarber
Proceedings of the Twelfth Language Resources and Evaluation Conference

We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser—with no manual disambiguation or data annotation. We assume that the morphological analyser produces multiple analyses for ambiguous words. The idea is to train recurrent neural networks on the output that the morphological analyser produces for unambiguous words. We present performance on POS and lemma disambiguation that reaches or surpasses the state of the art—including supervised models—using no manually annotated data. We evaluate the method on several morphologically rich languages.

2019

pdf abs
Modeling language learning using specialized Elo rating
Jue Hou | Koppatz Maximilian | José María Hoya Quecedo | Nataliya Stoyanova | Roman Yangarber
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automatic assessment of the proficiency levels of the learner is a critical part of Intelligent Tutoring Systems. We present methods for assessment in the context of language learning. We use a specialized Elo formula used in conjunction with educational data mining. We simultaneously obtain ratings for the proficiency of the learners and for the difficulty of the linguistic concepts that the learners are trying to master. From the same data we also learn a graph structure representing a domain model capturing the relations among the concepts. This application of Elo provides ratings for learners and concepts which correlate well with subjective proficiency levels of the learners and difficulty levels of the concepts.

pdf abs
Projecting named entity recognizers without annotated or parallel corpora
Jue Hou | Maximilian Koppatz | José María Hoya Quecedo | Roman Yangarber
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Named entity recognition (NER) is a well-researched task in the field of NLP, which typically requires large annotated corpora for training usable models. This is a problem for languages which lack large annotated corpora, such as Finnish. We propose an approach to create a named entity recognizer with no annotated or parallel documents, by leveraging strong NER models that exist for English. We automatically gather a large amount of chronologically matched data in two languages, then project named entity annotations from the English documents onto the Finnish ones, by resolving the matches with limited linguistic rules. We use this “artificially” annotated data to train a BiLSTM-CRF model. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance.