2023
SwissBERT: The Multilingual Language Model for Switzerland
Jannis Vamvas, Johannes Graën, Rico Sennrich
Proceedings of the 8th edition of the Swiss Text Analytics Conference
2021
Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning
David Alfter, Elena Volodina, Ildikó Pilan, Johannes Graën, Lars Borin
2020
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications
Johannes Graën, David Alfter, Gerold Schneider
Proceedings of the Twelfth Language Resources and Evaluation Conference
The Common European Framework of Reference for Languages (CEFR) defines six levels of learner proficiency and links them to particular communicative abilities. The CEFRLex project aims at compiling lexical resources that link single words and multi-word expressions to particular CEFR levels. The resources are meant to reflect second language learner needs, as they are compiled from CEFR-graded textbooks and other learner-directed texts. In this work, we investigate the applicability of CEFRLex resources for building language learning applications. Our main concerns were, on the one hand, that vocabulary in language learning materials might be sparse, i.e. that not all vocabulary items belonging to a particular level would also occur in materials for that level, and, on the other hand, that vocabulary items might be used in lower-level materials if required by the topic (e.g. with a simpler paraphrase or translation). Our results indicate that the English CEFRLex resource is in accordance with external resources that we jointly employ as a gold standard. Together with other values obtained from monolingual and parallel corpora, this allows us to indicate which entries need to be adjusted to bring them even more in line with this gold standard. We expect that this finding also holds for the other languages.
2019
Interconnecting lexical resources and word alignment: How do learners get on with particle verbs?
David Alfter, Johannes Graën
Proceedings of the 22nd Nordic Conference on Computational Linguistics
In this paper, we present a prototype of an online exercise for learners of English and Swedish that serves multiple purposes. The exercise allows learners of these languages to train their knowledge of particle verbs while receiving clues from the exercise application. Users decide which clues to receive and pay for each one in virtual currency, which provides us with valuable information about the utility of the clues we provide as well as about learners' willingness to trade virtual currency for accuracy of their choice. As resources, we use lists annotated with levels from the proficiency scale defined by the Common European Framework of Reference (CEFR) and a multilingual corpus with syntactic dependency relations and word alignment for all language pairs. From the latter resource, we extract translation equivalents for particle verb constructions together with a list of parallel corpus examples that can be used as clues in the exercise.
2018
NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills
Gerold Schneider, Johannes Graën
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning
2017
Multilingwis² – Explore Your Parallel Corpus
Johannes Graën, Dominique Sandoz, Martin Volk
Proceedings of the 21st Nordic Conference on Computational Linguistics
Exploring Properties of Intralingual and Interlingual Association Measures Visually
Johannes Graën, Christof Bless
Proceedings of the 21st Nordic Conference on Computational Linguistics
Crossing the border twice: Reimporting prepositions to alleviate L1-specific transfer errors
Johannes Graën, Gerold Schneider
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
2014
Innovations in Parallel Corpus Search Tools
Martin Volk, Johannes Graën, Elena Callegaro
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Recent years have seen an increased interest in and availability of parallel corpora. Large corpora from international organizations (e.g. European Union, United Nations, European Patent Office), or from multilingual Internet sites (e.g. OpenSubtitles) are now easily available and are used for statistical machine translation but also for online search by different user groups. This paper gives an overview of different usages and different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.