2024
Exploring Text Classification for Enhancing Digital Game-Based Language Learning for Irish
Leona Mc Cahill, Thomas Baltazar, Sally Bruen, Liang Xu, Monica Ward, Elaine Uí Dhonnchadha, Jennifer Foster
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Digital game-based language learning (DGBLL) can help with the language learning process. DGBLL applications can make learning more enjoyable and engaging, but they are difficult to develop. A DGBLL app that relies on target language texts needs to be able to use texts of the appropriate level for individual learners. This implies that text classification tools should be available to DGBLL developers, who may not be familiar with the target language, in order to incorporate suitable texts into their games. While text difficulty classifiers exist for many of the most commonly spoken languages, this is not the case for under-resourced languages, such as Irish. In this paper, we explore approaches to the development of text classifiers for Irish. In the first approach to text analysis and grading, we apply linguistic analysis to assess text complexity. Features from this approach are then used in machine learning-based text classification, which explores the application of a number of machine learning algorithms to the problem. Although the development of these text classifiers is at an early stage, they show promise, particularly in a low-resourced scenario.
Empowering Adaptive Digital Game-Based Language Learning for Under-Resourced Languages Through Text Analysis
Elaine Uí Dhonnchadha, Sally Bruen, Liang Xu, Monica Ward
Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024
This study explores Cipher, an adaptive language learning game tailored for the under-resourced Irish language, aimed mainly at primary school students. By integrating text analysis techniques, Cipher dynamically adjusts its difficulty based on the player’s language proficiency, offering a customised learning experience. The game’s narrative involves decoding spells to access Irish myths and stories, combining language learning with cultural elements. Development involved collaboration with educators to align the game content with curriculum standards and incorporate culturally relevant materials. This paper outlines the game’s development process, emphasising the use of text analysis for difficulty adjustment and the importance of engaging, educational gameplay. Preliminary results indicate that adaptive games like Cipher can enhance language learning by providing immersive, personalised experiences that maintain player motivation and engagement.
Towards Semantic Tagging for Irish
Tim Czerniak, Elaine Uí Dhonnchadha
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Well-annotated corpora have been shown to have great value, both in linguistic and non-linguistic research, and in supporting machine learning and many other non-research activities including language teaching. For minority languages, annotated corpora can help in understanding language usage norms among native and non-native speakers, providing valuable information both for lexicography and for teaching, and helping to combat the decline of speaker numbers. At the same time, minority languages suffer from having fewer available language resources than majority languages, and far less-developed annotation tooling. To date there is very little work in semantic annotation for Irish. In this paper we report on progress to date in the building of a standard tool-set for semantic annotation of Irish, including a novel method for evaluation of semantic annotation. A small corpus of Irish language data has been manually annotated with semantic tags, and manually checked. A semantic type tagging framework has then been developed using existing technologies, and using a semantic lexicon that has been built from a variety of sources. Semantic disambiguation methods have been added with a view to increasing accuracy. That framework has then been tested using the manually tagged corpus, resulting in over 90% lexical coverage and almost 80% tag accuracy. Development is ongoing as part of a larger corpus development project, and plans include expansion of the manually tagged corpus, expansion of the lexicon, and exploration of further disambiguation methods. As this is, to our knowledge, the first semantic tagger for Irish, it is hoped that this research will form a sound basis for semantic annotation of Irish corpora into the future.
2023
DCU/TCD-FORGe at WebNLG’23: Irish rules! (WebNLG 2023)
Simon Mille, Elaine Uí Dhonnchadha, Stamatia Dasiopoulou, Lauren Cassidy, Brian Davis, Anya Belz
Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)
In this paper, we describe the submission of Dublin City University (DCU) and Trinity College Dublin (TCD) for the WebNLG 2023 shared task. We present a fully rule-based pipeline for generating Irish texts from DBpedia triple sets, comprising four components: triple lexicalisation, generation of non-inflected Irish text, inflection generation, and post-processing.
Generating Irish Text with a Flexible Plug-and-Play Architecture
Simon Mille, Elaine Uí Dhonnchadha, Lauren Cassidy, Brian Davis, Stamatia Dasiopoulou, Anya Belz
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
In this paper, we describe M-FleNS, a multilingual flexible plug-and-play architecture designed to accommodate neural and symbolic modules, and initially instantiated with rule-based modules. We focus on using M-FleNS for the specific purpose of building new resources for Irish, a language currently under-represented in the NLP landscape. We present the general M-FleNS framework and how we use it to build an Irish Natural Language Generation system for verbalising part of the DBpedia ontology and building a multilayered dataset with rich linguistic annotations. Via automatic and human assessments of the output texts we show that with very limited resources we are able to create a system that reaches high levels of fluency and semantic accuracy, while having very low energy and memory requirements.
2022
How NLP Can Strengthen Digital Game Based Language Learning Resources for Less Resourced Languages
Monica Ward, Liang Xu, Elaine Uí Dhonnchadha
Proceedings of the 9th Workshop on Games and Natural Language Processing within the 13th Language Resources and Evaluation Conference
This paper provides an overview of the Cipher engine, which enables the development of a Digital Educational Game (DEG) based on noticing ciphers or patterns in texts. The Cipher engine was used to develop Cipher: Faoi Gheasa, a digital educational game for Irish, which incorporates NLP resources and is informed by Digital Game-Based Language Learning (DGBLL) and Computer-Assisted Language Learning (CALL) research. The paper outlines six phases where NLP has strengthened the Cipher: Faoi Gheasa game. It shows how the Cipher engine can be used to build a Cipher game for other languages, particularly low-resourced and endangered languages in which NLP resources are under-developed or few in number.
Cipher – Faoi Gheasa: A Game-with-a-Purpose for Irish
Elaine Uí Dhonnchadha, Monica Ward, Liang Xu
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
This paper describes Cipher – Faoi Gheasa, a ‘game with a purpose’ designed to support the learning of Irish in a fun and enjoyable way. The aim of the game is to promote language ‘noticing’ and to combine the benefits of reading with the enjoyment of computer game playing, in a pedagogically beneficial way. In this paper we discuss pedagogical challenges for Irish, the development of measures for the selection and ranking of reading materials, as well as initial results of game evaluation. Overall user feedback is positive, and further testing and development are envisaged.
Faoi Gheasa an adaptive game for Irish language learning
Liang Xu, Elaine Uí Dhonnchadha, Monica Ward
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
In this paper, we present a game with a purpose (GWAP) (Von Ahn, 2006). The aim of the game is to promote language learning and ‘noticing’ (Skehan, 2013). The game has been designed for Irish, but the framework could be used for other languages. Irish is a minority language, which means that L2 learners have limited opportunities for exposure to the language, and there are also limited (digital) learning resources available. This research incorporates game development, language pedagogy and ICALL language materials development. This paper focuses on the language materials development, as this is a bottleneck in the teaching and learning of minority and endangered languages.
2012
Irish Treebanking and Parsing: A Preliminary Evaluation
Teresa Lynn, Özlem Çetinoğlu, Jennifer Foster, Elaine Uí Dhonnchadha, Mark Dras, Josef van Genabith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Language resources are essential for linguistic research and the development of NLP applications. Low-density languages, such as Irish, lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish, namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage the few existing resources. We discuss language-specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula, and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development.
Active Learning and the Irish Treebank
Teresa Lynn, Jennifer Foster, Mark Dras, Elaine Uí Dhonnchadha
Proceedings of the Australasian Language Technology Association Workshop 2012
2010
Partial Dependency Parsing for Irish
Elaine Uí Dhonnchadha, Josef Van Genabith
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We present a partial dependency parser for Irish. Constraint Grammar (CG) based rules are used to annotate dependency relations and grammatical functions. Chunking is performed using a regular-expression grammar which operates on the dependency tagged sentences. As this is the first implementation of a parser for unrestricted Irish text (to our knowledge), there were no guidelines or precedents available. Therefore deciding what constitutes a syntactic unit, and how it should be annotated, accounts for a major part of the early development effort. Currently, all tokens in a sentence are tagged for grammatical function and local dependency. Long-distance dependencies, prepositional attachments or coordination are not handled, resulting in a partial dependency analysis. Evaluations show that the partial dependency analysis achieves an f-score of 93.60% on development data and 94.28% on unseen test data, while the chunker achieves an f-score of 97.20% on development data and 93.50% on unseen test data.
2006
A Part-of-speech tagger for Irish using Finite-State Morphology and Constraint Grammar Disambiguation
E. Uí Dhonnchadha, J. Van Genabith
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper describes the methodology used to develop a part-of-speech tagger for Irish, which is used to annotate a corpus of 30 million words of text with part-of-speech tags and lemmas. The tagger is evaluated using a manually disambiguated test corpus and it currently achieves 95% accuracy on unrestricted text. To our knowledge, this is the first part-of-speech tagger for Irish.
2004
CL for CALL in the Primary School
Katrina Keogh, Thomas Koller, Monica Ward, Elaine Uí Dhonnchadha, Josef van Genabith
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning
2002
A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers
Elaine Uí Dhonnchadha
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)