Elaine Uí Dhonnchadha
Also published as:
E. Uí Dhonnchadha,
Elaine Uí Dhonnchadha
Computer-Assisted Language Learning (CALL) applications have many benefits for language learning. However, they can be difficult to develop for low-resource languages such as Irish and the other Celtic languages. It can be difficult to assemble the multidisciplinary team needed to develop CALL resources, and fewer language resources are available for these languages. This paper provides an overview of a pragmatic approach to using Artificial Intelligence (AI) and Virtual Reality (VR) in developing a Digital Game-Based Language Learning (DGBLL) app for Irish. This approach was used to develop Cipher, a DGBLL app for Irish (Xu et al., 2022b), in which a number of existing resources, including text repositories and NLP tools, were used. In this paper the focus is on the incorporation of AI technologies, including AI image generation and text-to-speech (TTS), and of VR, in a pedagogically informed manner to support language learning in a way that is both challenging and enjoyable. Cipher has been designed to be language independent and can be adapted for various cohorts of learners and for other languages. Cipher has been played and tested in a number of schools in Dublin, and the feedback from teachers and students has been very positive. This paper outlines how AI and VR technologies have been utilised in Cipher and how it could be adapted to other Celtic languages and low-resource languages in general.
This study explores Cipher, an adaptive language learning game tailored for the under-resourced Irish language, aimed mainly at primary school students. By integrating text analysis techniques, Cipher dynamically adjusts its difficulty based on the player’s language proficiency, offering a customised learning experience. The game’s narrative involves decoding spells to access Irish myths and stories, combining language learning with cultural elements. Development involved collaboration with educators to align the game content with curriculum standards and incorporate culturally relevant materials. This paper outlines the game’s development process, emphasising the use of text analysis for difficulty adjustment and the importance of engaging, educational gameplay. Preliminary results indicate that adaptive games like Cipher can enhance language learning by providing immersive, personalised experiences that maintain player motivation and engagement.
Well-annotated corpora have been shown to have great value, both in linguistic and non-linguistic research, and in supporting machine learning and many other non-research activities including language teaching. For minority languages, annotated corpora can help in understanding language usage norms among native and non-native speakers, providing valuable information both for lexicography and for teaching, and helping to combat the decline of speaker numbers. At the same time, minority languages suffer from having fewer available language resources than majority languages, and far less-developed annotation tooling. To date there is very little work in semantic annotation for Irish. In this paper we report on progress to date in the building of a standard toolset for semantic annotation of Irish, including a novel method for evaluation of semantic annotation. A small corpus of Irish language data has been manually annotated with semantic tags, and manually checked. A semantic type tagging framework has then been developed using existing technologies, and using a semantic lexicon that has been built from a variety of sources. Semantic disambiguation methods have been added with a view to increasing accuracy. That framework has then been tested using the manually tagged corpus, resulting in over 90% lexical coverage and almost 80% tag accuracy. Development is ongoing as part of a larger corpus development project, and plans include expansion of the manually tagged corpus, expansion of the lexicon, and exploration of further disambiguation methods. As this is, to our knowledge, the first semantic tagger for Irish, it is hoped that this research will form a sound basis for semantic annotation of Irish corpora into the future.
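The two figures reported in the abstract above (lexical coverage and tag accuracy) can be illustrated with a minimal sketch. This is not the paper's novel evaluation method, which the abstract does not describe; it assumes the conventional definitions (coverage as the share of tokens found in the lexicon, accuracy as the share of all tokens whose predicted tag matches the gold tag). The function name `evaluate_tagger`, the first-listed-tag prediction rule, and the toy tokens and tags are all hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate_tagger(
    gold: List[Tuple[str, str]],
    lexicon: Dict[str, List[str]],
) -> Tuple[float, float]:
    """Return (lexical coverage, tag accuracy) over a gold-tagged token list.

    Coverage: share of tokens that have an entry in the lexicon.
    Accuracy: share of all tokens whose predicted tag equals the gold tag.
    Assumption: the prediction is the first-listed tag, with no disambiguation.
    """
    if not gold:
        raise ValueError("gold corpus is empty")
    covered = 0
    correct = 0
    for token, gold_tag in gold:
        candidates = lexicon.get(token.lower())
        if candidates:
            covered += 1
            if candidates[0] == gold_tag:
                correct += 1
    return covered / len(gold), correct / len(gold)


# Toy data: tokens "w1".."w4" and tags "A", "B", "C" are placeholders,
# not real Irish words or a real semantic tagset.
toy_lexicon = {"w1": ["A"], "w2": ["B", "C"], "w3": ["C"]}
toy_gold = [("w1", "A"), ("w2", "C"), ("w3", "C"), ("w4", "A")]
coverage, accuracy = evaluate_tagger(toy_gold, toy_lexicon)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

In this toy case the second token is covered but mis-tagged because the gold tag is not the first candidate, which is the kind of error a disambiguation step would aim to reduce.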
Digital game-based language learning (DGBLL) can help with the language learning process. DGBLL applications can make learning more enjoyable and engaging, but they are difficult to develop. A DGBLL app that relies on target language texts needs to be able to use texts of the appropriate level for the individual learners. This implies that text classification tools should be available to DGBLL developers, who may not be familiar with the target language, in order to incorporate suitable texts into their games. While text difficulty classifiers exist for many of the most commonly spoken languages, this is not the case for under-resourced languages, such as Irish. In this paper, we explore approaches to the development of text classifiers for Irish. In the first approach to text analysis and grading, we apply linguistic analysis to assess text complexity. Features from this approach are then used in machine learning-based text classification, which explores the application of a number of machine learning algorithms to the problem. Although the development of these text classifiers is at an early stage, they show promise, particularly in a low-resourced scenario.
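As a rough illustration of the feature-extraction step mentioned in the abstract above, the sketch below computes three generic surface measures of text complexity. The abstract does not list the features actually used, so these measures (mean sentence length, mean word length, type-token ratio) and the function name `surface_features` are assumptions chosen only because they are common in readability work. Tokenisation is deliberately naive.

```python
import re
from typing import Dict


def surface_features(text: str) -> Dict[str, float]:
    """Compute simple surface complexity features for a text.

    Assumption: sentences end at '.', '!' or '?', and words are runs of
    word characters. This splits hyphenated or apostrophised forms, so it
    is only a rough approximation for real Irish text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    return {
        "mean_sentence_length": len(words) / len(sentences),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }


# Placeholder text, not real Irish.
features = surface_features("Aa bb cc. Dd ee aa!")
print(features)
```

A feature dictionary of this kind could then be passed to whichever machine learning classifier is being compared.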
In this paper, we describe the submission of Dublin City University (DCU) and Trinity College Dublin (TCD) for the WebNLG 2023 shared task. We present a fully rule-based pipeline for generating Irish texts from DBpedia triple sets, which comprises four components: triple lexicalisation, generation of non-inflected Irish text, inflection generation, and post-processing.
In this paper, we describe M-FleNS, a multilingual flexible plug-and-play architecture designed to accommodate neural and symbolic modules, and initially instantiated with rule-based modules. We focus on using M-FleNS for the specific purpose of building new resources for Irish, a language currently under-represented in the NLP landscape. We present the general M-FleNS framework and how we use it to build an Irish Natural Language Generation system for verbalising part of the DBpedia ontology and building a multilayered dataset with rich linguistic annotations. Via automatic and human assessments of the output texts we show that with very limited resources we are able to create a system that reaches high levels of fluency and semantic accuracy, while having very low energy and memory requirements.
This paper describes Cipher – Faoi Gheasa, a ‘game with a purpose’ designed to support the learning of Irish in a fun and enjoyable way. The aim of the game is to promote language ‘noticing’ and to combine the benefits of reading with the enjoyment of computer game playing, in a pedagogically beneficial way. In this paper we discuss pedagogical challenges for Irish, the development of measures for the selection and ranking of reading materials, as well as initial results of game evaluation. Overall user feedback is positive, and further testing and development are envisaged.
In this paper, we present a game with a purpose (GWAP) (Von Ahn, 2006). The aim of the game is to promote language learning and ‘noticing’ (Skehan, 2013). The game has been designed for Irish, but the framework could be used for other languages. Irish is a minority language, which means that L2 learners have limited opportunities for exposure to the language and that limited (digital) learning resources are available. This research incorporates game development, language pedagogy and ICALL language materials development. This paper will focus on the language materials development, as this is a bottleneck in the teaching and learning of minority and endangered languages.
This paper provides an overview of the Cipher engine, which enables the development of a Digital Educational Game (DEG) based on noticing ciphers or patterns in texts. The Cipher engine was used to develop Cipher: Faoi Gheasa, a digital educational game for Irish, which incorporates NLP resources and is informed by Digital Game-Based Language Learning (DGBLL) and Computer-Assisted Language Learning (CALL) research. The paper outlines six phases where NLP has strengthened the Cipher: Faoi Gheasa game. It shows how the Cipher engine can be used to build a Cipher game for other languages, particularly low-resourced and endangered languages in which NLP resources are under-developed or few in number.
Language resources are essential for linguistic research and the development of NLP applications. Low-density languages such as Irish have few such resources and therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish, namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage the few existing resources. We discuss language-specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula, and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development.
We present a partial dependency parser for Irish. Constraint Grammar (CG) based rules are used to annotate dependency relations and grammatical functions. Chunking is performed using a regular-expression grammar which operates on the dependency tagged sentences. As this is the first implementation of a parser for unrestricted Irish text (to our knowledge), there were no guidelines or precedents available. Therefore deciding what constitutes a syntactic unit, and how it should be annotated, accounts for a major part of the early development effort. Currently, all tokens in a sentence are tagged for grammatical function and local dependency. Long-distance dependencies, prepositional attachments or coordination are not handled, resulting in a partial dependency analysis. Evaluations show that the partial dependency analysis achieves an f-score of 93.60% on development data and 94.28% on unseen test data, while the chunker achieves an f-score of 97.20% on development data and 93.50% on unseen test data.
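The f-scores quoted in the abstract above can be read against the standard definition, the harmonic mean of precision and recall. The sketch below applies that definition to labelled chunk spans. It assumes exact-match scoring over (start, end, label) triples, which the abstract does not confirm; the function name `f_score` and the chunk labels are hypothetical placeholders.

```python
from typing import Set, Tuple

# A chunk as (start token index, end token index, label).
Span = Tuple[int, int, str]


def f_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: a predicted chunk counts as correct only if its
    boundaries and label both match a gold chunk exactly.
    """
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy spans with placeholder labels, not output from the actual chunker.
gold_chunks = {(0, 2, "NP"), (2, 3, "V"), (3, 5, "NP"), (5, 7, "PP")}
pred_chunks = {(0, 2, "NP"), (2, 3, "V"), (3, 4, "NP")}
score = f_score(gold_chunks, pred_chunks)
print(f"F-score: {score:.4f}")
```

Here two of three predicted chunks are correct (precision 2/3) and two of four gold chunks are found (recall 1/2), giving an F-score of 4/7.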
This paper describes the methodology used to develop a part-of-speech tagger for Irish, which is used to annotate a corpus of 30 million words of text with part-of-speech tags and lemmas. The tagger is evaluated using a manually disambiguated test corpus and it currently achieves 95% accuracy on unrestricted text. To our knowledge, this is the first part-of-speech tagger for Irish.