Daisy Monika Lal


2024

pdf
The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment
Chris Chinenye Emezue | Ifeoma Okoh | Chinedu Emmanuel Mbonu | Chiamaka Chukwuneke | Daisy Monika Lal | Ignatius Ezeani | Paul Rayson | Ijemma Onwuzulike | Chukwuma Onyebuchi Okeke | Gerald Okey Nweya | Bright Ikechukwu Ogbonna | Chukwuebuka Uchenna Oraegbunam | Esther Chidinma Awo-Ndubuisi | Akudo Amarachukwu Osuagwu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study. This highlights the need to develop language technologies for Igbo to foster communication, learning and preservation. To create robust, impactful, and widely adopted language technologies for Igbo, it is essential to incorporate the multi-dialectal nature of the language. The primary obstacle in achieving dialectal-aware language technologies is the lack of comprehensive dialectal datasets. In response, we present the IgboAPI dataset, a multi-dialectal Igbo-English dictionary dataset, developed with the aim of enhancing the representation of Igbo dialects. Furthermore, we illustrate the practicality of the IgboAPI dataset through two distinct studies: one focusing on Igbo semantic lexicon and the other on machine translation. In the semantic lexicon project, we successfully establish an initial Igbo semantic lexicon for the Igbo semantic tagger, while in the machine translation study, we demonstrate that by finetuning existing machine translation systems using the IgboAPI dataset, we significantly improve their ability to handle dialectal variations in sentences.

pdf
Analysing Emotions in Cancer Narratives: A Corpus-Driven Approach
Daisy Monika Lal | Paul Rayson | Sheila A. Payne | Yufeng Liu
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

Cancer not only affects a patient’s physical health, but it can also elicit a wide spectrum of intense emotions in patients, friends, and family members. People with cancer and their carers (family member, partner, or friend) are increasingly turning to the web for information and support. Despite the expansion of sentiment analysis in the context of social media and healthcare, there is relatively less research on patient narratives, which are longer, more complex texts, and difficult to assess. In this exploratory work, we examine how patients and carers express their feelings about various aspects of cancer (treatments and stages). The objective of this paper is to illustrate with examples the nature of language in the clinical domain, as well as the complexities of language when performing automatic sentiment and emotion analysis. We perform a linguistic analysis of a corpus of cancer narratives collected from Reddit. We examine the performance of five state-of-the-art models (T5, DistilBERT, Roberta, RobertaGo, and NRCLex) to see how well they match with human comparisons separated by linguistic and medical background. The corpus yielded several surprising results that could be useful to sentiment analysis NLP experts. The linguistic issues encountered were classified into four categories: statements expressing a variety of emotions, ambiguous or conflicting statements with contradictory emotions, statements requiring additional context, and statements in which sentiment and emotions can be inferred but are not explicitly mentioned.

pdf
Medical-FLAVORS: A Figurative Language and Vocabulary Open Repository for Spanish in the Medical Domain
Lucia Pitarch | Emma Angles-Herrero | Yufeng Liu | Daisy Monika Lal | Jorge Gracia | Paul Rayson | Judith Rietjens
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

Metaphors shape the way we think by enabling the expression of one concept in terms of another one. For instance, cancer can be understood as a place from which one can go in and out, as a journey that one can traverse, or as a battle. Giving patients awareness of the way they refer to cancer and different narratives in which they can reframe it has been proven to be a key aspect when experiencing the disease. In this work, we propose a preliminary identification and representation of Spanish cancer metaphors using MIP (Metaphor Identification Procedure) and MetaNet. The created resource is the first openly available dataset for medical metaphors in Spanish. Thus, in the future, we expect to use it as the gold standard in automatic metaphor processing tasks, which will also serve to further populate the resource and understand how cancer is experienced and narrated.