Daniel Marciniak
2026
SLANG-GraphRAG: Multi-Layered Retrieval with Domain-Specific Knowledge for Low Resource Social Media Conversations
Ifeoluwa Wuraola | Daniel Marciniak | Nina Dethlefs
Findings of the Association for Computational Linguistics: EACL 2026
Emotion classification on social media is especially difficult when texts include informal, culturally grounded language such as slang. Standard NLP benchmarks often miss these nuances, particularly in low-resource settings. We present SLANG-GraphRAG, a retrieval-augmented framework that integrates a culture-specific slang knowledge graph into large language models via one-shot prompting. Using multiple retrieval strategies, we incorporate slang definitions, regional usage, and conversational context. Our results show that incorporating structured cultural knowledge into the retrieval process improves accuracy by up to 31% and F1 score by 28%, outperforming traditional and unstructured retrieval methods. To better evaluate model behavior, we propose a probabilistic metric that reflects the distribution of human annotations, providing a more nuanced measure of performance. These findings highlight the value of culturally sensitive applications and more balanced evaluation in subjective NLP tasks.
2024
Understanding Slang with LLMs: Modelling Cross-Cultural Nuances through Paraphrasing
Ifeoluwa Wuraola | Nina Dethlefs | Daniel Marciniak
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In social media discourse, slang enriches communication and reflects the sociocultural identities of users. This study investigates the capability of large language models (LLMs) to paraphrase slang within climate-related tweets from Nigeria and the UK, with a focus on identifying emotional nuances. Using DistilRoBERTa as the baseline model, we observe its limited comprehension of slang. To improve cross-cultural understanding, we gauge the effectiveness of leading LLMs (ChatGPT 4, Gemini, and LLaMA3) in slang paraphrasing. While ChatGPT 4 and Gemini demonstrate comparable effectiveness in slang paraphrasing, LLaMA3 shows less coverage, and all LLMs exhibit limitations in coverage, especially of Nigerian slang. Our findings underscore the necessity for culturally sensitive LLM development in emotion classification, particularly in non-anglocentric regions.
2023
Linguistic Pattern Analysis in the Climate Change-Related Tweets from UK and Nigeria
Ifeoluwa Wuraola | Nina Dethlefs | Daniel Marciniak
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
To understand the global trends of human opinion on climate change in specific geographical areas, this research proposes a framework to analyse linguistic features and cultural differences in climate-related tweets. Our study combines transformer networks with linguistic feature analysis to address small dataset limitations and gain insights into cultural differences in tweets from the UK and Nigeria. We find that Nigerian tweets use more leadership language and informal words in discussing climate change than UK tweets, as these topics are treated as an issue of salience and urgency. In contrast, the UK’s discourse about climate change on Twitter is characterised by more formal, logical, and longer words per sentence than Nigeria’s. We also confirm the geographical identifiability of tweets through a classification task using DistilBERT, which achieves 83% accuracy.