This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
This paper describes our approach to address the SemEval-2025 Task 10 subtask 3, which is focused on narrative extraction given news articles with a dominant narrative. We design an external knowledge injection approach to fine-tune a Flan-T5 model so the generated narrative explanations are in line with the dominant narrative determined in each text. We also incorporate pragmatic information in the form of communicative intentions, using them as external knowledge to assist the model. This ensures that the generated texts align more closely with the intended explanations and effectively convey the expected meaning. The results show that our approach ranks 3rd in the task leaderboard (0.7428 in Macro-F1) with concise and effective news explanations. The analyses highlight the importance of adding pragmatic information when training systems to generate adequate narrative extractions.
In the age of knowledge, the democratisation of information facilitated through the Internet may not be as pervasive if written language poses challenges to particular sectors of the population. The objective of this paper is to present an overview of research-based automatic text simplification tools. Consequently, we describe aspects such as the language, language phenomena, language levels simplified, approaches, specific target populations these tools are created for (e.g. individuals with cognitive impairment, attention deficit, elderly people, children, language learners), and accessibility and availability considerations. The review of existing studies covering automatic text simplification tools is undergone by searching two databases: Web of Science and Scopus. The eligibility criteria involve text simplification tools with a scientific background in order to ascertain how they operate. This methodology yielded 27 text simplification tools that are further analysed. Some of the main conclusions reached with this review are the lack of resources accessible to the public, the need for customisation to foster the individual’s independence by allowing the user to select what s/he finds challenging to understand while not limiting the user’s capabilities and the need for more simplification tools in languages other than English, to mention a few.
This paper presents the ongoing work conducted within the ClearText project, specifically focusing on the resource creation for the simplification of Spanish for people with cognitive disabilities. These resources include the CLEARSIM corpus and the Simple.Text tool. On the one hand, a description of the corpus compilation process with the help of APSA is detailed along with information regarding whether these texts are bronze, silver or gold standard simplification versions from the original text. The goal to reach is 18,000 texts in total by the end of the project. On the other hand, we aim to explore Large Language Models (LLMs) in a sequence-to-sequence setup for text simplification at the document level. Therefore, the tool’s objectives, technical aspects, and the preliminary results derived from early experimentation are also presented. The initial results are subject to improvement, given that experimentation is in a very preliminary stage. Despite showcasing flaws inherent to generative models (e.g. hallucinations, repetitive text), we examine the resolutions (or lack thereof) of complex linguistic phenomena that can be learned from the corpus. These issues will be addressed throughout the remainder of this project. The expected positive results from this project that will impact society are three-fold in nature: scientific-technical, social, and economic.
This paper presents a Named Entity Classification system, which employs machine learning. Our methodology employs local entity information and profiles as feature set. All features are generated in an unsupervised manner. It is tested on two different data sets: (i) DrugSemantics Spanish corpus (Overall F1 = 74.92), whose results are in-line with the state of the art without employing external domain-specific resources. And, (ii) English CONLL2003 dataset (Overall F1 = 81.40), although our results are lower than previous work, these are reached without external knowledge or complex linguistic analysis. Last, using the same configuration for the two corpora, the difference of overall F1 is only 6.48 points (DrugSemantics = 74.92 versus CoNLL2003 = 81.40). Thus, this result supports our hypothesis that our approach is language and domain independent and does not require any external knowledge or complex linguistic analysis.