Jan Bakker
Cochrane produces systematic reviews whose abstracts are divided into seven standard sections. However, the plain language summaries (PLS) of Cochrane reviews do not follow the same structure, which has prevented researchers from training simplification models on paired abstract and PLS sections. In this work, we devise a two-step method to automatically divide the PLS of Cochrane reviews into the same sections into which the abstracts are divided. In the first step, we align each sentence in a PLS to a section in the parallel abstract if they cover similar content. In the second step, we classify the remaining sentences into sections based on the content of the PLS and what we learned from the first step. To evaluate our method, we manually divide 22 PLS into sections. Applying our method, we obtain the Cochrane-sections dataset, which consists of paired abstract and PLS sections in English for a total of 7.7K Cochrane reviews. Our work thus yields references for the section-level simplification of biomedical abstracts.
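The two-step method can be pictured with a short sketch. The snippet below is a minimal illustration, not the paper's actual implementation: the seven section names are Cochrane's standard abstract headings, but the TF-IDF features, the 0.5 similarity threshold, and the logistic-regression classifier for step 2 are all assumptions made for the example.

    # Minimal sketch of the two-step PLS sectioning method (illustrative
    # assumptions: TF-IDF features, 0.5 threshold, logistic regression).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics.pairwise import cosine_similarity

    SECTIONS = ["Background", "Objectives", "Search methods",
                "Selection criteria", "Data collection and analysis",
                "Main results", "Authors' conclusions"]

    def section_pls(pls_sentences, abstract_sections, threshold=0.5):
        """abstract_sections maps each section name to that section's text."""
        vec = TfidfVectorizer().fit(pls_sentences + list(abstract_sections.values()))
        sec_vecs = vec.transform([abstract_sections[s] for s in SECTIONS])
        labels = [None] * len(pls_sentences)
        # Step 1: align a PLS sentence to the most similar abstract section,
        # but only when the similarity is high enough to trust the match.
        for i, sent in enumerate(pls_sentences):
            sims = cosine_similarity(vec.transform([sent]), sec_vecs)[0]
            if sims.max() >= threshold:
                labels[i] = SECTIONS[sims.argmax()]
        # Step 2: train a classifier on the sentences aligned in step 1 and
        # use it to label the remaining sentences.
        aligned = [(s, l) for s, l in zip(pls_sentences, labels) if l is not None]
        if any(l is None for l in labels) and len({l for _, l in aligned}) > 1:
            X = vec.transform([s for s, _ in aligned])
            clf = LogisticRegression(max_iter=1000).fit(X, [l for _, l in aligned])
            for i, l in enumerate(labels):
                if l is None:
                    labels[i] = clf.predict(vec.transform([pls_sentences[i]]))[0]
        return labels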
In recent years, the SimpleText lab has brought together an active community of natural language processing (NLP) and information retrieval (IR) researchers around a shared goal: improving the accessibility of scientific texts. Its benchmarks for scientific passage retrieval, for the detection and explanation of scientific terminology, and for scientific text simplification have become standards. In 2025, we are introducing major changes to the lab's organization and missions. The CLEF 2025 SimpleText lab will offer three main tasks. Task 1, Text Simplification: simplifying scientific text. Task 2, Controlled Creativity: identifying and avoiding hallucinations. Task 3, SimpleText 2024 Revisited: selected tasks back by popular demand.
Jargon identification is critical for improving the accessibility of biomedical texts, yet models are often evaluated on isolated datasets, leaving open questions about generalization. After reproducing MedReadMe's jargon detection results and extending the evaluation to the PLABA dataset, we find that transfer learning across datasets yields only modest gains, largely due to divergent annotation objectives. Through manual re-annotation, we show that aligning labeling schemes improves cross-dataset performance. Building on these findings, we evaluate several jargon-aware prompting strategies for LLM-based medical text simplification. Explicitly highlighting jargon in prompts does not consistently improve simplification quality; when gains occur, they often trade off against readability and are model-dependent. Human evaluation indicates that simple prompting can be as effective as more complex jargon-aware instructions. We release code to facilitate further research: https://anonymous.4open.science/r/tsar-anonymous-2D66F/README.md
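To make the prompting comparison concrete, here is a minimal sketch of one jargon-aware strategy, assuming jargon spans come from a separate detector; the prompt wording and the ** marking convention are assumptions for illustration, not the prompts used in the paper.

    # Hypothetical jargon-aware prompt builder: detected jargon terms are
    # marked inline so the model knows which spans to simplify or explain.
    def build_jargon_aware_prompt(text: str, jargon_terms: list[str]) -> str:
        marked = text
        for term in jargon_terms:
            marked = marked.replace(term, f"**{term}**")
        return (
            "Rewrite the following biomedical text in plain language for a "
            "lay reader. Terms marked with ** are jargon; replace or explain "
            "each of them.\n\n" + marked
        )

    prompt = build_jargon_aware_prompt(
        "Patients received adjuvant chemotherapy after resection.",
        ["adjuvant chemotherapy", "resection"],
    )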
Previous research on automatic text simplification has focused almost exclusively on sentence-level inputs. However, the simplification of full documents cannot be tackled by naively simplifying each sentence in isolation, as this approach fails to preserve the discourse structure of the document. Recent Context-Aware Document Simplification approaches explore various models whose input goes beyond the sentence level. These models achieve state-of-the-art performance on the Newsela-auto dataset, which requires a difficult-to-obtain license to use. We replicate these experiments on an open-source dataset, namely Wiki-auto, and share all training details to make future reproductions easy. Our results validate the claim that models guided by a document-level plan outperform their standard counterparts. However, they do not support the claim that simplification models perform better when they have access to a local document context. We also find that planning models do not generalize well to out-of-domain settings. Lay Summary: We have access to unprecedented amounts of information, yet the most authoritative sources may exceed a user’s language proficiency level. Text simplification technology can change the writing style of a text while preserving its main content. Recent paragraph-level and document-level text simplification approaches outcompete traditional sentence-level approaches and increase the understandability of complex texts.
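The plan-guided setup studied in the replication can be sketched as a two-stage pipeline; the operation labels below follow the common keep/rephrase/split/delete scheme, and plan_model and simplifier are hypothetical stand-ins for the trained planner and generator rather than the replicated models themselves.

    # Sketch of plan-guided document simplification: predict one edit
    # operation per sentence, then realise the plan with a generator.
    OPS = ["keep", "rephrase", "split", "delete"]

    def simplify_document(sentences, plan_model, simplifier):
        # The planner sees the whole document, so its decisions can reflect
        # discourse structure instead of sentence-local context only.
        plan = plan_model.predict(sentences)  # one operation per sentence
        out = []
        for sent, op in zip(sentences, plan):
            if op == "delete":
                continue  # drop sentences the plan marks as redundant
            if op == "keep":
                out.append(sent)
            else:  # "rephrase" or "split": generate conditioned on the op
                out.append(simplifier.generate(sent, control=op))
        return " ".join(out)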
The most reliable and up-to-date information on health questions is found in the biomedical literature, but it is inaccessible to lay readers because of its complex, jargon-laden language. Domain-specific scientific text simplification holds the promise of making this literature accessible to a lay audience. We therefore create Cochrane-auto: a large corpus of pairs of aligned sentences, paragraphs, and abstracts from biomedical abstracts and lay summaries. Experiments demonstrate that a plan-guided simplification system trained on Cochrane-auto outperforms a strong baseline trained on unaligned abstracts and lay summaries. More generally, our freely available corpus, which complements Newsela-auto and Wiki-auto, facilitates text simplification research beyond the sentence level and beyond direct lexical and grammatical revisions.
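As a usage sketch, paired sections from such a corpus could be loaded for sequence-to-sequence training as follows; the file name and the field names abstract_section and pls_section are a hypothetical schema for illustration, not the released corpus format.

    # Hypothetical loader for paired complex/simple sections stored as
    # JSON lines; adapt the field names to the actual corpus layout.
    import json

    def load_pairs(path):
        with open(path) as f:
            for line in f:
                ex = json.loads(line)
                yield ex["abstract_section"], ex["pls_section"]

    pairs = list(load_pairs("cochrane_auto_paragraphs.jsonl"))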