2023
Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification
Daniel Holmer
|
Evelina Rennes
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Automatic text simplification (ATS) describes the automatic transformation of a text from a complex form to a less complex form. Many modern ATS techniques need large parallel corpora of standard and simplified text, but such data does not exist for many languages. One way to overcome this issue is to create pseudo-parallel corpora by dividing existing corpora into standard and simple parts. In this work, we explore the creation of Swedish pseudo-parallel monolingual corpora by applying different feature representation methods, sentence alignment algorithms, and indexing approaches to a large monolingual corpus. The different corpora are used to fine-tune a sentence simplification system based on BART, which is evaluated with standard evaluation metrics for automatic text simplification.
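The alignment step described in the abstract can be illustrated with a small sketch: given a pool of standard sentences and a pool of simple sentences, candidate pairs can be mined by cosine similarity over sentence embeddings. This is a minimal illustration only, assuming a generic multilingual Sentence-BERT model; the model name, threshold, and helper names are placeholders, not the setup used in the paper.

```python
# Minimal sketch: mine pseudo-parallel standard/simple sentence pairs by
# cosine similarity over sentence embeddings. Model name and threshold are
# illustrative assumptions, not the configuration used in the paper.
from sentence_transformers import SentenceTransformer, util

def mine_pairs(standard_sents, simple_sents,
               model_name="paraphrase-multilingual-MiniLM-L12-v2",
               threshold=0.75):
    model = SentenceTransformer(model_name)
    emb_std = model.encode(standard_sents, convert_to_tensor=True)
    emb_simp = model.encode(simple_sents, convert_to_tensor=True)
    sims = util.cos_sim(emb_std, emb_simp)      # |standard| x |simple| matrix
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())
        if float(row[j]) >= threshold:          # keep only confident alignments
            pairs.append((standard_sents[i], simple_sents[j], float(row[j])))
    return pairs

if __name__ == "__main__":
    std = ["Myndigheten fattade beslut om att avslå ansökan."]
    simp = ["Myndigheten sa nej till ansökan."]
    print(mine_pairs(std, simp, threshold=0.5))
```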
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning
David Alfter
|
Elena Volodina
|
Thomas François
|
Arne Jönsson
|
Evelina Rennes
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning
2022
The Swedish Simplification Toolkit: Designed with Target Audiences in Mind
Evelina Rennes
|
Marina Santini
|
Arne Jonsson
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference
In this paper, we present the current version of The Swedish Simplification Toolkit. The toolkit includes computational and empirical tools that have been developed over the years to explore a still neglected area of NLP, namely the simplification of “standard” texts to meet the needs of target audiences. Target audiences, such as people affected by dyslexia, aphasia or autism, but also children and second language learners, require different types of text simplification and adaptation. For example, while individuals with aphasia have difficulties reading compounds (such as arbetsmarknadsdepartement, eng. ministry of employment), second language learners struggle with culture-specific vocabulary (e.g. konflikträdd, eng. afraid of conflicts). The toolkit allows users to selectively decide the types of simplification that meet the specific needs of the target audience they belong to. The Swedish Simplification Toolkit is one of the first attempts to overcome the one-size-fits-all approach that is still dominant in Automatic Text Simplification, and proposes a set of computational methods that, used individually or in combination, may help individuals reduce reading (and writing) difficulties.
Perceived Text Quality and Readability in Extractive and Abstractive Summaries
Julius Monsen
|
Evelina Rennes
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We present results from a study investigating how users perceive text quality and readability in extractive and abstractive summaries. We trained two summarisation models on Swedish news data and used these to produce summaries of articles. With the produced summaries, we conducted an online survey in which the extractive summaries were compared to the abstractive summaries in terms of fluency, adequacy and simplicity. We found statistically significant differences in perceived fluency and adequacy between abstractive and extractive summaries but no statistically significant difference in simplicity. Extractive summaries were preferred in most cases, possibly due to the types of errors the summaries tend to have.
NyLLex: A Novel Resource of Swedish Words Annotated with Reading Proficiency Level
Daniel Holmer
|
Evelina Rennes
Proceedings of the Thirteenth Language Resources and Evaluation Conference
What makes a text easy to read or not depends on a variety of factors. One of the most prominent, however, is whether the text contains easy words and avoids difficult ones. Deciding if a word is easy or difficult is not a trivial task, since it depends on characteristics of the word itself as well as of the reader, but it can be facilitated by a corpus annotated with word frequencies and reading proficiency levels. In this paper, we present NyLLex, a novel lexical resource derived from books published by Sweden’s largest publisher of easy language texts. NyLLex consists of 6,668 entries, with frequency counts distributed over six reading proficiency levels. We show that NyLLex, with its novel source material aimed at individuals of different reading proficiency levels, can serve as a complement to already existing resources for Swedish.
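As an illustration of how a resource of this kind can be consumed, the sketch below assumes a NyLLex-like entry format with per-level frequency counts; the field names and example counts are invented for illustration and are not taken from the released resource.

```python
# Illustrative sketch of querying a NyLLex-style lexicon: each entry holds
# frequency counts over six reading proficiency levels (1 = easiest).
# Entry names and counts below are invented examples, not actual NyLLex data.
from typing import Dict, List

def estimated_level(counts: List[int]) -> int:
    """Return the 1-indexed proficiency level where the word is most frequent."""
    if sum(counts) == 0:
        raise ValueError("word has no occurrences")
    return max(range(len(counts)), key=lambda i: counts[i]) + 1

lexicon: Dict[str, List[int]] = {
    "hund": [40, 35, 20, 10, 5, 2],         # frequent already at the easiest levels
    "arbetsmarknad": [0, 0, 1, 4, 9, 15],   # appears mainly at higher levels
}

for word, counts in lexicon.items():
    print(word, "->", estimated_level(counts))
```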
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning
David Alfter
|
Elena Volodina
|
Thomas François
|
Piet Desmet
|
Frederik Cornillie
|
Arne Jönsson
|
Evelina Rennes
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning
2021
Synonym Replacement based on a Study of Basic-level Nouns in Swedish Texts of Different Complexity
Evelina Rennes
|
Arne Jönsson
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Basic-level terms have been described as the most important to human categorisation. They are the earliest emerging words in children’s language acquisition, and seem to occur more frequently in language in general. In this article, we explored the use of basic-level nouns in texts of different complexity, and hypothesise that hypernyms with characteristics of basic-level words could be useful for the task of lexical simplification. We conducted two corpus studies using four different corpora, two of standard Swedish and two of simple Swedish, and explored whether the corpora of simple texts contain a higher proportion of basic-level nouns than the corpora of standard Swedish. Based on insights from the corpus studies, we developed a novel algorithm for choosing the best synonym by rewarding high relative frequencies and monolexemity, and by restricting the climb in the word hierarchy so as not to suggest synonyms at too high a level of inclusiveness.
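The synonym-choice heuristic summarised in the abstract (reward high relative frequency and monolexemity, and reject candidates that climb too far up the hypernym hierarchy) can be sketched roughly as follows; the weights, candidate format, and function names are illustrative assumptions, not the exact algorithm from the paper.

```python
# Rough sketch of scoring replacement candidates for lexical simplification:
# reward high relative frequency and monolexemity, and disallow candidates
# that sit too many levels above the original word in the word hierarchy.
# Weights, depth limit, and data are illustrative assumptions only.

def score_candidate(rel_freq, is_monolexemic, hierarchy_climb, max_climb=2,
                    w_freq=1.0, w_mono=0.5):
    if hierarchy_climb > max_climb:       # too inclusive a hypernym: reject
        return None
    score = w_freq * rel_freq
    if is_monolexemic:                    # single-lexeme words are preferred
        score += w_mono
    return score

def best_synonym(candidates):
    """candidates: list of (word, rel_freq, is_monolexemic, hierarchy_climb)."""
    scored = [(w, score_candidate(f, m, c)) for w, f, m, c in candidates]
    scored = [(w, s) for w, s in scored if s is not None]
    return max(scored, key=lambda x: x[1])[0] if scored else None

print(best_synonym([
    ("fordon", 0.4, True, 2),             # hypernym two levels up
    ("bil", 0.9, True, 1),                # frequent, monolexemic, close in the hierarchy
    ("transportmedel", 0.2, False, 3),    # climbs too far: rejected
]))
```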
2020
Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences
Evelina Rennes
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)
Parallel monolingual resources are imperative for data-driven sentence simplification research. We present the work of aligning, at the sentence level, a corpus of web texts from all Swedish public authorities and municipalities, in standard and simple Swedish. We compare the performance of three alignment algorithms used for similar work in English (Average Alignment, Maximum Alignment, and Hungarian Alignment), and the best-performing algorithm is used to create a resource of 15,433 unique sentence pairs. We evaluate the resulting corpus using a set of features that have proven to predict the complexity of Swedish texts. The results show that the sentences of the simple sub-corpus are indeed less complex than the sentences of the standard part of the corpus, according to many of the text complexity measures.
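Of the three algorithms mentioned, Hungarian Alignment can be sketched as solving an assignment problem over a sentence-similarity matrix; the sketch below uses SciPy's linear_sum_assignment with a placeholder similarity matrix, and is only meant to show the idea, not the paper's implementation.

```python
# Sketch of Hungarian (assignment-based) sentence alignment: given a
# similarity matrix between standard and simple sentences, find the
# one-to-one assignment that maximises total similarity.
# The similarity values below are placeholders for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

sim = np.array([
    [0.9, 0.1, 0.2],   # standard sentence 0 vs simple sentences 0..2
    [0.2, 0.8, 0.3],
    [0.1, 0.4, 0.7],
])

# linear_sum_assignment minimises cost, so negate the similarities.
rows, cols = linear_sum_assignment(-sim)
for r, c in zip(rows, cols):
    print(f"standard {r} <-> simple {c} (similarity {sim[r, c]:.2f})")
```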
Visualizing Facets of Text Complexity across Registers
Marina Santini
|
Arne Jonsson
|
Evelina Rennes
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)
In this paper, we propose visualizing the results of a corpus-based study on text complexity using radar charts. We argue that the added value of this type of visualisation is the polygonal shape that provides an intuitive grasp of text complexity similarities across the registers of a corpus. The results that we visualize come from a study in which we explored whether it is possible to automatically single out different facets of text complexity across the registers of a Swedish corpus. To this end, we used factor analysis as applied in Biber’s Multi-Dimensional Analysis framework. The visualization of text complexity facets with radar charts indicates that there is a correspondence between linguistic similarity and similarity of shape across registers.
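A radar chart of the kind described can be drawn with standard matplotlib; the facet names and values below are placeholders, not the factor scores reported in the study.

```python
# Sketch of a radar chart over text-complexity facets for one register.
# Facet names and values are placeholders, not results from the study.
import numpy as np
import matplotlib.pyplot as plt

facets = ["lexical", "syntactic", "idea density", "narrativity", "cohesion"]
values = [0.6, 0.8, 0.4, 0.3, 0.7]           # one polygon = one register

angles = np.linspace(0, 2 * np.pi, len(facets), endpoint=False).tolist()
values_closed = values + values[:1]          # close the polygon
angles_closed = angles + angles[:1]

ax = plt.subplot(polar=True)
ax.plot(angles_closed, values_closed)
ax.fill(angles_closed, values_closed, alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(facets)
plt.show()
```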
2018
Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)
Arne Jönsson
|
Evelina Rennes
|
Horacio Saggion
|
Sanja Stajner
|
Victoria Yaneva
Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)
2017
Services for text simplification and analysis
Johan Falkenjack
|
Evelina Rennes
|
Daniel Fahlborg
|
Vida Johansson
|
Arne Jönsson
Proceedings of the 21st Nordic Conference on Computational Linguistics
2016
Similarity-Based Alignment of Monolingual Corpora for Text Simplification Purposes
Sarah Albertsson
|
Evelina Rennes
|
Arne Jönsson
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection, measuring sentence similarity with a modified version of a TF-IDF vector space model. A second method (M2), also accounting for part-of-speech tags, was developed, and the two methods were compared. For evaluation, a crowdsourcing platform was built to collect human judgement data, and preliminary results showed that cosine similarity corresponds better to human rankings than the Dice coefficient. We also saw a tendency that adding syntactic context to the TF-IDF vector space model is beneficial for this kind of paraphrase alignment task.
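The comparison between cosine similarity in a TF-IDF space and the Dice coefficient can be illustrated as below; this sketch uses a plain TF-IDF model, not the modified or POS-aware variants from the paper, and the example sentences are invented.

```python
# Sketch comparing two sentence-similarity measures: cosine similarity over
# (plain) TF-IDF vectors vs. the Dice coefficient over token sets.
# The modified and POS-aware TF-IDF variants from the paper are not reproduced.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def dice(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

sent_a = "myndigheten beslutade att avslå ansökan"
sent_b = "myndigheten sa nej till ansökan"

tfidf = TfidfVectorizer().fit([sent_a, sent_b])
vecs = tfidf.transform([sent_a, sent_b])
print("cosine:", cosine_similarity(vecs[0], vecs[1])[0, 0])
print("dice:  ", dice(sent_a.split(), sent_b.split()))
```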
2015
A Tool for Automatic Simplification of Swedish Texts
Evelina Rennes
|
Arne Jönsson
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)
2014
The Impact of Cohesion Errors in Extraction Based Summaries
Evelina Rennes
|
Arne Jönsson
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present results from an eye tracking study of automatic text summarization. Automatic text summarization is a growing field due to the modern world’s Internet-based society, but automatically creating perfect summaries is challenging. One problem is that extraction-based summaries often contain cohesion errors. Using an eye tracking camera, we studied the nature of four different types of cohesion errors occurring in extraction-based summaries. A total of 23 participants read and rated four different texts and marked the most difficult areas of each text. Statistical analysis of the data revealed that absent cohesion or context and broken anaphoric references (pronouns) caused some disturbance in reading, but that the impact is restricted to the effort of reading rather than the comprehension of the text. However, erroneous anaphoric references (pronouns) were not always detected by the participants, which poses a problem for automatic text summarizers. The study also revealed other potential disturbing factors.