Evelina Rennes


2021

Synonym Replacement based on a Study of Basic-level Nouns in Swedish Texts of Different Complexity
Evelina Rennes | Arne Jönsson
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Basic-level terms have been described as the most important to human categorisation. They are the earliest emerging words in children’s language acquisition, and seem to occur more frequently in language in general. In this article, we explore the use of basic-level nouns in texts of different complexity, and hypothesise that hypernyms with characteristics of basic-level words could be useful for the task of lexical simplification. We conducted two corpus studies using four different corpora, two of standard Swedish and two of simple Swedish, and explored whether corpora of simple texts contain a higher proportion of basic-level nouns than corpora of standard Swedish. Based on insights from the corpus studies, we developed a novel algorithm for choosing the best synonym by rewarding high relative frequencies and monolexemity, and by restricting the climb in the word hierarchy so that it does not suggest synonyms at too high a level of inclusiveness.
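
The selection criteria named in this abstract (high relative frequency, monolexemity, and a limited climb in the word hierarchy) can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Candidate` fields, the multiplicative bonus, the default limits, and the toy frequencies are all hypothetical and chosen only to show how the three criteria might interact.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    relative_frequency: float  # hypothetical: corpus count / corpus size
    is_monolexemic: bool       # True if the word is not a compound
    levels_up: int             # hypernym steps above the original word


def choose_replacement(candidates, max_levels_up=2, monolexemic_bonus=1.5):
    """Pick the highest-scoring candidate within the allowed climb.

    Both max_levels_up and monolexemic_bonus are illustrative
    assumptions, not values taken from the paper.
    """
    # Restrict the climb: drop candidates that are too far up the hierarchy.
    allowed = [c for c in candidates if c.levels_up <= max_levels_up]
    if not allowed:
        return None

    def score(c):
        # Reward frequent words, and reward single-lexeme words further.
        bonus = monolexemic_bonus if c.is_monolexemic else 1.0
        return c.relative_frequency * bonus

    return max(allowed, key=score).word


# Toy data with invented frequencies, for illustration only.
toy_candidates = [
    Candidate("hund", 0.004, True, 1),       # dog
    Candidate("däggdjur", 0.005, False, 2),  # mammal (a compound)
    Candidate("organism", 0.020, True, 5),   # too far up the hierarchy
]
best = choose_replacement(toy_candidates)
```

In this toy run, "organism" is excluded by the climb limit despite its high frequency, and "hund" wins over the more frequent compound "däggdjur" because of the monolexemity bonus.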

2020

Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences
Evelina Rennes
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Parallel monolingual resources are imperative for data-driven sentence simplification research. We present the work of aligning, at the sentence level, a corpus of web texts in standard and simple Swedish from all Swedish public authorities and municipalities. We compare the performance of three alignment algorithms used for similar work in English (Average Alignment, Maximum Alignment, and Hungarian Alignment), and the best-performing algorithm is used to create a resource of 15,433 unique sentence pairs. We evaluate the resulting corpus using a set of features that has been shown to predict the text complexity of Swedish texts. The results show that the sentences of the simple sub-corpus are indeed less complex than the sentences of the standard part of the corpus, according to many of the text complexity measures.
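
The abstract names the three alignment measures without defining them. The sketch below assumes a formulation that is common in the sentence-alignment literature, in which each measure turns a matrix of word-to-word similarities into one sentence-level score; whether the paper uses exactly these definitions is an assumption. The function names are hypothetical, and the one-to-one matching is brute-forced over permutations instead of using the Hungarian algorithm, which finds the same optimum far more efficiently.

```python
from itertools import permutations


def average_alignment(sim):
    """Mean of all pairwise word similarities."""
    rows, cols = len(sim), len(sim[0])
    return sum(sum(row) for row in sim) / (rows * cols)


def maximum_alignment(sim):
    """Each word is matched to its most similar counterpart, in both directions."""
    rows, cols = len(sim), len(sim[0])
    row_best = sum(max(row) for row in sim) / rows
    col_best = sum(max(sim[i][j] for i in range(rows)) for j in range(cols)) / cols
    return 0.5 * (row_best + col_best)


def one_to_one_alignment(sim):
    """Best one-to-one word matching, normalised by the shorter sentence.

    Brute force, so only suitable for very short sentences.
    """
    rows, cols = len(sim), len(sim[0])
    if rows > cols:
        # Transpose so that the shorter sentence indexes the rows.
        sim = [list(col) for col in zip(*sim)]
        rows, cols = cols, rows
    best = max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )
    return best / rows


# sim[i][j]: similarity between word i of one sentence and word j of the
# other (invented values; in practice these might come from word embeddings).
toy_sim = [
    [1.0, 0.2, 0.0],
    [0.1, 0.9, 0.3],
]
scores = (
    average_alignment(toy_sim),
    maximum_alignment(toy_sim),
    one_to_one_alignment(toy_sim),
)
```

A sentence pair would then be accepted as aligned when its score exceeds a threshold tuned on held-out data.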

Visualizing Facets of Text Complexity across Registers
Marina Santini | Arne Jönsson | Evelina Rennes
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

In this paper, we propose visualizing results of a corpus-based study on text complexity using radar charts. We argue that the added value of this type of visualization is the polygonal shape that provides an intuitive grasp of text complexity similarities across the registers of a corpus. The results that we visualize come from a study where we explored whether it is possible to automatically single out different facets of text complexity across the registers of a Swedish corpus. To this end, we used factor analysis as applied in Biber’s Multi-Dimensional Analysis framework. The visualization of text complexity facets with radar charts indicates that there is a correspondence between linguistic similarity and similarity of shape across registers.

2018

Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)
Arne Jönsson | Evelina Rennes | Horacio Saggion | Sanja Stajner | Victoria Yaneva
Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA)

2017

Services for text simplification and analysis
Johan Falkenjack | Evelina Rennes | Daniel Fahlborg | Vida Johansson | Arne Jönsson
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

Similarity-Based Alignment of Monolingual Corpora for Text Simplification Purposes
Sarah Albertsson | Evelina Rennes | Arne Jönsson
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection, measuring sentence similarity by a modified version of a TF-IDF vector space model. A second method (M2), which also accounts for part-of-speech tags, was developed, and the two methods were compared. For evaluation, a crowdsourcing platform was built for collecting human judgement data, and preliminary results showed that cosine similarity relates better to human ranks than the Dice coefficient. We also saw a tendency for the addition of syntactic context to the TF-IDF vector space model to be beneficial for this kind of paraphrase alignment task.
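
The two similarity measures compared in this abstract can be sketched as follows. This is a plain TF-IDF model with a smoothed IDF, not the modified version the paper refers to; the function names and the toy sentences are hypothetical. A variant in the spirit of M2 could, as a further assumption, use tokens of the form `word_POS` in place of bare words.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per tokenised sentence."""
    n = len(sentences)
    # Document frequency: in how many sentences each token occurs.
    df = Counter(tok for sent in sentences for tok in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        # Smoothed IDF, so that every weight is positive and well defined.
        vectors.append(
            {tok: count * (math.log((1 + n) / (1 + df[tok])) + 1.0)
             for tok, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dice(tokens_a, tokens_b):
    """Dice coefficient on the token sets of two sentences."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy sentences (invented), already tokenised.
toy_sentences = [["a", "b"], ["b", "c"], ["d"]]
toy_vectors = tfidf_vectors(toy_sentences)
cos_01 = cosine(toy_vectors[0], toy_vectors[1])
dice_01 = dice(toy_sentences[0], toy_sentences[1])
```

Dice treats every shared token equally, whereas the TF-IDF weighting lets rare shared tokens count for more than common ones, which is one plausible reason the two measures can rank candidate pairs differently.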

2015

A Tool for Automatic Simplification of Swedish Texts
Evelina Rennes | Arne Jönsson
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

The Impact of Cohesion Errors in Extraction Based Summaries
Evelina Rennes | Arne Jönsson
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present results from an eye-tracking study of automatic text summarization. Automatic text summarization is a growing field due to the modern world’s Internet-based society, but automatically creating perfect summaries is challenging. One problem is that extraction-based summaries often have cohesion errors. Using an eye-tracking camera, we have studied the nature of four different types of cohesion errors occurring in extraction-based summaries. A total of 23 participants read and rated four different texts and marked the most difficult areas of each text. Statistical analysis of the data revealed that absent cohesion or context and broken anaphoric reference (pronouns) caused some disturbance in reading, but that the impact is restricted to the effort to read rather than the comprehension of the text. However, erroneous anaphoric references (pronouns) were not always detected by the participants, which poses a problem for automatic text summarizers. The study also revealed other potentially disturbing factors.