pdf
bib
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Sanja Štajner
|
Horacio Saggion
|
Daniel Ferrés
|
Matthew Shardlow
|
Kim Cheng Sheang
|
Kai North
|
Marcos Zampieri
|
Wei Xu
pdf
bib
abs
The Fewer Splits are Better: Deconstructing Readability in Sentence Splitting
Tadashi Nomoto
In this work, we focus on sentence splitting, a subfield of text simplification, primarily motivated by an unproven idea that if you divide a sentence into pieces, it should become easier to understand. Our primary goal in this paper is to determine whether this is true. In particular, we ask, does it matter whether we break a sentence into two or three? We report on our findings based on Amazon Mechanical Turk. More specifically, we introduce a Bayesian modeling framework to further investigate to what degree a particular way of splitting the complex sentence affects readability, along with a number of other parameters adopted from diverse perspectives, including clinical linguistics, and cognitive linguistics. The Bayesian modeling experiment provides clear evidence that bisecting the sentence leads to enhanced readability to a degree greater than when we create simplification by trisection.
pdf
bib
abs
Parallel Corpus Filtering for Japanese Text Simplification
Koki Hatagaki
|
Tomoyuki Kajiwara
|
Takashi Ninomiya
We propose a method of parallel corpus filtering for Japanese text simplification. The parallel corpus for this task contains some redundant wording. In this study, we first identify the type and size of noisy sentence pairs in the Japanese text simplification corpus. We then propose a method of parallel corpus filtering to remove each type of noisy sentence pair. Experimental results show that filtering the training parallel corpus with the proposed method improves simplification performance.
pdf
abs
Patient-friendly Clinical Notes: Towards a new Text Simplification Dataset
Jan Trienes
|
Jörg Schlötterer
|
Hans-Ulrich Schildhaus
|
Christin Seifert
Automatic text simplification can help patients to better understand their own clinical notes. A major hurdle for the development of clinical text simplification methods is the lack of high quality resources. We report ongoing efforts in creating a parallel dataset of professionally simplified clinical notes. Currently, this corpus consists of 851 document-level simplifications of German pathology reports. We highlight characteristics of this dataset and establish first baselines for paragraph-level simplification.
pdf
abs
Target-Level Sentence Simplification as Controlled Paraphrasing
Tannon Kew
|
Sarah Ebling
Automatic text simplification aims to reduce the linguistic complexity of a text in order to make it easier to understand and more accessible. However, simplified texts are consumed by a diverse array of target audiences and what might be appropriately simplified for one group of readers may differ considerably for another. In this work we investigate a novel formulation of sentence simplification as paraphrasing with controlled decoding. This approach aims to alleviate the major burden of relying on large amounts of in-domain parallel training data, while at the same time allowing for modular and adaptive simplification. According to automatic metrics, our approach performs competitively against baselines that prove more difficult to adapt to the needs of different target audiences or require significant amounts of complex-simple parallel aligned data.
pdf
abs
Conciseness: An Overlooked Language Task
Felix Stahlberg
|
Aashish Kumar
|
Chris Alberti
|
Shankar Kumar
We report on novel investigations into training models that make sentences concise. We define the task and show that it is different from related tasks such as summarization and simplification. For evaluation, we release two test sets, consisting of 2000 sentences each, that were annotated by two and five human annotators, respectively. We demonstrate that conciseness is a difficult task for which zero-shot setups with large neural language models often do not perform well. Given the limitations of these approaches, we propose a synthetic data generation method based on round-trip translations. Using this data to either train Transformers from scratch or fine-tune T5 models yields our strongest baselines that can be further improved by fine-tuning on an artificial conciseness dataset that we derived from multi-annotator machine translation test sets.
pdf
abs
Revision for Concision: A Constrained Paraphrase Generation Task
Wenchuan Mu
|
Kwan Hui Lim
Academic writing should be concise as concise sentences better keep the readers’ attention and convey meaning clearly. Writing concisely is challenging, for writers often struggle to revise their drafts. We introduce and formulate revising for concision as a natural language processing task at the sentence level. Revising for concision requires algorithms to use only necessary words to rewrite a sentence while preserving its meaning. The revised sentence should be evaluated according to its word choice, sentence structure, and organization. The revised sentence also needs to fulfil semantic retention and syntactic soundness. To aide these efforts, we curate and make available a benchmark parallel dataset that can depict revising for concision. The dataset contains 536 pairs of sentences before and after revising, and all pairs are collected from college writing centres. We also present and evaluate the approaches to this problem, which may assist researchers in this area.
pdf
abs
Controlling Japanese Machine Translation Output by Using JLPT Vocabulary Levels
Alberto Poncelas
|
Ohnmar Htun
In Neural Machine Translation (NMT) systems, there is generally little control over the lexicon of the output. Consequently, the translated output may be too difficult for certain audiences. For example, for people with limited knowledge of the language, vocabulary is a major impediment to understanding a text. In this work, we build a complexity-controllable NMT for English-to-Japanese translations. More particularly, we aim to modulate the difficulty of the translation in terms of not only the vocabulary but also the use of kanji. For achieving this, we follow a sentence-tagging approach to influence the output. Controlling Japanese Machine Translation Output by Using JLPT Vocabulary Levels.
pdf
abs
IrekiaLFes: a New Open Benchmark and Baseline Systems for Spanish Automatic Text Simplification
Itziar Gonzalez-Dios
|
Iker Gutiérrez-Fandiño
|
Oscar m. Cumbicus-Pineda
|
Aitor Soroa
Automatic Text simplification (ATS) seeks to reduce the complexity of a text for a general public or a target audience. In the last years, deep learning methods have become the most used systems in ATS research, but these systems need large and good quality datasets to be evaluated. Moreover, these data are available on a large scale only for English and in some cases with restrictive licenses. In this paper, we present IrekiaLF_es, an open-license benchmark for Spanish text simplification. It consists of a document-level corpus and a sentence-level test set that has been manually aligned. We also conduct a neurolinguistically-based evaluation of the corpus in order to reveal its suitability for text simplification. This evaluation follows the Lexicon-Unification-Linearity (LeULi) model of neurolinguistic complexity assessment. Finally, we present a set of experiments and baselines of ATS systems in a zero-shot scenario.
pdf
abs
Lexical Simplification in Foreign Language Learning: Creating Pedagogically Suitable Simplified Example Sentences
Jasper Degraeuwe
|
Horacio Saggion
This study presents a lexical simplification (LS) methodology for foreign language (FL) learning purposes, a barely explored area of automatic text simplification (TS). The method, targeted at Spanish as a foreign language (SFL), includes a customised complex word identification (CWI) classifier and generates substitutions based on masked language modelling. Performance is calculated on a custom dataset by means of a new, pedagogically-oriented evaluation. With 43% of the top simplifications being found suitable, the method shows potential for simplifying sentences to be used in FL learning activities. The evaluation also suggests that, though still crucial, meaning preservation is not always a prerequisite for successful LS. To arrive at grammatically correct and more idiomatic simplifications, future research could study the integration of association measures based on co-occurrence data.
pdf
abs
Eye-tracking based classification of Mandarin Chinese readers with and without dyslexia using neural sequence models
Patrick Haller
|
Andreas Säuberli
|
Sarah Kiener
|
Jinger Pan
|
Ming Yan
|
Lena Jäger
Eye movements are known to reflect cognitive processes in reading, and psychological reading research has shown that eye gaze patterns differ between readers with and without dyslexia. In recent years, researchers have attempted to classify readers with dyslexia based on their eye movements using Support Vector Machines (SVMs). However, these approaches (i) are based on highly aggregated features averaged over all words read by a participant, thus disregarding the sequential nature of the eye movements, and (ii) do not consider the linguistic stimulus and its interaction with the reader’s eye movements. In the present work, we propose two simple sequence models that process eye movements on the entire stimulus without the need of aggregating features across the sentence. Additionally, we incorporate the linguistic stimulus into the model in two ways—contextualized word embeddings and manually extracted linguistic features. The models are evaluated on a Mandarin Chinese dataset containing eye movements from children with and without dyslexia. Our results show that (i) even for a logographic script such as Chinese, sequence models are able to classify dyslexia on eye gaze sequences, reaching state-of-the-art performance, and (ii) incorporating the linguistic stimulus does not help to improve classification performance.
pdf
abs
A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification
Oliver Alonzo
|
Sooyeon Lee
|
Mounica Maddela
|
Wei Xu
|
Matt Huenerfauth
Research has explored the use of automatic text simplification (ATS), which consists of techniques to make text simpler to read, to provide reading assistance to Deaf and Hard-of-hearing (DHH) adults with various literacy levels. Prior work in this area has identified interest in and benefits from ATS-based reading assistance tools. However, no prior work on ATS has gathered judgements from DHH adults as to what constitutes complex text. Thus, following approaches in prior NLP work, this paper contributes new word-complexity judgements from 11 DHH adults on a dataset of 15,000 English words that had been previously annotated by L2 speakers, which we also augmented to include automatic annotations of linguistic characteristics of the words. Additionally, we conduct a supplementary analysis of the interaction effect between the linguistic characteristics of the words and the groups of annotators. This analysis highlights the importance of collecting judgements from DHH adults for training ATS systems, as it revealed statistically significant interaction effects for nearly all of the linguistic characteristics of the words.
pdf
abs
(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification
Yu Qiao
|
Xiaofei Li
|
Daniel Wiechmann
|
Elma Kerz
State-of-the-art text simplification (TS) systems adopt end-to-end neural network models to directly generate the simplified version of the input text, and usually function as a blackbox. Moreover, TS is usually treated as an all-purpose generic task under the assumption of homogeneity, where the same simplification is suitable for all. In recent years, however, there has been increasing recognition of the need to adapt the simplification techniques to the specific needs of different target groups. In this work, we aim to advance current research on explainable and controllable TS in two ways: First, building on recently proposed work to increase the transparency of TS systems (Garbacea et al., 2020), we use a large set of (psycho-)linguistic features in combination with pre-trained language models to improve explainable complexity prediction. Second, based on the results of this preliminary task, we extend a state-of-the-art Seq2Seq TS model, ACCESS (Martin et al., 2020), to enable explicit control of ten attributes. The results of experiments show (1) that our approach improves the performance of state-of-the-art models for predicting explainable complexity and (2) that explicitly conditioning the Seq2Seq model on ten attributes leads to a significant improvement in performance in both within-domain and out-of-domain settings.
pdf
abs
Lexically Constrained Decoding with Edit Operation Prediction for Controllable Text Simplification
Tatsuya Zetsu
|
Tomoyuki Kajiwara
|
Yuki Arase
Controllable text simplification assists language learners by automatically rewriting complex sentences into simpler forms of a target level. However, existing methods tend to perform conservative edits that keep complex words intact. To address this problem, we employ lexically constrained decoding to encourage rewriting. Specifically, the proposed method predicts edit operations conditioned to a target level and creates positive/negative constraints for words that should/should not appear in an output sentence. The experimental results confirm that our method significantly outperforms previous methods and demonstrates a new state-of-the-art performance.
pdf
abs
An Investigation into the Effect of Control Tokens on Text Simplification
Zihao Li
|
Matthew Shardlow
|
Saeed Hassan
Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenisation strategy, which we also explore. In this paper, we (1) reimplemented ACCESS, (2) explored the effects of varying control tokens, (3) tested the influences of different tokenisation strategies, and (4) demonstrated how separate control tokens affect performance. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence the performance and propose some suggestions for designing control tokens, which also reaches into other controllable text generation tasks.
pdf
abs
Divide-and-Conquer Text Simplification by Scalable Data Enhancement
Sanqiang Zhao
|
Rui Meng
|
Hui Su
|
Daqing He
Text simplification is a task to reduce the complexity of a text while retain its original meaning. It can facilitate people with low-literacy skills or language impairments, such as children and individuals with dyslexia and aphasia, to read and understand complicated materials. Normally, substitution, deletion, reordering, and splitting are considered as four core operations for performing text simplification. Thus an ideal model should be capable of executing these operations appropriately to simplify a text. However, by examining the degree that each operation is exerted in different datasets, we observe that there is a salient discrepancy between the human annotation and existing training data that is widely used for training simplification models. To alleviate this discrepancy, we propose an unsupervised data construction method that distills each simplifying operation into data via different automatic data enhancement measures. The empirical results demonstrate that the resulting dataset SimSim can support models to achieve better performance by performing all operations properly.
pdf
abs
Improving Text Simplification with Factuality Error Detection
Yuan Ma
|
Sandaru Seneviratne
|
Elena Daskalaki
In the past few years, the field of text simplification has been dominated by supervised learning approaches thanks to the appearance of large parallel datasets such as Wikilarge and Newsela. However, these datasets suffer from sentence pairs with factuality errors which compromise the models’ performance. So, we proposed a model-independent factuality error detection mechanism, considering bad simplification and bad alignment, to refine the Wikilarge dataset through reducing the weight of these samples during training. We demonstrated that this approach improved the performance of the state-of-the-art text simplification model TST5 by an FKGL reduction of 0.33 and 0.29 on the TurkCorpus and ASSET testing datasets respectively. Our study illustrates the impact of erroneous samples in TS datasets and highlights the need for automatic methods to improve their quality.
pdf
abs
JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers
Akio Hayakawa
|
Tomoyuki Kajiwara
|
Hiroki Ouchi
|
Taro Watanabe
The user-dependency of Text Simplification makes its evaluation obscure. A targeted evaluation dataset clarifies the purpose of simplification, though its specification is hard to define. We built JADES (JApanese Dataset for the Evaluation of Simplification), a text simplification dataset targeted at non-native Japanese speakers, according to public vocabulary and grammar profiles. JADES comprises 3,907 complex-simple sentence pairs annotated by an expert. Analysis of JADES shows that wide and multiple rewriting operations were applied through simplification. Furthermore, we analyzed outputs on JADES from several benchmark systems and automatic and manual scores of them. Results of these analyses highlight differences between English and Japanese in operations and evaluations.
pdf
abs
A Benchmark for Neural Readability Assessment of Texts in Spanish
Laura Vásquez-Rodríguez
|
Pedro-Manuel Cuenca-Jiménez
|
Sergio Morales-Esquivel
|
Fernando Alva-Manchego
We release a new benchmark for Automated Readability Assessment (ARA) of texts in Spanish. We combined existing corpora with suitable texts collected from the Web, thus creating the largest available dataset for ARA of Spanish texts. All data was pre-processed and categorised to allow experimenting with ARA models that make predictions at two (simple and complex) or three (basic, intermediate, and advanced) readability levels, and at two text granularities (paragraphs and sentences). An analysis based on readability indices shows that our proposed datasets groupings are suitable for their designated readability level. We use our benchmark to train neural ARA models based on BERT in zero-shot, few-shot, and cross-lingual settings. Results show that either a monolingual or multilingual pre-trained model can achieve good results when fine-tuned in language-specific data. In addition, all mod- els decrease their performance when predicting three classes instead of two, showing opportunities for the development of better ARA models for Spanish with existing resources.
pdf
abs
Controllable Lexical Simplification for English
Kim Cheng Sheang
|
Daniel Ferrés
|
Horacio Saggion
Fine-tuning Transformer-based approaches have recently shown exciting results on sentence simplification task. However, so far, no research has applied similar approaches to the Lexical Simplification (LS) task. In this paper, we present ConLS, a Controllable Lexical Simplification system fine-tuned with T5 (a Transformer-based model pre-trained with a BERT-style approach and several other tasks). The evaluation results on three datasets (LexMTurk, BenchLS, and NNSeval) have shown that our model performs comparable to LSBert (the current state-of-the-art) and even outperforms it in some cases. We also conducted a detailed comparison on the effectiveness of control tokens to give a clear view of how each token contributes to the model.
pdf
abs
CILS at TSAR-2022 Shared Task: Investigating the Applicability of Lexical Substitution Methods for Lexical Simplification
Sandaru Seneviratne
|
Elena Daskalaki
|
Hanna Suominen
Lexical simplification — which aims to simplify complex text through the replacement of difficult words using simpler alternatives while maintaining the meaning of the given text — is popular as a way of improving text accessibility for both people and computers. First, lexical simplification through substitution can improve the understandability of complex text for, for example, non-native speakers, second language learners, and people with low literacy. Second, its usefulness has been demonstrated in many natural language processing problems like data augmentation, paraphrase generation, or word sense induction. In this paper, we investigated the applicability of existing unsupervised lexical substitution methods based on pre-trained contextual embedding models and WordNet, which incorporate Context Information, for Lexical Simplification (CILS). Although the performance of this CILS approach has been outstanding in lexical substitution tasks, its usefulness was limited at the TSAR-2022 shared task on lexical simplification. Consequently, a minimally supervised approach with careful tuning to a given simplification task may work better than unsupervised methods. Our investigation also encouraged further work on evaluating the simplicity of potential candidates and incorporating them into the lexical simplification methods.
pdf
abs
PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages
Peniel Whistely
|
Sandeep Mathias
|
Galiveeti Poornima
In this paper, we describe our approach to generate and rank candidate simplifications using pre-trained language models (Eg. BERT), publicly available word embeddings (Eg. FastText), and a part-of-speech tagger, to generate and rank candidate contextual simplifications for a given complex word. In this task, our system, PresiUniv, was placed first in the Spanish track, 5th in the Brazilian-Portuguese track, and 10th in the English track. We upload our codes and data for this project to aid in replication of our results. We also analyze some of the errors and describe design decisions which we took while writing the paper.
pdf
abs
UoM&MMU at TSAR-2022 Shared Task: Prompt Learning for Lexical Simplification
Laura Vásquez-Rodríguez
|
Nhung Nguyen
|
Matthew Shardlow
|
Sophia Ananiadou
We present PromptLS, a method for fine-tuning large pre-trained Language Models (LM) to perform the task of Lexical Simplification. We use a predefined template to attain appropriate replacements for a term, and fine-tune a LM using this template on language specific datasets. We filter candidate lists in post-processing to improve accuracy. We demonstrate that our model can work in a) a zero shot setting (where we only require a pre-trained LM), b) a fine-tuned setting (where language-specific data is required), and c) a multilingual setting (where the model is pre-trained across multiple languages and fine-tuned in an specific language). Experimental results show that, although the zero-shot setting is competitive, its performance is still far from the fine-tuned setting. Also, the multilingual is unsurprisingly worse than the fine-tuned model. Among all TSAR-2022 Shared Task participants, our team was ranked second in Spanish and third in English.
pdf
abs
PolyU-CBS at TSAR-2022 Shared Task: A Simple, Rank-Based Method for Complex Word Substitution in Two Steps
Emmanuele Chersoni
|
Yu-Yin Hsu
In this paper, we describe the system we presented at the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) regarding the shared task on Lexical Simplification for English, Portuguese, and Spanish. We proposed an unsupervised approach in two steps: First, we used a masked language model with word masking for each language to extract possible candidates for the replacement of a difficult word; second, we ranked the candidates according to three different Transformer-based metrics. Finally, we determined our list of candidates based on the lowest average rank across different metrics.
pdf
abs
CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification?
Rodrigo Wilkens
|
David Alfter
|
Rémi Cardon
|
Isabelle Gribomont
|
Adrien Bibal
|
Watrin Patrick
|
Marie-Catherine De marneffe
|
Thomas François
Lexical simplification is the task of substituting a difficult word with a simpler equivalent for a target audience. This is currently commonly done by modeling lexical complexity on a continuous scale to identify simpler alternatives to difficult words. In the TSAR shared task, the organizers call for systems capable of generating substitutions in a zero-shot-task context, for English, Spanish and Portuguese. In this paper, we present the solution we (the cental team) proposed for the task. We explore the ability of BERT-like models to generate substitution words by masking the difficult word. To do so, we investigate various context enhancement strategies, that we combined into an ensemble method. We also explore different substitution ranking methods. We report on a post-submission analysis of the results and present our insights for potential improvements. The code for all our experiments is available at
https://gitlab.com/Cental-FR/cental-tsar2022.
pdf
abs
teamPN at TSAR-2022 Shared Task: Lexical Simplification using Multi-Level and Modular Approach
Nikita Nikita
|
Pawan Rajpoot
Lexical Simplification is the process of reducing the lexical complexity of a text by replacing difficult words with easier-to-read (or understand) expressions while preserving the original information and meaning. This paper explains the work done by our team “teamPN” for the English track of TSAR 2022 Shared Task of Lexical Simplification. We created a modular pipeline which combines transformers based models with traditional NLP methods like paraphrasing and verb sense disambiguation. We created a multi-level and modular pipeline where the target text is treated according to its semantics (Part of Speech Tag). The pipeline is multi-level as we utilize multiple source models to find potential candidates for replacement. It is modular as we can switch the source models and their weighting in the final re-ranking.
pdf
abs
MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders
Xiaofei Li
|
Daniel Wiechmann
|
Yu Qiao
|
Elma Kerz
In this paper we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability. Our approach builds on and extends the unsupervised lexical simplification system with pretrained encoders (LSBert) system introduced in Qiang et al. (2020) in the following ways: For the subtask of simplification candidate selection, it utilizes a RoBERTa transformer language model and expands the size of the generated candidate list. For subsequent substitution ranking, it introduces a new feature weighting scheme and adopts a candidate filtering method based on textual entailment to maximize semantic similarity between the target word and its simplification. Our best-performing system improves LSBert by 5.9% accuracy and achieves second place out of 33 ranked solutions.
pdf
abs
UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?
Dennis Aumiller
|
Michael Gertz
Previous state-of-the-art models for lexical simplification consist of complex pipelines with several components, each of which requires deep technical knowledge and fine-tuned interaction to achieve its full potential. As an alternative, we describe a frustratingly simple pipeline based on prompted GPT-3 responses, beating competing approaches by a wide margin in settings with few training instances. Our best-performing submission to the English language track of the TSAR-2022 shared task consists of an “ensemble” of six different prompt templates with varying context levels. As a late-breaking result, we further detail a language transfer technique that allows simplification in languages other than English. Applied to the Spanish and Portuguese subset, we achieve state-of-the-art results with only minor modification to the original prompts. Aside from detailing the implementation and setup, we spend the remainder of this work discussing the particularities of prompting and implications for future work. Code for the experiments is available online at
https://github.com/dennlinger/TSAR-2022-Shared-Task.
pdf
abs
RCML at TSAR-2022 Shared Task: Lexical Simplification With Modular Substitution Candidate Ranking
Desislava Aleksandrova
|
Olivier Brochu Dufour
This paper describes the lexical simplification system RCML submitted to the English language track of the TSAR-2022 Shared Task. The system leverages a pre-trained language model to generate contextually plausible substitution candidates which are then ranked according to their simplicity as well as their grammatical and semantic similarity to the target complex word. Our submissions secure 6th and 7th places out of 33, improving over the SOTA baseline for 27 out of the 51 metrics.
pdf
abs
GMU-WLV at TSAR-2022 Shared Task: Evaluating Lexical Simplification Models
Kai North
|
Alphaeus Dmonte
|
Tharindu Ranasinghe
|
Marcos Zampieri
This paper describes team GMU-WLV submission to the TSAR shared-task on multilingual lexical simplification. The goal of the task is to automatically provide a set of candidate substitutions for complex words in context. The organizers provided participants with ALEXSIS a manually annotated dataset with instances split between a small trial set with a dozen instances in each of the three languages of the competition (English, Portuguese, Spanish) and a test set with over 300 instances in the three aforementioned languages. To cope with the lack of training data, participants had to either use alternative data sources or pre-trained language models. We experimented with monolingual models: BERTimbau, ELECTRA, and RoBERTA-largeBNE. Our best system achieved 1st place out of sixteen systems for Portuguese, 8th out of thirty-three systems for English, and 6th out of twelve systems for Spanish.
pdf
abs
Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification
Horacio Saggion
|
Sanja Štajner
|
Daniel Ferrés
|
Kim Cheng Sheang
|
Matthew Shardlow
|
Kai North
|
Marcos Zampieri
We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish. A total of 14 teams submitted the results of their lexical simplification systems for the provided test data. Results of the shared task indicate new benchmarks in Lexical Simplification with English lexical simplification quantitative results noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.