Simon Levis Sullam


2022

pdf
Evaluating Multilingual Sentence Representation Models in a Real Case Scenario
Rocco Tripodi | Rexhina Blloshmi | Simon Levis Sullam
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we present an evaluation of sentence representation models on the paraphrase detection task. The evaluation is designed to simulate a real-world problem of plagiarism and is based on one of the most important cases of forgery in modern history: the so-called “Protocols of the Elders of Zion”. The sentence pairs for the evaluation are taken from the infamous forged text “Protocols of the Elders of Zion” (Protocols) by unknown authors; and by “Dialogue in Hell between Machiavelli and Montesquieu” by Maurice Joly. Scholars have demonstrated that the first text plagiarizes from the second, indicating all the forged parts on qualitative grounds. Following this evidence, we organized the rephrased texts and asked native speakers to quantify the level of similarity between each pair. We used this material to evaluate sentence representation models in two languages: English and French, and on three tasks: similarity correlation, paraphrase identification, and paraphrase retrieval. Our evaluation aims at encouraging the development of benchmarks based on real-world problems, as a means to prevent problems connected to AI hypes, and to use NLP technologies for social good. Through our evaluation, we are able to confirm that the infamous Protocols are actually a plagiarized text but, as we will show, we encounter several problems connected with the convoluted nature of the task, that is very different from the one reported in standard benchmarks of paraphrase detection and sentence similarity. Code and data available at https://github.com/roccotrip/protocols.

2019

pdf
Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914
Rocco Tripodi | Massimo Warglien | Simon Levis Sullam | Deborah Paci
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We investigate some aspects of the history of antisemitism in France, one of the cradles of modern antisemitism, using diachronic word embeddings. We constructed a large corpus of French books and periodicals issues that contain a keyword related to Jews and performed a diachronic word embedding over the 1789-1914 period. We studied the changes over time in the semantic spaces of 4 target words and performed embedding projections over 6 streams of antisemitic discourse. This allowed us to track the evolution of antisemitic bias in the religious, economic, socio-politic, racial, ethic and conspiratorial domains. Projections show a trend of growing antisemitism, especially in the years starting in the mid-80s and culminating in the Dreyfus affair. Our analysis also allows us to highlight the peculiar adverse bias towards Judaism in the broader context of other religions.