Tariq Yousef


An automatic model and Gold Standard for translation alignment of Ancient Greek
Tariq Yousef | Chiara Palladino | Farnoosh Shamsian | Anise d’Orange Ferreira | Michel Ferreira dos Reis
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper illustrates a workflow for developing and evaluating automatic translation alignment models for Ancient Greek. We designed an annotation Style Guide and a gold standard for the alignment of Ancient Greek-English and Ancient Greek-Portuguese, measured inter-annotator agreement and used the resulting dataset to evaluate the performance of various translation alignment models. We proposed a fine-tuning strategy that employs unsupervised training with mono- and bilingual texts and supervised training using manually aligned sentences. The results indicate that the fine-tuned model based on XLM-Roberta is superior in performance, and it achieved good results on language pairs that were not part of the training data.

Automatic Translation Alignment for Ancient Greek and Latin
Tariq Yousef | Chiara Palladino | David J. Wright | Monica Berti
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper presents the results of automatic translation alignment experiments on a corpus of texts in Ancient Greek translated into Latin. We used a state-of-the-art alignment workflow based on a contextualized multilingual language model that is fine-tuned on the alignment task for Ancient Greek and Latin. The performance of the alignment model is evaluated on an alignment gold standard consisting of 100 parallel fragments aligned manually by two domain experts, with a 90.5% Inter-Annotator-Agreement (IAA). An interactive online interface is provided to enable users to explore the aligned fragments collection and examine the alignment model’s output.


Press Freedom Monitor: Detection of Reported Press and Media Freedom Violations in Twitter and News Articles
Tariq Yousef | Antje Schlaf | Janos Borst | Andreas Niekler | Gerhard Heyer
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Freedom of the press and media is of vital importance for democratically organised states and open societies. We introduce the Press Freedom Monitor, a tool that aims to detect reported press and media freedom violations in news articles and tweets. It is used by press and media freedom organisations to support their daily monitoring and to trigger rapid response actions. The Press Freedom Monitor enables the monitoring experts to get a fast overview over recently reported incidents and it has shown an impressive performance in this regard. This paper presents our work on the tool, starting with the training phase, which comprises defining the topic-related keywords to be used for querying APIs for news and Twitter content and evaluating different machine learning models based on a training dataset specifically created for our use case. Then, we describe the components of the production pipeline, including data gathering, duplicates removal, country mapping, case mapping and the user interface. We also conducted a usability study to evaluate the effectiveness of the user interface, and describe improvement plans for future work.

Summary Explorer: Visualizing the State of the Art in Text Summarization
Shahbaz Syed | Tariq Yousef | Khalid Al Khatib | Stefan Jänicke | Martin Potthast
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

This paper introduces Summary Explorer, a new tool to support the manual inspection of text summarization systems by compiling the outputs of 55 state-of-the-art single document summarization approaches on three benchmark datasets, and visually exploring them during a qualitative assessment. The underlying design of the tool considers three well-known summary quality criteria (coverage, faithfulness, and position bias), encapsulated in a guided assessment based on tailored visualizations. The tool complements existing approaches for locally debugging summarization models and improves upon them. The tool is available at https://tldr.webis.de/


Ancient Greek WordNet Meets the Dynamic Lexicon: the Example of the Fragments of the Greek Historians
Monica Berti | Yuri Bizzoni | Federico Boschetti | Gregory R. Crane | Riccardo Del Gratta | Tariq Yousef
Proceedings of the 8th Global WordNet Conference (GWC)

The Ancient Greek WordNet (AGWN) and the Dynamic Lexicon (DL) are multilingual resources to study the lexicon of Ancient Greek texts and their translations. Both AGWN and DL are works in progress that need accuracy improvement and manual validation. After a detailed description of the current state of each work, this paper illustrates a methodology to cross AGWN and DL data, in order to mutually score the items of each resource according to the evidence provided by the other resource. The training data is based on the corpus of the Digital Fragmenta Historicorum Graecorum (DFHG), which includes ancient Greek texts with Latin translations.