Wondimagegnhue Tsegaye Tufa


2024

The Constant in HATE: Toxicity in Reddit across Topics and Languages
Wondimagegnhue Tsegaye Tufa | Ilia Markov | Piek T.J.M. Vossen
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024

Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages. By aligning languages with topics, we thoroughly analyze how toxicity spikes within different communities. Our analysis targets six languages spanning different communities and topics such as Culture, Politics, and News. We observe consistent patterns across languages where toxicity increases within the same topics while also identifying significant differences where specific language communities exhibit notable variations in relation to certain topics.

2019


English-Ethiopian Languages Statistical Machine Translation
Solomon Teferra Abate | Michael Melese | Martha Yifiru Tachbelie | Million Meshesha | Solomon Atinafu | Wondwossen Mulugeta | Yaregal Assabie | Hafte Abera | Biniyam Ephrem | Tewodros Gebreselassie | Wondimagegnhue Tsegaye Tufa | Amanuel Lemma | Tsegaye Andargie | Seifedin Shifaw
Proceedings of the 2019 Workshop on Widening NLP

In this paper, we describe an attempt towards the development of parallel corpora for English and Ethiopian languages, such as Amharic, Tigrigna, Afan-Oromo, Wolaytta and Ge’ez. The corpora are used to conduct bi-directional statistical machine translation (SMT) experiments. The BLEU scores of the bi-directional SMT systems show promising results. The morphological richness of the Ethiopian languages has a great impact on SMT performance, especially when the target languages are Ethiopian languages.