Ștefana-Arina Tăbușcă

Also published as: Ștefana Arina Tăbușcă, Stefana Arina Tabusca

2026

The Visibility of Depression in Social Media: Mapping Symptoms to Linguistic Features
Ștefana-Arina Tăbușcă | Ana Sabina Uban | Liviu Dinu
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026)

Digital phenotyping research assumes that depression symptoms are detectable in people’s written discourse, yet it remains underexplored which specific symptoms leave linguistic traces and which remain invisible. In this paper, using matched clinical and social media data from 169 Reddit users (eRisk 2021), we construct a clinical symptom network from BDI-II responses and a symptom-language bridge matrix mapping each of the 21 BDI-II symptoms to 15 curated LIWC-22 linguistic features. After FDR correction, 37 significant associations emerge, revealing a divide: cognitive-affective symptoms (sadness, worthlessness, suicidality) leave clear linguistic traces through mental health vocabulary, anxiety words, and first-person pronouns, while others, such as vegetative symptoms (sleep, appetite, irritability, libido), appear less visible. These findings suggest that text-based depression monitoring may miss some dimensions of depression.

2025

Arabic to Romanian Machine Translation: A Case Study on Distant Language Pairs
Ioan Alexandru Hirica | Stefana Arina Tabusca | Sergiu Nisioi
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This paper investigates machine translation between two linguistically distant languages, Arabic and Romanian, with a focus on translating from Arabic to Romanian. Dataset cleaning techniques are addressed, offering insights on the impact of translation for a language pair with limited resources. Using publicly available corpora (e.g., OPUS) and manually translated diplomatic texts, filtering methods are applied, such as duplicate removal, embedding similarity analysis (LEALLA), and Large Language Model (LLM)-based validation (Gemini-flash-002). Transformer models are trained and evaluated with diverse preprocessing pipelines that incorporate subword tokenization. Additionally, the performance of a fine-tuned LLM is assessed for this task and compared to that of its pre-trained counterpart. Despite computational limitations, the results emphasize the importance of targeted preprocessing and model adaptation in improving Arabic-Romanian translation quality.

Optimism, Pessimism, and the Language between: Model Interpretability and Psycholinguistic Profiling
Stefana Arina Tabusca | Liviu P. Dinu
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This study explores how optimism and pessimism are expressed in social media by combining psycholinguistic profiling with model interpretability. Using the OPT dataset, we fine-tune a RoBERTa-based classifier and apply LIME to examine both the most confident and the most ambiguous predictions. We analyze the influential tokens driving these decisions and identify lexical patterns linked to affective intensity, certainty, and social orientation. A complementary LIWC-based analysis of ground truth labels reveals systematic differences in emotional tone and cognitive style. PCA projections further show that optimism and pessimism occupy overlapping yet distinguishable regions in psycholinguistic space. Our findings demonstrate the value of linguistic interpretability in understanding dispositional sentiment.

ReproHum #0033-05: Human Evaluation of Factuality from A Multidisciplinary Perspective
Andra-Maria Florescu | Marius Micluța-Câmpeanu | Stefana Arina Tabusca | Liviu P Dinu
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

This paper is a joint contribution to the 2025 ReproNLP shared task, part of the ReproHum project. We focused on reproducing the human evaluation based on one criterion, namely, factuality of Scientific Automated Generated Systems from August et al. (2022). In accordance with the ReproHum guidelines, we followed the original study as closely as possible, with two human raters who coded 300 ratings each. Moreover, we conducted an additional study on two subsets of the dataset based on domain (medicine and physics), in which we employed expert annotators. Our reproduction of the factuality assessment found similar overall rates of factual inaccuracies across models. However, variability and weak agreement with the original model rankings suggest challenges in reliably reproducing results, especially when results are close.

Dissonant Ballerinas and Crafty Carrots: A Comparative Multi-modal Analysis of Italian Brain Rot
Anca Dinu | Andra-Maria Florescu | Marius Micluța-Câmpeanu | Ștefana Arina Tăbușcă | Claudiu Creangă | Andreiana Mihail
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Co-authors

Ioan Alexandru Hirica 1

Andreiana Mihail 1

Sergiu Nisioi 1

Ana Sabina Uban 1
