Rojin Ziaei

2026

Segmentation Strategy Matters: Benchmarking Whisper on Persian YouTube Content
Reihaneh Iranmanesh | Rojin Ziaei | Joe Garman
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family

Automatic Speech Recognition (ASR) transcription accuracy remains highly sensitive to audio segmentation strategy, yet most benchmarks assume oracle timestamps that are unavailable in deployment. We systematically evaluate how audio segmentation affects Whisper’s performance on 10 hours of Persian YouTube content, comparing transcript-aligned (oracle) and silence-based (realistic) segmentation across contrasting acoustic conditions. Results reveal a striking content-type dependency: podcast content benefits from transcript-aligned timestamp segmentation (33% lower mean WER), while entertainment content favors silence-based segmentation (8% lower mean WER). This finding demonstrates that optimal segmentation must be content-aware: silence detection better captures natural boundaries in acoustically heterogeneous media and avoids mid-utterance splits. We publicly release our evaluation framework, 10 hours of audio with gold transcripts, and segmentation results at https://github.com/ri164-bolleit/persian-youtube-whisper-benchmark
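
To make the oracle-versus-realistic contrast concrete, here is a minimal sketch of one common way silence-based segmentation can work. It is not the authors' implementation (see the linked repository for that). It assumes mono float samples in [-1, 1], computes RMS energy over fixed frames, and closes a segment whenever a run of quiet frames is long enough. Overlong segments are capped at 30 seconds because Whisper consumes 30-second windows. The function names, the 20 ms frame size, the 0.01 threshold, and the 300 ms minimum pause are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of each non-overlapping frame (last frame may be shorter)."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def silence_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,            # hypothetical analysis frame size
    threshold: float = 0.01,       # hypothetical RMS level below which a frame is "silent"
    min_silence_ms: int = 300,     # hypothetical pause length that ends a segment
    max_segment_ms: int = 30_000,  # cap at Whisper's 30-second input window
) -> List[Tuple[int, int]]:
    """Split audio into (start_sample, end_sample) spans at sufficiently long pauses."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silent_frames = max(1, math.ceil(min_silence_ms / frame_ms))
    max_frames = max(1, max_segment_ms // frame_ms)
    voiced = [e >= threshold for e in frame_rms(samples, frame_len)]

    # Pass 1: group voiced frames, closing a segment once a pause is long enough.
    segments: List[Tuple[int, int]] = []
    seg_start = None  # frame index where the open segment began, or None
    silent_run = 0    # consecutive silent frames since the last voiced frame
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # End the segment at the first frame of the pause.
                segments.append((seg_start, i - silent_run + 1))
                seg_start, silent_run = None, 0
    if seg_start is not None:
        # Close the final segment, trimming any trailing short pause.
        segments.append((seg_start, len(voiced) - silent_run))

    # Pass 2: cap segment length, then convert frame indices to sample indices.
    spans: List[Tuple[int, int]] = []
    for start, end in segments:
        for chunk_start in range(start, end, max_frames):
            chunk_end = min(chunk_start + max_frames, end)
            spans.append((chunk_start * frame_len, min(chunk_end * frame_len, len(samples))))
    return spans


if __name__ == "__main__":
    # 1 s of constant "speech", a 0.5 s pause, then 1 s more, at 16 kHz.
    sr = 16_000
    audio = [0.5] * sr + [0.0] * (sr // 2) + [0.5] * sr
    print(silence_segments(audio, sr))  # [(0, 16000), (24000, 40000)]
```

Unlike transcript-aligned cutting, this approach needs no reference text, which is why it is the realistic deployment option. The trade-off is that its boundaries depend on pauses and loudness, so results can shift with background music or overlapping speech.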


GUNLP at SemEval-2026 Task 10: Psycholinguistic Conspiracy Marker Extraction and Detection (PsyCoMark)
Rojin Ziaei | Mahsa Khoshnoodi | Nazli Goharian
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper presents the Georgetown University NLP (GUNLP) system developed for SemEval-2026 Task 10: Psycholinguistic Conspiracy Marker Extraction and Detection, addressing the classification of conspiratorial beliefs in Reddit posts (Subtask 2). Our approach leverages COVID-Twitter-BERT v2 (CT-BERT-v2) within a multi-task learning framework that jointly optimizes conspiracy classification and emotion label prediction through a dual-head architecture. To address data scarcity, we enrich the training set with paraphrasing-based data augmentation and GPT-5-generated chain-of-thought emotion annotations, effectively doubling the training corpus to approximately 8,600 examples. We evaluate two input configurations: text only, and text concatenated with emotion labels. The emotion-aware configuration achieves the strongest performance, with an F1 score of 0.87 on the official development set, outperforming the text-only baseline by five F1 points and demonstrating the value of paraphrased samples and affective auxiliary supervision for conspiracy detection in social media text.
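
The dual-head setup above can be illustrated with a small pure-Python sketch of how such a joint objective is typically assembled: one shared feature vector feeds two linear heads, and the total loss is the conspiracy cross-entropy plus a weighted emotion cross-entropy. This is not the authors' code. It assumes the encoder output (for example a pooled CT-BERT-v2 vector) is already available as a list of floats. It also assumes emotion prediction is single-label softmax classification; a multi-label setup would use per-label binary cross-entropy instead. The auxiliary weight of 0.5 is a hypothetical value. `emotion_aware_input` and its `[SEP]` format are likewise a hypothetical illustration of the "text concatenated with emotion labels" configuration.

```python
import math
from typing import List, Sequence, Tuple

Head = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def linear(head: Head, features: Sequence[float]) -> List[float]:
    """Project the shared encoder features to one logit per class."""
    weights, bias = head
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy, computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]


def joint_loss(
    features: Sequence[float],
    conspiracy_head: Head,
    emotion_head: Head,
    conspiracy_label: int,
    emotion_label: int,
    emotion_weight: float = 0.5,  # hypothetical auxiliary-task weight
) -> float:
    """Main-task loss plus a weighted auxiliary emotion loss over the same shared features."""
    main = cross_entropy(linear(conspiracy_head, features), conspiracy_label)
    aux = cross_entropy(linear(emotion_head, features), emotion_label)
    return main + emotion_weight * aux


def emotion_aware_input(text: str, emotions: Sequence[str], sep: str = " [SEP] ") -> str:
    """Hypothetical formatting for the text-plus-emotion-labels input configuration."""
    return text + sep + ", ".join(emotions)
```

In an actual training loop, the combined loss would typically be backpropagated through both heads and the shared encoder. The emotion signal can therefore shape the representation that the conspiracy classifier relies on, even though only the conspiracy head is used for the final prediction.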
