Evren Ayberk Munis
2026
OCRTurk: A Comprehensive OCR Benchmark for Turkish
Deniz Yılmaz | Evren Ayberk Munis | Cagri Toraman | Süha Kağan Köse | Burak Aktaş | Mehmet Can Baytekin | Bilge Kaan Görür
Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
Document parsing is now widely used in applications such as large-scale document digitization, retrieval-augmented generation, and domain-specific pipelines in healthcare and education. Benchmarking document parsing models is crucial for assessing their reliability and practical robustness. Existing benchmarks mostly target high-resource languages and provide limited coverage for low-resource settings such as Turkish. Moreover, existing studies on Turkish document parsing lack a standardized benchmark that reflects real-world scenarios and document diversity. To address this gap, we introduce OCRTurk, a Turkish document parsing benchmark covering multiple layout elements and document categories at three difficulty levels. OCRTurk consists of 180 Turkish documents drawn from academic articles, theses, slide decks, and non-academic articles. We evaluate seven OCR models on OCRTurk using element-wise metrics. Across difficulty levels, PaddleOCR achieves the strongest overall results, leading on most element-wise metrics except figures and attaining the best Normalized Edit Distance scores on the easy, medium, and hard subsets. We also observe performance variation by document type: models perform well on non-academic documents, while slide decks are the most challenging.
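The abstract reports Normalized Edit Distance but does not state how it is normalized. The sketch below is one common formulation, not necessarily the one used in OCRTurk: it assumes character-level Levenshtein distance divided by the length of the longer string, so 0.0 means an exact match and 1.0 means nothing in common. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; ASSUMES max-length normalization."""
    if not prediction and not reference:
        return 0.0
    return levenshtein(prediction, reference) / max(len(prediction), len(reference))


if __name__ == "__main__":
    # A dropped cedilla is a typical OCR error on Turkish text.
    print(normalized_edit_distance("seker", "şeker"))
```

Under this assumption lower scores are better, and a single wrong diacritic in a five-letter word costs one substitution out of five characters.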
RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish
Süha Kağan Köse | Mehmet Can Baytekin | Burak Aktaş | Bilge Kaan Görür | Evren Ayberk Munis | Deniz Yılmaz | Muhammed Yusuf Kartal | Cagri Toraman
Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
Retrieval-Augmented Generation (RAG) enhances LLM factuality, yet design guidance remains English-centric, limiting insights for morphologically rich languages like Turkish. We address this by constructing a comprehensive Turkish RAG dataset derived from Turkish Wikipedia and CulturaX, comprising question-answer pairs and relevant passage chunks. We benchmark seven stages of the RAG pipeline, from query transformation and reranking to answer refinement, without task-specific fine-tuning. Our results show that complex methods like HyDE achieve the highest accuracy (85%), considerably above the baseline (78.70%). A Pareto-optimal configuration using Cross-encoder Reranking and Context Augmentation achieves comparable performance (84.60%) at much lower cost. We further demonstrate that over-stacking generative modules can degrade performance by distorting morphological cues, whereas simple query clarification with robust reranking offers an effective solution.
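The notion of a Pareto-optimal configuration can be sketched as follows: a configuration is kept only if no other configuration is at least as accurate and at least as cheap while being strictly better on one of the two. The configuration names, accuracies, and costs below are placeholder values chosen for illustration, not figures from the paper, and the abstract does not say how cost was measured.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    accuracy: float  # higher is better
    cost: float      # lower is better (arbitrary units, hypothetical)


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


if __name__ == "__main__":
    # Placeholder numbers only; they do not come from the RAGTurk results.
    candidates = [
        Config("A", accuracy=0.80, cost=1.0),
        Config("B", accuracy=0.85, cost=5.0),
        Config("C", accuracy=0.84, cost=2.0),
        Config("D", accuracy=0.79, cost=3.0),
    ]
    for config in pareto_front(candidates):
        print(config.name, config.accuracy, config.cost)
```

In this toy example D is dropped because A is both cheaper and more accurate, while B and C both survive: B is the most accurate, and C gives up a little accuracy for a much lower cost.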
MALTO at SemEval-2026 Task 13: Detecting Human, AI, and Hybrid Code via Hard Negative Mining and Curriculum-Driven Ensembles
Hüseyin Arslan | Evren Ayberk Munis | Timofei Khudonogov | Mert Akgun | Murat Besli | Ayhan Meherrem | Claudio Savelli | Flavio Giobergia
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
The rapid advancement of Large Language Models (LLMs) has significantly impacted software engineering, posing challenges for determining the origin and authenticity of source code. This paper presents the MALTO team’s submission for SemEval-2026 Task 13, explicitly focusing on Subtask B (Authorship Attribution among 11 classes) and Subtask C (Hybrid Code Detection). To address severe class imbalance and the complex boundaries of mixed human-machine code, we propose a unified framework that leverages an ensemble of UniXcoder and CodeT5. Our approach integrates a robust Tree-sitter-based Universal Canonicalization strategy, Data Augmentation, and a novel 3-Phase Curriculum Training schedule enhanced by Hard Negative Mining. Specifically, UniXcoder’s cross-modal representations excel at distinguishing among semantically overlapping LLM families (Subtask B), whereas CodeT5’s identifier-aware architecture is superior at detecting subtle structural anomalies in hybrid and adversarial snippets (Subtask C). By aggregating these complementary strengths, our soft-voting ensemble overcomes the limitations of individual models, demonstrating strong robustness against imbalanced distributions and effectively discriminating between purely human, purely machine, hybrid, and adversarial code snippets.
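Soft voting, as mentioned in the abstract, generally means averaging the class probabilities produced by each model and predicting the class with the highest average. The sketch below shows that general idea for a single code snippet. It is not the MALTO implementation: the function name, the optional weights, and the example probabilities are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Combine per-model class probabilities for ONE sample.

    model_probs[m][k] is model m's probability for class k.
    Returns (predicted class index, weighted-average probabilities).
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # plain, unweighted average
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")

    total = sum(weights)
    averaged = [
        sum(w * p[k] for w, p in zip(weights, model_probs)) / total
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


if __name__ == "__main__":
    # Hypothetical probabilities from two models over three classes.
    first_model = [0.7, 0.2, 0.1]
    second_model = [0.2, 0.5, 0.3]
    print(soft_vote([first_model, second_model]))
    print(soft_vote([first_model, second_model], weights=[1.0, 3.0]))
```

With equal weights the first model's confident vote decides the outcome; weighting the second model more heavily flips the prediction, which is how an ensemble can lean on whichever model is stronger for a given subtask.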