Süveyda Yeniterzi

Also published as: Suveyda Yeniterzi


2026

This paper describes our submission to SemEval-2026 Task 8 on multi-turn retrieval-augmented generation (RAG). We propose a hybrid multi-stage pipeline that combines high-recall lexical retrieval, dual-embedding dense re-ranking with reciprocal rank fusion, LLM-based relevance judging, and strictly constrained evidence-grounded generation. Our design emphasizes robustness and faithfulness across the full retrieval-to-generation pipeline. Our results suggest that relevance-aware filtering and constrained generation are important for improving faithfulness and overall RAG performance.
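The reciprocal rank fusion step named in the abstract above can be illustrated with a minimal sketch. It assumes two ranked lists of document ids, one per embedding model, and uses the commonly seen constant k=60; the function name, the example ids, and the value of k are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    with ranks starting at 1. k=60 is a common default and an assumption
    here, not a value reported in the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from two embedding models over the same candidates.
ranking_a = ["d1", "d2", "d3"]
ranking_b = ["d2", "d3", "d1"]
fused = reciprocal_rank_fusion([ranking_a, ranking_b])
```

Because the fusion uses only ranks, it does not require the two models' similarity scores to be on a comparable scale.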
This paper describes the GenAIus submission to the RAG4Reports 2026 Multilingual Report Generation Task. Our system builds on our earlier TREC RAGTIME pipeline, reusing the evidence preparation stages for overlapping topics, including question generation, multilingual retrieval, nugget generation, and nugget clustering. For RAG4Reports, we focused on the final generation stage and tested a citation-aware compression strategy: generating the long report first from clustered evidence nuggets and then deriving the short report from it, rather than generating both length conditions independently. Our baseline run, which followed the original TREC-style setup, ranked third overall. Our best run, genaius-cluster-gpt4, ranked second overall with an F1 score of 0.5456, achieving the best balance among our submissions between nugget coverage and sentence support. The results suggest that citation-aware compression is a promising strategy for length-constrained, citation-grounded report generation.

2020

This paper summarizes our group’s efforts in the event sentence coreference identification shared task, organized as part of the Automated Extraction of Socio-Political Events from News (AESPEN) Workshop. Our main approach consists of three steps. First, we use a transformer-based model to predict whether a pair of sentences refers to the same event. Next, we use these predictions as initial scores and recalculate each pair's score by considering how the two sentences relate to the other sentences. Finally, the resulting scores are used to construct the clusters, starting with the highest-scoring pairs. Our proposed approach outperforms the baseline approach across all evaluation metrics.
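The last step described above, building clusters from pairwise scores starting with the highest-scoring pairs, could look roughly like the sketch below. The abstract does not specify the merge rule or the rescoring procedure, so the threshold, the union-find merging, the function name, and the example scores are all assumptions made for illustration; the rescoring step is not shown.

```python
def greedy_cluster(n, pair_scores, threshold=0.5):
    """Group n sentences into clusters from pairwise scores.

    pair_scores maps (i, j) with i < j to a score in [0, 1].
    Pairs are visited from highest to lowest score, and the clusters of
    the two sentences are merged while the score reaches the threshold.
    The threshold and merge rule are assumptions, not the paper's method.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # all remaining pairs score lower
        parent[find(i)] = find(j)

    clusters = {}
    for s in range(n):
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())


# Hypothetical scores for four sentences.
scores = {(0, 1): 0.9, (1, 2): 0.7, (0, 2): 0.2,
          (0, 3): 0.3, (1, 3): 0.05, (2, 3): 0.1}
clusters = greedy_cluster(4, scores)
```

With this simple merge rule the visiting order does not change the final clusters; the order would matter under a stricter rule, for example one that checks a candidate pair against every sentence already in both clusters.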