Monorama Swain
2026
H-RAG at SemEval-2026 Task 8: Hierarchical Parent–Child Retrieval for Multi-Turn RAG Conversations
Passant Elchafei | Hossam Emam | Mohamed Alansary | Monorama Swain | Markus Schedl
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages). Task A evaluates standalone retrieval quality, while Task C assesses end-to-end retrieval-augmented generation (RAG) in multi-turn conversational settings, requiring both accurate answer generation and faithful grounding in retrieved evidence. Our approach implements a hierarchical parent–child RAG pipeline that separates fine-grained child-level retrieval from parent-level context reconstruction during generation. Documents are segmented into overlapping sentence-based child chunks, while full documents are preserved as parent units to provide coherent context. Retrieval combines [...] weighting, and embedding-based similarity rescoring over child chunks. Retrieved evidence is aggregated at the parent level and supplied to an instruction-tuned language model for response generation. H-RAG achieves an nDCG@5 score of 0.4271 on Task A and a harmonic mean score of 0.3241 on Task C (RBagg: 0.2488, RLF: 0.2703, RBllm: 0.6508), underscoring the importance of retrieval configuration and parent-level aggregation in multi-turn RAG performance.
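The parent–child idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence splitter, the window size and overlap, the token-overlap scorer (a stand-in for the embedding-based similarity the abstract mentions), and the max-score parent aggregation are all assumptions chosen for illustration, and every function name is hypothetical. The last function checks that the reported Task C score is consistent with a harmonic mean of the three listed component scores.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter on sentence-final punctuation (assumption, not from the paper).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_child_chunks(doc_id, text, size=3, overlap=1):
    # Overlapping sentence windows; size and overlap are illustrative values.
    sentences = split_sentences(text)
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        window = sentences[start:start + size]
        chunks.append({"parent_id": doc_id, "text": " ".join(window)})
        if start + size >= len(sentences):
            break  # the last window already reaches the end of the document
    return chunks


def score(query, chunk_text):
    # Token-overlap stand-in for embedding similarity (assumption).
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk_text.lower()))
    return len(q & c) / len(q) if q else 0.0


def retrieve_parents(query, docs, top_k_children=5):
    # 1) Score child chunks, 2) keep the top-k children,
    # 3) aggregate to parents by best child score, 4) return full parent texts.
    children = [c for doc_id, text in docs.items()
                for c in make_child_chunks(doc_id, text)]
    ranked = sorted(children, key=lambda c: score(query, c["text"]),
                    reverse=True)[:top_k_children]
    best = defaultdict(float)
    for child in ranked:
        best[child["parent_id"]] = max(best[child["parent_id"]],
                                       score(query, child["text"]))
    order = sorted(best, key=best.get, reverse=True)
    return [(parent_id, docs[parent_id]) for parent_id in order]


def harmonic_mean(values):
    # Harmonic mean of strictly positive values.
    return len(values) / sum(1.0 / v for v in values)


if __name__ == "__main__":
    docs = {
        "d1": "Cats sleep a lot. Cats like fish. Dogs bark loudly. Dogs chase cars.",
        "d2": "The moon orbits Earth. Tides follow the moon.",
    }
    print(retrieve_parents("moon tides", docs, top_k_children=1)[0][0])
    print(round(harmonic_mean([0.2488, 0.2703, 0.6508]), 4))
```

In this sketch, matching happens on small child windows while the generator would receive the whole parent document, which is the separation the abstract describes between child-level retrieval and parent-level context.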
2024
On Mitigating Performance Disparities in Multilingual Speech Recognition
Monorama Swain | Anna Katrine Van Zee | Anders Søgaard
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
How far have we come in mitigating performance disparities across genders in multilingual speech recognition? We compare the impact on gender disparity of different fine-tuning algorithms for automated speech recognition across model sizes, languages and gender. We look at both performance-focused and fairness-promoting algorithms. Across languages, we see slightly better performance for female speakers for larger models regardless of the fine-tuning algorithm. The best trade-off between performance and parity is found using adapter fusion. Fairness-promoting fine-tuning algorithms (Group-DRO and Spectral Decoupling) hurt performance compared to adapter fusion with only slightly better performance parity. LoRA increases disparities slightly. Fairness-mitigating fine-tuning techniques led to slightly higher variance in performance across languages, with the exception of adapter fusion.