Niloufar Beyranvand


2025

pdf bib
GEAR: A Scalable and Interpretable Evaluation Framework for RAG-Based Car Assistant Systems
Niloufar Beyranvand | Hamidreza Dastmalchi | Aijun An | Heidar Davoudi | Winston Chan | Ron DiCarlantonio
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large language models (LLMs) increasingly power car assistants, enabling natural language interaction for tasks such as maintenance, troubleshooting, and operational guidance. While retrieval-augmented generation (RAG) improves grounding using vehicle manuals, evaluating response quality remains a key challenge. Traditional metrics like BLEU and ROUGE fail to capture critical aspects such as factual accuracy and information coverage. We propose GEAR, a fully automated, reference-based evaluation framework for car assistant systems. GEAR uses LLMs as evaluators to compare assistant responses against ground-truth counterparts, assessing coverage, correctness, and other dimensions of answer quality. To enable fine-grained evaluation, both responses are decomposed into key facts and labeled as essential, optional, or safety-critical using LLMs. The evaluator then determines which of these facts are correct and covered. Experiments show that GEAR aligns closely with human annotations, offering a scalable and reliable solution for evaluating car assistants.

2022

pdf bib
ParsSimpleQA: The Persian Simple Question Answering Dataset and System over Knowledge Graph
Hamed Babaei Giglou | Niloufar Beyranvand | Reza Moradi | Amir Mohammad Salehoof | Saeed Bibak
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities

The simple question answering over the knowledge graph concerns answering single-relation questions by querying the facts in the knowledge graph. This task has drawn significant attention in recent years. However, there is a demand for a simple question dataset in the Persian language to study open-domain simple question answering. In this paper, we present the first Persian single-relation question answering dataset and a model that uses a knowledge graph as a source of knowledge to answer questions. We create the ParsSimpleQA dataset semi-automatically in two steps. First, we build single-relation question templates. Next, we automatically create simple questions and answers using templates, entities, and relations from Farsbase. To present the reliability of the presented dataset, we proposed a simple question-answering system that receives questions and uses deep learning and information retrieval techniques for answering questions. The experimental results presented in this paper show that the ParsSimpleQA dataset is very promising for the Persian simple question-answering task.