Samyak Rajesh Jain
Also published as: Samyak Rajesh Jain
2026
Reasoning Graph-Structured Question Answering: Datasets and Insights from LLM Benchmarking
Khin Yone | Devasha Trivedi | Anish Pahilajani | Jincen Shuai | Samyak Rajesh Jain | Ryan Rossi | Nesreen K. Ahmed | Franck Dernoncourt | Yu Wang | Namyong Park
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Large Language Models (LLMs) have shown remarkable success in multi-hop question-answering (M-QA) due to their advanced reasoning capabilities. However, the influence of reasoning structures on their performance remains underexplored, primarily due to the lack of M-QA datasets that explicitly encode the reasoning pathways underlying each question-answer pair. To address this gap, we introduce the reasoning graph-structured question answering dataset (GRS-QA), which provides both semantic contexts and reasoning structures for the QA pairs. Unlike existing M-QA datasets, GRS-QA explicitly captures intricate reasoning pathways through reasoning graphs, where nodes correspond to textual contexts and edges denote logical flows. Using GRS-QA, we systematically evaluate LLM performance across varying context structures, prompting styles, and data domains. Our empirical analysis reveals that LLMs perform differently based on the reasoning structure, context, and prompting styles, indicating their varying ability to leverage graph-structured knowledge. Notably, providing explicit reasoning guidance proves more effective than supplying contextual information alone.
2025
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
Vipula Rawte | Sarthak Jain | Aarush Sinha | Garv Kaushik | Aman Bansal | Prathiksha Rumale Vishwanath | Samyak Rajesh Jain | Aishwarya Naresh Reganti | Vinija Jain | Aman Chadha | Amit Sheth | Amitava Das
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
Recent advances in Large Multimodal Models (LMMs) have expanded their capabilities to video understanding, with Text-to-Video (T2V) models excelling in generating videos from textual prompts. However, they still frequently produce hallucinated content, revealing AI-generated inconsistencies. We introduce ViBe (https://huggingface.co/datasets/ViBe-T2V-Bench/ViBe), a large-scale dataset of hallucinated videos from open-source T2V models. We identify five major hallucination types: Vanishing Subject, Omission Error, Numeric Variability, Subject Dysmorphia, and Visual Incongruity. Using ten T2V models, we generated and manually annotated 3,782 videos from 837 diverse MS COCO captions. Our proposed benchmark includes a dataset of hallucinated videos and a classification framework using video embeddings. ViBe serves as a critical resource for evaluating T2V reliability and advancing hallucination detection. We establish classification as a baseline, with the TimeSFormer + CNN ensemble achieving the best performance (0.345 accuracy, 0.342 F1 score). Although these initial baselines achieve only modest accuracy, the result highlights the difficulty of automated hallucination detection and the need for improved methods. Our research aims to drive the development of more robust T2V models and to evaluate their outputs based on user preferences. Our code is available at: https://anonymous.4open.science/r/vibe-1840/