Rynaa Grover
2025
GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
Sahiti Yerramilli | Nilay Pande | Rynaa Grover | Jayant Sravan Tamarapalli
Findings of the Association for Computational Linguistics: EMNLP 2025
This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization across four reasoning categories (visual, spatial, cultural, and precise geolocation), annotated by difficulty. Images are also enriched with semantic segmentation (150 classes) and a visual locatability score. Our benchmarking of frontier MLLMs on a diverse 2,088-image subset reveals consistent challenges: models frequently exhibit weaknesses in visual grounding, display erratic reasoning, and struggle to achieve accurate localization, especially as the reasoning complexity escalates. GeoChain offers a robust diagnostic methodology, critical for fostering significant advancements in complex geographic reasoning within MLLMs.
2024
A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech
Gaurav Verma | Rynaa Grover | Jiawei Zhou | Binny Mathew | Jordan Kraemer | Munmun Choudhury | Srijan Kumar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Violence-provoking speech, that is, speech that implicitly or explicitly promotes violence against members of the targeted community, contributed to a massive surge in anti-Asian crimes during the COVID-19 pandemic. While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-Asian violence-provoking speech. Using data from ~420k Twitter posts spanning a 3-year duration (January 1, 2020 to February 1, 2023), we develop a codebook to characterize anti-Asian violence-provoking speech and collect a community-crowdsourced dataset to facilitate its large-scale detection using state-of-the-art classifiers. We contrast the capabilities of natural language processing classifiers, ranging from BERT-based to LLM-based classifiers, in detecting violence-provoking speech with their capabilities to detect anti-Asian hateful speech. In contrast to prior work that has demonstrated the effectiveness of such classifiers in detecting hateful speech (F1 = 0.89), our work shows that accurate and reliable detection of violence-provoking speech is a challenging task (F1 = 0.69). We discuss the implications of our findings, particularly the need for proactive interventions to support Asian communities during public health crises.