Monjoy Narayan Choudhury
2025
Can Vision-Language Models Solve Visual Math Equations?
Monjoy Narayan Choudhury | Junling Wang | Yifan Hou | Mrinmaya Sachan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Despite strong performance in visual understanding and language-based reasoning, Vision-Language Models (VLMs) struggle with tasks requiring integrated perception and symbolic computation. We study this limitation through visual equation solving, where mathematical equations are embedded in images, variables are represented by object icons, and coefficients must be inferred by counting. While VLMs perform well on textual equations, they fail on visually grounded counterparts. To understand this gap, we decompose the task into coefficient counting and variable recognition, and find that counting is the primary bottleneck, even when recognition is accurate. We also observe that composing recognition and reasoning introduces additional errors, highlighting challenges in multi-step visual reasoning. Finally, as equation complexity increases, symbolic reasoning itself becomes a limiting factor. These findings reveal key weaknesses in current VLMs and point toward future improvements in visually grounded mathematical reasoning.
2024
CASE: Efficient Curricular Data Pre-training for Building Assistive Psychology Expert Models
Sarthak Harne | Monjoy Narayan Choudhury | Madhav Rao | T K Srikanth | Seema Mehrotra | Apoorva Vashisht | Aarushi Basu | Manjit Singh Sodhi
Findings of the Association for Computational Linguistics: EMNLP 2024
The limited availability of psychologists necessitates efficient identification of individuals requiring urgent mental healthcare. This study explores the use of Natural Language Processing (NLP) pipelines to analyze text data from online mental health forums used for consultations. By analyzing forum posts, these pipelines can flag users who may require immediate professional attention. A crucial challenge in this domain is data privacy and scarcity. To address this, we propose utilizing readily available curricular texts used in institutes specializing in mental health for pre-training the NLP pipelines. This helps us mimic the training process of a psychologist. Our work presents CASE-BERT, which flags potential mental health disorders based on forum text. CASE-BERT demonstrates superior performance compared to existing methods, achieving an F1 score of 0.91 for Depression and 0.88 for Anxiety, two of the most commonly reported mental health disorders. Our code and data are publicly available.