Nikita Tatarinov
2026
Language Modeling for the Future of Finance: A Survey into Metrics, Tasks, and Data Opportunities
Nikita Tatarinov | Siddhant Sukhani | Agam Shah | Sudheer Chava
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Recent advances in language modeling have led to a growing number of papers related to finance in top-tier Natural Language Processing (NLP) venues. To systematically examine this trend, we review 374 NLP research papers published between 2017 and 2024 across 38 conferences and workshops, with a focused analysis of 221 papers that directly address finance-related tasks. We evaluate these papers across 11 quantitative and qualitative dimensions, with particular attention to evaluation practices, metric choices, dataset coverage, and reproducibility in a high-stakes applied language modeling domain. Our study identifies the following opportunities for NLP researchers: (i) expanding the scope of forecasting tasks; (ii) enriching evaluation with finance-specific metrics; (iii) leveraging multilingual and crisis-period datasets for robustness-oriented evaluation; and (iv) balancing pre-trained language models (PLMs) with efficient or interpretable alternatives. We identify actionable directions supported by dataset and tool recommendations, with implications for both academic evaluation practices and industry deployment.
KG-MuLQA: A Framework for KG-based Multi-Level QA Extraction and Long-Context LLM Evaluation
Nikita Tatarinov | Vidhyakshaya Kannan | Haricharana Srinivasa | Arnav Raj | Harpreet Singh Anand | Varun Singh | Aditya Luthra | Ravij Lade | Agam Shah | Sudheer Chava
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce KG-MuLQA (Knowledge-Graph-based Multi-Level Question-Answer Extraction): a framework that (1) extracts QA pairs at multiple complexity levels (2) along three key dimensions – multi-hop retrieval, set operations, and answer plurality, (3) by leveraging knowledge-graph-based document representations. This approach enables fine-grained assessment of model performance across controlled difficulty levels. Using this framework, we construct a dataset of 20,139 QA pairs based on financial credit agreements and evaluate 16 proprietary and open-weight Large Language Models, observing that even the best-performing models struggle with set-based comparisons and multi-hop reasoning over long contexts. Our analysis reveals systematic failure modes tied to semantic misinterpretation and inability to handle implicit relations.