Arnab Maji

2025

InFiNITE (∞): Indian Financial Narrative Inference Tasks & Evaluations
Sohom Ghosh | Arnab Maji | Sudip Kumar Naskar
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems

This paper introduces Indian Financial Narrative Inference Tasks and Evaluations (InFiNITE), a comprehensive framework for analyzing India’s financial narratives through three novel inference tasks. First, we present multi-modal earnings call analysis by integrating transcripts, presentation visuals, and market indicators via the Multi-Modal Indian Earnings Calls (MiMIC) dataset, enabling holistic prediction of post-call stock movements. Second, our Budget-Assisted Sectoral Impact Ranking (BASIR) dataset aids in systematically decoding government fiscal narratives by classifying budget excerpts into 81 economic sectors and evaluating their post-announcement equity performance. Third, we introduce the Bharat IPO Rating (BIR) datasets to redefine Initial Public Offering (IPO) evaluation through prospectus analysis, classifying potential investments into four recommendation categories (Apply, May Apply, Neutral, Avoid). By unifying textual, visual, and quantitative modalities across corporate, governmental, and public investment domains, InFiNITE addresses critical gaps in Indian financial narrative analysis. The framework's open-source datasets, covering earnings calls, union budgets, and IPO prospectuses, are released under permissive licenses and establish India-specific benchmark resources for computational economic research. For investors, InFiNITE enables data-driven identification of capital allocation opportunities and IPO risks, while policymakers gain structured insights to assess the impact of Indian fiscal communication. By releasing these datasets publicly, we aim to facilitate research in computational economics and financial text analysis, particularly for the Indian market.

2024

IndicFinNLP: Financial Natural Language Processing for Indian Languages
Sohom Ghosh | Arnab Maji | Aswartha Narayana | Sudip Kumar Naskar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Applications of Natural Language Processing (NLP) in the finance domain have been very popular of late. For financial NLP (FinNLP), while various datasets exist for widely spoken languages like English and Chinese, datasets are scarce for low-resource languages, particularly for Indian languages. In this paper, we address this challenge by presenting IndicFinNLP – a collection of nine datasets covering three FinNLP tasks in three Indian languages. These tasks are Exaggerated Numeral Detection, Sustainability Classification, and ESG Theme Determination of financial texts in Hindi, Bengali, and Telugu. Moreover, we release the datasets under the CC BY-NC-SA 4.0 license for the benefit of the research community.
