Kemal Kirtac
2026
Evaluating Large Language Model News Sentiment in Finance under Liquidity and Market Frictions
Kemal Kirtac
Proceedings of the Workshop on Evaluating Evaluations (EvalEval)
This paper studies whether large language models can extract useful sentiment signals from firm-specific financial news when evaluation accounts for realistic market frictions. Many financial NLP studies report strong offline prediction results, but these do not always show whether model outputs remain useful once trading constraints are imposed. I address this gap by evaluating sentiment models through classification performance, return predictability, and implementable portfolio performance. The analysis links Refinitiv News Analytics to CRSP and begins with 3,129,924 U.S. news items published between January 1, 2010 and January 30, 2026. Filtering retains single-firm stories, removes redundant coverage using a five-day cosine-similarity novelty screen, and restricts the sample to tradable stocks with positive bid and ask quotes, minimum share and dollar volume thresholds, quoted spreads below 20%, and available Amihud illiquidity ratios and Kyle’s lambda estimates. The final sample contains 973,481 tradable news items linked to 3,452 firms. I compare six sentiment approaches: LLaMA–3, OPT, RoBERTa, BERT, FinBERT, and the Loughran–McDonald dictionary. LLaMA–3 achieves the strongest classification performance with 78.2% accuracy and produces the largest predictive coefficients in panel regressions. Daily rebalanced long–short portfolios with a 5 bps trading cost show that the LLaMA–3 strategy earns a cumulative return of approximately 180% from June 2024 to January 2026, followed by OPT with 155% and RoBERTa with 120%, while the dictionary-based strategy loses 9%. The results show that evaluation becomes more informative when financial NLP models are assessed beyond offline accuracy and under realistic deployment constraints. High-capacity language models retain economically meaningful predictive content under market frictions, whereas simpler lexicon-based methods do not.
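A five-day cosine-similarity novelty screen of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words text representation, the 0.8 similarity threshold, the use of calendar rather than trading days, comparing against all earlier stories (kept or not), and every function name here are assumptions.

```python
import math
from collections import Counter
from datetime import date, timedelta


def bag_of_words(text):
    # Hypothetical text representation: lower-cased whitespace token counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def novelty_screen(stories, window_days=5, threshold=0.8):
    """Drop a story if it is too similar to any earlier story about the same
    firm published within the previous `window_days` calendar days.

    `stories` holds (firm_id, publication_date, text) tuples. The threshold
    is an assumed value, not one reported in the paper.
    """
    kept = []
    seen = {}  # firm_id -> [(date, vector)] for every earlier story, kept or not
    for firm, day, text in sorted(stories, key=lambda s: s[1]):  # stable sort by date
        vec = bag_of_words(text)
        window_start = day - timedelta(days=window_days)
        recent = [v for d, v in seen.get(firm, []) if d >= window_start]
        if all(cosine(vec, v) < threshold for v in recent):
            kept.append((firm, day, text))
        seen.setdefault(firm, []).append((day, vec))
    return kept


stories = [
    ("FIRM_A", date(2024, 1, 2), "firm a beats earnings estimates"),
    ("FIRM_A", date(2024, 1, 3), "firm a beats earnings estimates again"),
    ("FIRM_A", date(2024, 1, 10), "firm a beats earnings estimates"),
    ("FIRM_B", date(2024, 1, 3), "firm a beats earnings estimates"),
]
for firm, day, _ in novelty_screen(stories):
    print(firm, day)
# prints:
# FIRM_A 2024-01-02
# FIRM_B 2024-01-03
# FIRM_A 2024-01-10
```

The January 3 story about FIRM_A is dropped as a near-duplicate, while the January 10 repeat survives because the earlier coverage has left the five-day window. Whether similarity is measured against all prior stories or only retained ones is a design choice the abstract does not settle.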
2025
Leveraging LLM-based sentiment analysis for portfolio optimization with proximal policy optimization
Kemal Kirtac | Guido Germano
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
Reinforcement learning (RL) offers adaptive solutions to portfolio optimization, yet standard methods such as proximal policy optimization (PPO) rely exclusively on historical price data and overlook the impact of investor sentiment. We introduce sentiment-augmented PPO (SAPPO), a reinforcement learning framework that incorporates real-time sentiment signals extracted from Refinitiv financial news. Daily sentiment scores are generated using LLaMA 3.3. SAPPO integrates these signals into the PPO advantage function via a sentiment-weighted term, enabling allocation strategies that respond to both price movements and market sentiment. Experiments on a three-asset portfolio demonstrate that SAPPO increases the Sharpe ratio from 1.55 to 1.90 and reduces drawdowns relative to PPO. The optimal configuration uses a sentiment influence parameter λ = 0.1, as validated through ablation studies and statistically significant t-tests (p < 0.001). These findings show that sentiment-aware reinforcement learning improves trading performance and offers a robust alternative to purely price-based strategies.
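The abstract does not state the exact form of the sentiment-weighted term. Purely as an illustration, the sketch below assumes it is added to a standard generalized advantage estimate (GAE) as λ times the portfolio-weighted sentiment, Â_t = A_t + λ Σ_i w_{t,i} s_{t,i}, with λ = 0.1 as in the abstract. The additive form, the use of GAE, and all function names are hypothetical and should not be read as the SAPPO implementation.

```python
def gae_advantages(rewards, values, gamma=0.99, lam_gae=0.95):
    """Standard generalized advantage estimation over one episode.

    `values` has len(rewards) + 1 entries (the last is the bootstrap value).
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam_gae * running
        adv[t] = running
    return adv


def sentiment_weighted_advantages(advantages, weights, sentiments, lam=0.1):
    """Hypothetical sentiment adjustment: add lam * (portfolio-weighted sentiment).

    weights[t][i]   : portfolio weight of asset i chosen at step t (assumed input)
    sentiments[t][i]: daily sentiment score of asset i at step t (assumed in [-1, 1])
    """
    adjusted = []
    for a, w, s in zip(advantages, weights, sentiments):
        portfolio_sentiment = sum(wi * si for wi, si in zip(w, s))
        adjusted.append(a + lam * portfolio_sentiment)
    return adjusted


adv = gae_advantages([1.0, 0.0], [0.0, 0.0, 0.0], gamma=1.0, lam_gae=1.0)
print(adv)  # [1.0, 0.0]
print(sentiment_weighted_advantages(adv, [[0.5, 0.5], [1.0, 0.0]], [[1.0, -1.0], [0.5, 0.2]]))
```

Under this assumed form, the adjusted advantages would then replace A_t in the usual clipped PPO surrogate objective; a larger λ pushes the policy more strongly toward allocations that agree with news sentiment.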
2024
Enhanced Financial Sentiment Analysis and Trading Strategy Development Using Large Language Models
Kemal Kirtac | Guido Germano
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
This study examines a novel methodology for enhanced financial sentiment analysis and trading strategy development using large language models (LLMs) such as OPT, BERT, FinBERT, LLaMA 3, and RoBERTa. Utilizing a dataset of 965,375 U.S. financial news articles from 2010 to 2023, our research demonstrates that OPT, a GPT-3-style model, significantly outperforms the other models, achieving a prediction accuracy of 74.4% for stock market returns. Our findings reveal that the advanced capabilities of LLMs, particularly OPT, surpass traditional sentiment analysis methods such as the Loughran-McDonald dictionary model in predicting and explaining stock returns. For instance, a self-financing strategy based on OPT scores achieves a Sharpe ratio of 3.05 over our sample period, compared to a Sharpe ratio of 1.23 for the strategy based on the dictionary model. This study highlights the superior performance of LLMs in financial sentiment analysis, encouraging further research into integrating artificial intelligence and LLMs in financial markets.
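A self-financing long–short sentiment strategy and its annualized Sharpe ratio might be computed roughly as below. This is a sketch under stated assumptions, not the study's backtest: it assumes equal-weighted top and bottom sentiment quantiles, full daily rebalancing with price drift between days ignored, 252 trading days per year, and an optional proportional cost on turnover (the 2026 study uses 5 bps; this abstract states no cost, so the default is zero). All function names are hypothetical.

```python
import math
import statistics


def long_short_weights(scores, frac=0.1):
    """Equal-weight long the top `frac` of stocks by sentiment and short the bottom `frac`.

    The long leg sums to +1 and the short leg to -1, so the book is dollar-neutral
    (self-financing). Assumes enough stocks that the two legs do not overlap.
    """
    ranked = sorted(scores, key=scores.get)  # ascending sentiment
    k = max(1, int(len(ranked) * frac))
    weights = {s: -1.0 / k for s in ranked[:k]}
    weights.update({s: 1.0 / k for s in ranked[-k:]})
    return weights


def backtest(daily_scores, daily_returns, frac=0.1, cost_bps=0.0):
    """Net daily strategy returns.

    daily_scores[t]  : {stock: sentiment known before day t's return}
    daily_returns[t] : {stock: return on day t}
    Cost is charged as turnover * cost_bps / 10,000 (hypothetical cost model).
    """
    prev, net = {}, []
    for scores, rets in zip(daily_scores, daily_returns):
        w = long_short_weights(scores, frac)
        gross = sum(wi * rets[s] for s, wi in w.items())
        names = set(w) | set(prev)
        turnover = sum(abs(w.get(s, 0.0) - prev.get(s, 0.0)) for s in names)
        net.append(gross - turnover * cost_bps / 1e4)
        prev = w
    return net


def annualized_sharpe(returns, periods=252):
    # No risk-free rate is subtracted: a zero-net-investment portfolio's return
    # is already an excess return.
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods)


scores = [{"A": 0.9, "B": 0.1, "C": -0.2, "D": -0.8}, {"A": -0.5, "B": 0.7, "C": 0.0, "D": -0.9}]
rets = [{"A": 0.02, "B": 0.0, "C": 0.0, "D": -0.01}, {"A": 0.01, "B": 0.01, "C": 0.0, "D": 0.0}]
print(backtest(scores, rets, frac=0.25, cost_bps=5.0))  # approximately [0.029, 0.009]
```

On the first day the strategy goes long A and short D; on the second it rotates the long leg to B, so two units of turnover are charged on each day at 5 bps.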