When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

Zhengzhe Yang

Abstract

Can large language models (LLMs) generate continuous numerical features that improve reinforcement learning (RL) trading agents? We build a modular pipeline in which a frozen LLM serves as a stateless feature extractor, transforming unstructured daily news and filings into a fixed-dimensional vector consumed by a downstream PPO agent. We introduce an automated prompt-optimization loop that treats the extraction prompt as a discrete hyperparameter and tunes it directly against the Information Coefficient (IC), the Spearman rank correlation between predicted and realized returns, rather than against NLP losses. The optimized prompt discovers genuinely predictive features (IC above approximately 0.15 on held-out data). However, these valid intermediate representations do not automatically translate into downstream task performance: during a distribution shift caused by a macroeconomic shock, LLM-derived features add noise, and the augmented agent underperforms a price-only baseline. In a calmer test regime the agent recovers, yet macroeconomic state variables remain the most robust driver of policy improvement. Our findings highlight a gap between feature-level validity and policy-level robustness that parallels known challenges in transfer learning under distribution shift.
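
As an illustration of the Information Coefficient named in the abstract, the following is a minimal sketch of a Spearman rank correlation between predicted and realized returns. It is not the paper's code: the function names `average_ranks` and `information_coefficient` and the sample values are assumptions introduced here, and the sketch uses only the Python standard library.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def information_coefficient(predicted, realized):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(realized) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rank_p = average_ranks(predicted)
    rank_r = average_ranks(realized)
    mean_p, mean_r = mean(rank_p), mean(rank_r)
    cov = sum((a - mean_p) * (b - mean_r) for a, b in zip(rank_p, rank_r))
    var_p = sum((a - mean_p) ** 2 for a in rank_p)
    var_r = sum((b - mean_r) ** 2 for b in rank_r)
    if var_p == 0 or var_r == 0:
        return float("nan")  # a constant input has no defined rank correlation
    return cov / (var_p * var_r) ** 0.5


# Hypothetical feature scores and next-period returns for four assets.
scores = [0.3, 0.1, 0.2, 0.4]
returns = [0.02, -0.01, 0.03, 0.01]
ic = information_coefficient(scores, returns)
```

Because the measure depends only on rank order, it is insensitive to the scale of the feature, which is one plausible reason to tune a prompt against it rather than against a regression loss.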

Anthology ID: 2026.customnlp4u-1.17
Volume: Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Sheshera Mysore, Sachin Kumar, Vidhisha Balachandran, Shirley Anugrah Hayati, Faeze Brahman, Hanane Nour Moussa, Alireza Salemi
Venues: CustomNLP4U | WS
Publisher: Association for Computational Linguistics
Pages: 182–190
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.customnlp4u-1.17/
Cite (ACL): Zhengzhe Yang. 2026. When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies. In Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 182–190, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies (Yang, CustomNLP4U 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.customnlp4u-1.17.pdf