Neuro-Symbolic Agentic Reinforcement Learning for Long-Term Original Character Companionship and Interaction

Zhenhan Huang


Abstract
As human-agent interaction (HAI) evolves toward long-term social companionship, users expect *Original Character (OC)* agents to maintain a consistent persona, manage shared memories, and adapt to ever-changing preferences. However, LLM-based agents optimized by prompting or supervised fine-tuning (SFT) exhibit a generalization gap: they behave as myopic instruction followers, which leads to cascading errors in multi-turn interactions. To let agents learn trajectory-level value functions that enable farsighted decision-making, we propose the NSARL framework, which formalizes an OC companion agent’s interactions as a partially observable Markov decision process (POMDP) and decomposes the agent into three sub-policies (Router, Memory, and Persona), optimized via closed-loop RL from AI feedback (RLAIF) with verifiable rewards in a graph-constrained action space. Our preliminary experiments indicate a trade-off: SFT yields stronger persona generation, while NSARL improves structural logic through conservative strategies (e.g., over-routing) that increase workflow completeness. This trade-off argues for a hybrid deployment strategy.
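The decomposition described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical and is not taken from the paper: the transition graph `ALLOWED_NEXT`, the keyword-based `route` rule (a stand-in for a learned Router policy), and the binary `structural_reward` are assumptions chosen only to show what a graph-constrained action space with a mechanically checkable reward might look like.

```python
from dataclasses import dataclass, field

# Hypothetical workflow graph: which sub-policy may act after which.
ALLOWED_NEXT = {
    "router": {"memory", "persona"},
    "memory": {"persona"},
    "persona": set(),  # terminal: the persona reply ends the turn
}


@dataclass
class CompanionState:
    memories: list = field(default_factory=list)  # stored user turns
    trace: list = field(default_factory=list)     # sub-policies used this turn


def route(user_turn: str) -> str:
    # Toy stand-in for a learned Router: a keyword rule, not a trained policy.
    return "memory" if "remember" in user_turn.lower() else "persona"


def run_turn(state: CompanionState, user_turn: str) -> str:
    """Run one turn, rejecting any transition the graph does not allow."""
    state.trace = ["router"]
    nxt = route(user_turn)
    while True:
        if nxt not in ALLOWED_NEXT[state.trace[-1]]:
            raise ValueError(f"illegal transition {state.trace[-1]} -> {nxt}")
        state.trace.append(nxt)
        if nxt == "memory":
            state.memories.append(user_turn)  # toy Memory sub-policy: store the turn
            nxt = "persona"
        else:
            # Toy Persona sub-policy: a placeholder reply conditioned on memory size.
            return f"[persona reply using {len(state.memories)} stored memories]"


def structural_reward(trace: list) -> float:
    """Assumed verifiable reward: 1.0 iff the trace is a complete, legal path."""
    legal = all(b in ALLOWED_NEXT[a] for a, b in zip(trace, trace[1:]))
    complete = bool(trace) and trace[0] == "router" and trace[-1] == "persona"
    return 1.0 if legal and complete else 0.0


state = CompanionState()
print(run_turn(state, "Please remember my cat is named Miso"))
print(state.trace, structural_reward(state.trace))
```

Because the reward here depends only on the trace and the graph, it can be checked without a learned judge. In this toy setting a router that always visits memory never loses structural reward, which gives one intuition for how conservative behavior such as over-routing could raise workflow completeness.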
Anthology ID:
2026.acl-short.44
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
530–539
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-short.44/
Cite (ACL):
Zhenhan Huang. 2026. Neuro-Symbolic Agentic Reinforcement Learning for Long-Term Original Character Companionship and Interaction. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 530–539, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Neuro-Symbolic Agentic Reinforcement Learning for Long-Term Original Character Companionship and Interaction (Huang, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-short.44.pdf
Checklist:
 2026.acl-short.44.checklist.pdf