Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision

Zhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian McAuley, Peter Clark, Bodhisattwa Prasad Majumder


Abstract
Instruction-following LLMs have recently allowed systems to discover hidden concepts from a collection of unstructured documents based on a natural language description of the purpose of the discovery (i.e., goal). Still, the quality of the discovered concepts remains mixed, as it depends heavily on LLM’s reasoning ability and drops when the data is noisy or beyond LLM’s knowledge. We present Instruct-LF, a goal-oriented latent factor discovery system that integrates LLM’s instruction-following ability with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. Instruct-LF uses LLMs to propose fine-grained, goal-related properties from documents, estimates their presence across the dataset, and applies gradient-based optimization to uncover hidden factors, where each factor is represented by a cluster of co-occurring properties. We evaluate latent factors produced by Instruct-LF on movie recommendation, text-world navigation, and legal document categorization tasks. These interpretable representations improve downstream task performance by 5-52% than the best baselines and were preferred 1.8 times as often as the best alternative, on average, in human evaluation.
Anthology ID:
2025.naacl-long.554
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
11114–11134
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.naacl-long.554/
DOI:
Bibkey:
Cite (ACL):
Zhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian McAuley, Peter Clark, and Bodhisattwa Prasad Majumder. 2025. Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11114–11134, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision (Xie et al., NAACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.naacl-long.554.pdf