fLSA: Learning Semantic Structures in Document Collections Using Foundation Models

Weijia Xu, Nebojsa Jojic, Nicolas Le Roux


Abstract
Humans can learn to solve new tasks by inducing high-level strategies from example solutions to similar problems and then adapting these strategies to solve unseen problems. Can we use large language models to induce such high-level structure from example documents or solutions? We introduce fLSA, a foundation-model-based Latent Semantic Analysis method that iteratively clusters and tags document segments based on document-level contexts. These tags can be used to model the latent structure of given documents and for hierarchical sampling of new texts. Our experiments on story writing, math, and multi-step reasoning datasets demonstrate that fLSA tags are more informative in reconstructing the original texts than existing tagging methods. Moreover, when used for hierarchical sampling, fLSA tags help expand the output space in directions that lead to correct solutions more often than either direct sampling or hierarchical sampling with existing tagging methods.
Anthology ID:
2025.emnlp-main.1290
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
25395–25406
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1290/
Cite (ACL):
Weijia Xu, Nebojsa Jojic, and Nicolas Le Roux. 2025. fLSA: Learning Semantic Structures in Document Collections Using Foundation Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 25395–25406, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
fLSA: Learning Semantic Structures in Document Collections Using Foundation Models (Xu et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1290.pdf
Checklist:
 2025.emnlp-main.1290.checklist.pdf