The Mystery of the Pathological Path-star Task for Language Models

Arvid Frydenlund

doi:10.18653/v1/2024.emnlp-main.695

Abstract

The recently introduced path-star task is a minimal task designed to exemplify limitations to the abilities of language models (Bachmann and Nagarajan, 2024). It involves a path-star graph where multiple arms radiate from a single starting node and each node is unique. Given the start node and a specified target node that ends an arm, the task is to generate the arm containing that target node. This is straightforward for a human but surprisingly difficult for language models, which did not outperform the random baseline. The authors hypothesized this is due to a deficiency in teacher-forcing and the next-token prediction paradigm. We demonstrate the task is learnable using teacher-forcing in alternative settings and that the issue is partially due to representation. We introduce a regularization method using structured samples of the same graph but with differing target nodes, improving results across a variety of model types. We provide RASP proofs showing the task is theoretically solvable. Finally, we find settings where an encoder-only model can consistently solve the task.
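To make the task description concrete, the following is a minimal sketch of how a path-star instance could be generated and solved by a simple reference procedure. It is not the paper's data pipeline or model: the function names (`make_path_star`, `make_example`, `solve`), the integer node ids, and the shuffled edge-list format are all assumptions chosen for illustration.

```python
import random

def make_path_star(num_arms, arm_len, num_node_ids, rng):
    """Sample a path-star graph: one start node plus `num_arms` arms of
    `arm_len` further nodes each. Every node id is unique (assumed integers)."""
    ids = rng.sample(range(num_node_ids), 1 + num_arms * arm_len)
    start, rest = ids[0], ids[1:]
    arms = [[start] + rest[i * arm_len:(i + 1) * arm_len]
            for i in range(num_arms)]
    return start, arms

def make_example(start, arms, rng):
    """Build one task instance: shuffled edge list, (start, target) query,
    and the arm that should be generated as the answer."""
    edges = [(arm[i], arm[i + 1]) for arm in arms for i in range(len(arm) - 1)]
    rng.shuffle(edges)  # hypothetical choice: present edges in random order
    answer = rng.choice(arms)
    query = (start, answer[-1])  # the target is the last node of an arm
    return edges, query, answer

def solve(edges, query):
    """Reference solution: since nodes are unique, every non-start node has
    exactly one parent, so walk back from the target to the start."""
    start, target = query
    parent = {child: par for par, child in edges}
    path = [target]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

rng = random.Random(0)
start, arms = make_path_star(num_arms=3, arm_len=4, num_node_ids=50, rng=rng)
edges, query, answer = make_example(start, arms, rng)
assert solve(edges, query) == answer
```

In this sketch, picking one of the arms uniformly at random would be correct with probability `1 / num_arms`, which is one plausible reading of the random baseline mentioned in the abstract.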

Anthology ID: 2024.emnlp-main.695
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12493–12516
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.695/
DOI: 10.18653/v1/2024.emnlp-main.695
Cite (ACL): Arvid Frydenlund. 2024. The Mystery of the Pathological Path-star Task for Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 12493–12516, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): The Mystery of the Pathological Path-star Task for Language Models (Frydenlund, EMNLP 2024)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.695.pdf
Data: 2024.emnlp-main.695.data.zip