Arvid Frydenlund


2024

On the Pathological Path-star Task for Language Models (Extended Abstract)
Arvid Frydenlund
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

The recently introduced path-star task is a minimal toy task designed to exemplify limitations in the abilities of language models (Bachmann and Nagarajan, 2024). It involves a path-star graph, in which multiple arms radiate from a single start node and each node is unique. Given the start node and a specified target node that ends one of the arms, the task is to generate the arm containing that target node. This is straightforward for a human but surprisingly difficult for a language model, which Bachmann and Nagarajan found failed to predict above chance. They hypothesized that this is due to a deficiency in the teacher-forcing and next-token prediction paradigm. In this extended abstract, we demonstrate that the task is learnable using teacher-forcing in alternative settings and that the issue is (partially) due to representation. We analyze situations in which the models fail to solve the task, which leads us to introduce a regularization technique: we pack each training batch with multiple instances of the same graph but with differing target nodes to prevent overfitting. Initial results indicate this helps in solving the task.
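The task setup and the batch-packing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`make_path_star`, `edge_list`, `packed_batch`), the dictionary layout of each example, the use of integer node ids, and the choice to reuse one shuffled edge ordering across the packed instances are all assumptions made for illustration only.

```python
import random


def make_path_star(num_arms, arm_len, rng):
    """Build a path-star graph (hypothetical helper).

    One start node, with num_arms arms of arm_len nodes each radiating
    from it. Every node id is unique.
    """
    num_nodes = 1 + num_arms * arm_len
    ids = rng.sample(range(num_nodes), num_nodes)  # shuffled unique ids
    start, rest = ids[0], ids[1:]
    arms = [rest[i * arm_len:(i + 1) * arm_len] for i in range(num_arms)]
    return start, arms


def edge_list(start, arms, rng):
    """Return the graph as a shuffled list of directed (parent, child) edges."""
    edges = []
    for arm in arms:
        path = [start] + arm
        edges.extend(zip(path[:-1], path[1:]))
    rng.shuffle(edges)
    return edges


def packed_batch(start, arms, rng):
    """Pack one example per arm, all sharing the same graph.

    Assumption: the same edge ordering is reused for every instance, so
    the examples differ only in the target node and the arm to generate.
    """
    edges = edge_list(start, arms, rng)
    batch = []
    for arm in arms:
        batch.append({
            "edges": edges,
            "start": start,
            "target": arm[-1],         # target is the node ending the arm
            "answer": [start] + arm,   # the arm the model should generate
        })
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    start, arms = make_path_star(num_arms=3, arm_len=4, rng=rng)
    for example in packed_batch(start, arms, rng):
        print(example["target"], example["answer"])
```

Because every example in such a batch shares an identical graph, the only signal distinguishing the correct outputs is the target node, which is one plausible reading of how the packing is meant to discourage overfitting.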