@inproceedings{bellos-etal-2024-large,
    title = "Can Large Language Models Reason About Goal-Oriented Tasks?",
    author = "Bellos, Filippos  and
      Li, Yayuan  and
      Liu, Wuao  and
      Corso, Jason",
    editor = "Miceli-Barone, Antonio Valerio  and
      Barez, Fazl  and
      Cohen, Shay  and
      Voita, Elena  and
      Germann, Ulrich  and
      Lukasik, Michal",
    booktitle = "Proceedings of the First edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.scalellm-1.3/",
    pages = "24--34",
    abstract = "Most adults can complete a sequence of steps to achieve a certain goal, such as making a sandwich or repairing a bicycle tire. In completing these goal-oriented tasks, or simply tasks in this paper, one must use sequential reasoning to understand the relationship between the sequence of steps and the goal. LLMs have shown impressive capabilities across various natural language understanding tasks. However, prior work has mainly focused on logical reasoning tasks (e.g. arithmetic, commonsense QA); how well LLMs can perform on more complex reasoning tasks like sequential reasoning is not clear. In this paper, we address this gap and conduct a comprehensive evaluation of how well LLMs are able to conduct this reasoning for tasks and how they scale w.r.t. multiple dimensions (e.g. adaptive prompting strategies, number of in-context examples, varying complexity of the sequential task). Our findings reveal that while Chain of Thought (CoT) prompting can significantly enhance LLMs' sequential reasoning in certain scenarios, it can also be detrimental in others, whereas Tree of Thoughts (ToT) reasoning is less effective for this type of task. Additionally, we discover that an increase in model size or in-context examples does not consistently lead to improved performance."
}