Abstract
Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. Existing works primarily focus on the inconsistency issues within a single LLM, while we complementarily explore the inter-consistency among multiple LLMs for collaboration. To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning and introduce a formal debate framework (FORD) to conduct a three-stage debate among LLMs, aligned with real-world scenarios: fair debate, mismatched debate, and roundtable debate. Extensive experiments on various datasets show that LLMs can effectively collaborate to reach a consensus despite noticeable inter-inconsistencies, but imbalances in their abilities can lead to domination by superior LLMs. Leveraging a more advanced LLM like GPT-4 as an authoritative judge can boost collaboration performance. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods. Code and data are available at https://github.com/Waste-Wood/FORD.

- Anthology ID: 2023.findings-emnlp.508
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 7572–7590
- URL: https://aclanthology.org/2023.findings-emnlp.508
- DOI: 10.18653/v1/2023.findings-emnlp.508
- Cite (ACL): Kai Xiong, Xiao Ding, Yixin Cao, Ting Liu, and Bing Qin. 2023. Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7572–7590, Singapore. Association for Computational Linguistics.
- Cite (Informal): Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., Findings 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/2023.findings-emnlp.508.pdf