FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean

Seoyoon Park; Hyeji Choi; Minseon Kim; Subin An; Xiaonan Wang; Gyuri Choi; Hansaem Kim

FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean

Seoyoon Park, Hyeji Choi, Minseon Kim, Subin An, Xiaonan Wang, Gyuri Choi, Hansaem Kim

Abstract

Figurative language conveys stance, emotion, and social nuance, making its appropriate use essential in dialogue. While large language models (LLMs) often succeed in recognizing figurative expressions at the sentence level, their ability to use them coherently in conversation remains uncertain. We introduce FLUID QA, the first multilingual benchmark that evaluates figurative usage in dialogue across English, Korean, and Chinese. Each item embeds figurative choices into multi-turn contexts. To support interpretation, we include FLUTE-bi, a sentence-level diagnostic task. Results reveal a persistent gap: models that perform well on FLUTE-bi frequently fail on FLUID QA, especially in sarcasm and metaphor. These errors reflect systematic rhetorical confusion and limited discourse reasoning. FLUID QA provides a scalable framework for assessing usage-level figurative competence across languages.

Anthology ID:: 2025.emnlp-main.1540
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 30268–30282
Language:
URL:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1540/
DOI:
Bibkey:
Cite (ACL):: Seoyoon Park, Hyeji Choi, Minseon Kim, Subin An, Xiaonan Wang, Gyuri Choi, and Hansaem Kim. 2025. FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30268–30282, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean (Park et al., EMNLP 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1540.pdf
Checklist:: 2025.emnlp-main.1540.checklist.pdf

PDF Cite Search Checklist Fix data