This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
HirokiNomoto
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
Multilingual large language models (LLMs) have gained prominence, but concerns arise regarding their reliability beyond English. This study addresses the gap in cross-lingual semantic evaluation by introducing a novel benchmark for cross-lingual sense disambiguation, StingrayBench. In this paper, we demonstrate using false friends—words that are orthographically similar but have completely different meanings in two languages— as a possible approach to pinpoint the limitation of cross-lingual sense disambiguation in LLMs. We collect false friends in four language pairs, namely Indonesian-Malay, Indonesian-Tagalog, Chinese-Japanese, and English-German; and challenge LLMs to distinguish the use of them in context. In our analysis of various models, we observe they tend to be biased toward higher-resource languages. We also propose new metrics for quantifying the cross-lingual sense bias and comprehension based on our benchmark. Our work contributes to developing more diverse and inclusive language modeling, promoting fairer access for the wider multilingual community.
We describe the linking of the TUFS Basic Vocabulary Modules, created for online language learning, with the Open Multilingual Wordnet. The TUFS modules have roughly 500 lexical entries in 30 languages, each with the lemma, a link across the languages, an example sentence, usage notes and sound files. The Open Multilingual Wordnet has 34 languages (11 shared with TUFS) organized into synsets linked by semantic relations, with examples and definitions for some languages. The links can be used to (i) evaluate existing wordnets, (ii) add data to these wordnets and (iii) create new open wordnets for Khmer, Korean, Lao, Mongolian, Russian, Tagalog, Urdua nd Vietnamese