Le Qiu
2026
Large Language Models Put to the Test on Chinese Noun Compounds: Experiments on Natural Language Inference and Compound Semantics
Le Qiu | Emmanuele Chersoni | He Zhou | Yu-Yin Hsu
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Noun compounds are generally considered an open challenge for NLP systems, given the difficulty of interpreting the implicit semantic relation between modifier and head, although the advent of Large Language Models (LLMs) has recently led to remarkable performance leaps. However, most evaluations have been carried out on English benchmarks. In our work, we test LLMs on compound semantics understanding in Chinese, adopting two different evaluation scenarios: an extrinsic evaluation in a Natural Language Inference task, and an intrinsic evaluation in which models are directly asked to predict the semantic relation linking the two constituents. Our results show that the bigger and more recent LLMs are able to surpass supervised baselines in the inference task, especially when tested in the few-shot setting. In the more challenging task of selecting the correct interpretation of a compound from a fine-grained typology of semantic relations between head and modifier, the best Chinese LLM (Qwen-plus) manages to select the correct option in about one third of the cases.
LST at MWE-2026 AdMIRe 2: Advancing Multimodal Idiomaticity Representation
Le Qiu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper presents our methods for the AdMIRe 2.0 shared task, which addresses multilingual and multimodal idiom understanding. Our submission focuses on the text-only track. Specifically, we employ an ensemble of three large language models (LLMs) to directly perform the presented image ranking task. Each model independently produces a ranking of the candidate images, and we aggregate their outputs using a hard voting strategy to determine the final prediction. This ensemble learning framework leverages the complementary strengths of different LLMs, improving robustness and reducing the variance of individual model predictions.
2025
StockGenChaR: A Study on the Evaluation of Large Vision-Language Models on Stock Chart Captioning
Le Qiu | Emmanuele Chersoni
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing
ChengyuSTS: An Intrinsic Perspective on Mandarin Idiom Representation
Le Qiu | Emmanuele Chersoni | Aline Villavicencio
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Chengyu, or four-character idioms, are ubiquitous in both spoken and written Chinese. Despite their importance, chengyu are often underexplored in NLP tasks, and existing evaluation frameworks are primarily based on extrinsic measures. In this paper, we introduce an intrinsic evaluation task for Chinese idiomatic understanding: idiomatic semantic textual similarity (iSTS), which evaluates how well models can capture the semantic similarity of sentences containing idioms. For this purpose, we present a curated dataset: ChengyuSTS. Our experiments show that current pre-trained sentence Transformer models generally fail to capture the idiomaticity of chengyu in a zero-shot setting. We then report results of models fine-tuned with the SimCSE contrastive learning framework, which show promise for handling idiomatic expressions. We also present the results of DeepSeek for reference.
2024
Probing Numerical Concepts in Financial Text with BERT Models
Shanyue Guo | Le Qiu | Emmanuele Chersoni
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning
CompLex-ZH: A New Dataset for Lexical Complexity Prediction in Mandarin and Cantonese
Le Qiu | Shanyue Guo | Tak-Sum Wong | Emmanuele Chersoni | John Lee | Chu-Ren Huang
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
The prediction of lexical complexity in context is becoming increasingly relevant in Natural Language Processing research, since identifying complex words is often the first step of text simplification pipelines. To the best of our knowledge, though, datasets annotated with complex words are available only for English and for a limited number of Western languages. In our paper, we introduce CompLex-ZH, a dataset of words annotated with complexity scores in sentential contexts for Chinese. Our data include sentences in Mandarin and Cantonese, selected from a variety of sources and textual genres. We provide a first evaluation with baselines combining hand-crafted and language model-based features.
2023
Identifying ESG Impact with Key Information
Le Qiu | Bo Peng | Jinghang Gu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing
This paper presents a concise summary of our work for the ML-ESG-2 shared task, focusing exclusively on the Chinese and English datasets. ML-ESG-2 aims to ascertain the influence of news articles on corporations, specifically from an ESG perspective. To this end, we explored the capability of key information for impact identification and experimented with various techniques at different levels. For instance, we attempted to incorporate important information at the word level with TF-IDF, at the sentence level with TextRank, and at the document level with summarization. The final results reveal that the approach using GPT-4 for summarization yields the best predictions.
Collecting and Predicting Neurocognitive Norms for Mandarin Chinese
Le Qiu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the 15th International Conference on Computational Semantics
Language researchers have long assumed that concepts can be represented by sets of semantic features, and have traditionally encountered challenges in identifying a feature set general enough to describe the human conceptual experience in its entirety. In the dataset of English norms presented by Binder et al. (2016), also known as the Binder norms, the authors introduced a new set of neurobiologically motivated semantic features in which conceptual primitives were defined in terms of modalities of neural information processing. However, no comparable norms are currently available for other languages. In our work, we built Mandarin Chinese norms by translating the stimuli used in the original study and developed a comparable collection of human ratings for Mandarin Chinese. We also conducted experiments on the automatic prediction of the Chinese Binder norms based on the word embeddings of the corresponding words, to assess the feasibility of modeling experiential semantic features via corpus-based representations.