Yongjian Chen
2026
KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models
Xiao Zhang | Qianru Meng | Yongjian Chen | Yumeng Wang | Johan Bos
Findings of the Association for Computational Linguistics: ACL 2026
Many real-world questions appear deceptively simple yet implicitly demand two capabilities: (i) systematic coverage of a bounded knowledge universe and (ii) compositional set-based reasoning over that universe, a phenomenon we term “the tip of the iceberg.” We formalize this challenge through two orthogonal dimensions: knowledge width, the cardinality of the required universe, and reasoning depth, the number of compositional set operations. We introduce KnowledgeBerg, a benchmark of 4,800 multiple-choice questions derived from 1,183 enumeration seeds spanning 10 domains and 17 languages, with universes grounded in authoritative sources to ensure reproducibility. Representative open-source LLMs demonstrate severe limitations, achieving only 5.26–36.88 F1 on universe enumeration and 16.00–44.19 accuracy on knowledge-grounded reasoning. Diagnostic analyses reveal three stages of failure: completeness, or missing knowledge; awareness, or failure to identify requirements; and application, or incorrect reasoning execution. This pattern persists across languages and model scales. Although test-time compute and retrieval augmentation yield measurable gains—up to 4.35 and 3.78 points, respectively—substantial gaps remain, exposing limitations in how current LLMs organize structured knowledge and execute compositional reasoning over bounded domains. The dataset is available at https://huggingface.co/datasets/2npc/KnowledgeBerg
2025
From Shortcuts to Balance: Attribution Analysis of Speech-Text Feature Utilization in Distinguishing Original from Machine-Translated Texts
Yongjian Chen | Antonio Toral
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Neural text-based models for detecting machine-translated texts can rely on named entities (NEs) as classification shortcuts. While masking NEs encourages learning genuine translationese signals, it degrades the classification performance. Incorporating speech features compensates for this loss, but their interaction with NE reliance requires careful investigation. Through systematic attribution analysis across modalities, we find that bimodal integration leads to more balanced feature utilization, reducing the reliance on NEs in text while moderating overemphasis attribution patterns in speech features.
2024
Your Stereotypical Mileage May Vary: Practical Challenges of Evaluating Biases in Multiple Languages and Cultural Contexts
Karën Fort | Laura Alonso Alemany | Luciana Benotti | Julien Bezançon | Claudia Borg | Marthese Borg | Yongjian Chen | Fanny Ducel | Yoann Dupont | Guido Ivetta | Zhijian Li | Margot Mieskes | Marco Naguib | Yuyan Qian | Matteo Radaelli | Wolfgang S. Schmeisser-Nieto | Emma Raimundo Schulz | Thiziri Saci | Sarah Saidi | Javier Torroba Marchante | Shilin Xie | Sergio E. Zanotto | Aurélie Névéol
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. The study of bias, fairness and social impact in Natural Language Processing (NLP) lacks resources in languages other than English. Our objective is to support the evaluation of bias in language models in a multilingual setting. We use stereotypes across nine types of biases to build a corpus containing contrasting sentence pairs: one sentence that presents a stereotype concerning an underadvantaged group, and another minimally changed sentence concerning a matching advantaged group. We build on the French CrowS-Pairs corpus and guidelines to provide translations of the existing material into seven additional languages. In total, we produce 11,139 new sentence pairs that cover stereotypes dealing with nine types of biases in seven cultural contexts. We use the final resource for the evaluation of relevant monolingual and multilingual masked language models. We find that language models in all languages favor sentences that express stereotypes in most bias categories. The process of creating a resource that covers a wide range of language types and cultural settings highlights the difficulty of bias evaluation, in particular comparability across languages and contexts.
Improving NMT from a Low-Resource Source Language: A Use Case from Catalan to Chinese via Spanish
Yongjian Chen | Antonio Toral | Zhijian Li | Mireia Farrús
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
The effectiveness of neural machine translation is markedly constrained in low-resource scenarios, where the scarcity of parallel data hampers the development of robust models. This paper focuses on the scenario where the source language is low-resource and there exists a related high-resource language, for which we introduce a novel approach that combines pivot translation and multilingual training. As a use case we tackle the automatic translation from Catalan to Chinese, using Spanish as an additional language. Our evaluation, conducted on the FLORES-200 benchmark, compares our new approach against a vanilla baseline alongside other models representing various low-resource techniques in the Catalan-to-Chinese context. Experimental results highlight the efficacy of our proposed method, which outperforms existing models, notably demonstrating significant improvements both in translation quality and in lexical diversity.
Co-authors
- Zhijian Li 2
- Antonio Toral 2
- Laura Alonso Alemany 1
- Luciana Benotti 1
- Julien Bezançon 1
- Claudia Borg 1
- Marthese Borg 1
- Johan Bos 1
- Fanny Ducel 1
- Yoann Dupont 1
- Mireia Farrús 1
- Karën Fort 1
- Guido Ivetta 1
- Qianru Meng 1
- Margot Mieskes 1
- Marco Naguib 1
- Aurélie Névéol 1
- Yuyan Qian 1
- Matteo Radaelli 1
- Emma Raimundo Schulz 1
- Thiziri Saci 1
- Sarah Saidi 1
- Wolfgang S. Schmeisser-Nieto 1
- Javier Torroba Marchante 1
- Yumeng Wang 1
- Shilin Xie 1
- Sergio E. Zanotto 1
- Xiao Zhang 1