Xiaoyu Wang
Papers on this page may belong to the following people: Xiaoyu Wang, Xiaoyu Wang, Xiaoyu Wang
2026
CuMA: Aligning LLMs with Sparse Cultural Values via Demographic-Aware Mixture of Adapters
Ao Sun | Xiaoyu Wang | Zhe Tan | Yu Li | Zhu Jiachen | Yuheng Jia | Shu Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from Mean Collapse, converging to a generic average that fails to represent diverse groups. We attribute this to Cultural Sparsity, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose CuMA (Cultural Mixture of Adapters), a framework that frames alignment as a conditional capacity separation problem. By incorporating demographic-aware routing, CuMA internalizes a Latent Cultural Topology to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that CuMA achieves competitive performance, outperforming both dense baselines and semantic-only MoEs. Our analysis confirms that CuMA effectively mitigates mean collapse and preserves cultural diversity. Our code is available at https://github.com/Throll/CuMA.
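The Mean Collapse effect described in this abstract can be illustrated with a toy analogy. This is not the authors' experiment or code. It assumes two hypothetical groups (`group_a`, `group_b`) with conflicting scalar targets and a squared-error loss, for which the best single shared estimate is the pooled mean. Giving each group its own parameter stands in, very loosely, for separating capacity by group.

```python
from statistics import mean

def fit_shared(targets_by_group):
    """Least-squares fit of ONE scalar to every target: the pooled mean."""
    pooled = [t for targets in targets_by_group.values() for t in targets]
    return mean(pooled)

def fit_per_group(targets_by_group):
    """Least-squares fit of one scalar PER group: each group's own mean."""
    return {g: mean(targets) for g, targets in targets_by_group.items()}

def squared_error(pred_by_group, targets_by_group):
    """Total squared error of per-group predictions against the targets."""
    return sum(
        (pred_by_group[g] - t) ** 2
        for g, targets in targets_by_group.items()
        for t in targets
    )

# Hypothetical data: two groups that disagree on the "right" value.
toy = {"group_a": [1.0, 1.0, 1.0], "group_b": [5.0, 5.0, 5.0]}

shared = fit_shared(toy)                                  # one value for everyone
per_group = fit_per_group(toy)                            # one value per group
shared_err = squared_error({g: shared for g in toy}, toy)
grouped_err = squared_error(per_group, toy)

print(shared, per_group, shared_err, grouped_err)
```

In this toy setting the shared fit lands on a value that matches neither group, while the per-group fit recovers both. Real alignment objectives are far more complex, so this only conveys the intuition.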
2025
DenseLoRA: Dense Low-Rank Adaptation of Large Language Models
Lin Mu | Xiaoyu Wang | Li Ni | Yang Li | Zhize Wu | Peiquan Jin | Yiwen Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Low-rank adaptation (LoRA) has been developed as an efficient approach for adapting large language models (LLMs) by fine-tuning two low-rank matrices, thereby reducing the number of trainable parameters. However, prior research indicates that many of the weights in these matrices are redundant, leading to inefficiencies in parameter utilization. To address this limitation, we introduce Dense Low-Rank Adaptation (DenseLoRA), a novel approach that enhances parameter efficiency while achieving superior performance compared to LoRA. DenseLoRA builds upon the concept of representation fine-tuning, incorporating a single Encoder-Decoder to refine and compress hidden representations across all adaptation layers before applying adaptation. Instead of relying on two redundant low-rank matrices as in LoRA, DenseLoRA adapts LLMs through a dense low-rank matrix, improving parameter utilization and adaptation efficiency. We evaluate DenseLoRA on various benchmarks, showing that it achieves 83.8% accuracy with only 0.01% of trainable parameters, compared to LoRA’s 80.8% accuracy with 0.70% of trainable parameters on LLaMA3-8B. Additionally, we conduct extensive experiments to systematically assess the impact of DenseLoRA’s components on overall model performance.
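The parameter-efficiency argument in this abstract can be made concrete with a rough count. The LoRA formula below is the standard one (two matrices of shapes `d_out x r` and `r x d_in` per adapted weight). The second function is only an assumption about what "a single Encoder-Decoder" plus "a dense low-rank matrix" might look like: one shared linear encoder (`d_model x r`), one shared linear decoder (`r x d_model`), and an `r x r` matrix per adapted weight. The abstract does not give these shapes, and the sizes used here are hypothetical and not meant to reproduce the reported percentages.

```python
def lora_params(d_in, d_out, rank, n_matrices):
    """Trainable parameters for standard LoRA: A (rank x d_in) and
    B (d_out x rank) for each adapted weight matrix."""
    return n_matrices * rank * (d_in + d_out)

def shared_codec_params(d_model, rank, n_matrices):
    """ASSUMED layout, not confirmed by the abstract: one encoder and one
    decoder shared by all adapted matrices, plus a dense rank x rank
    matrix for each adapted matrix."""
    encoder = d_model * rank           # shared across all layers
    decoder = rank * d_model           # shared across all layers
    per_matrix = rank * rank           # trained separately per adapted matrix
    return encoder + decoder + n_matrices * per_matrix

# Hypothetical sizes chosen only for illustration.
d_model, rank, n_matrices = 4096, 16, 64

lora_total = lora_params(d_model, d_model, rank, n_matrices)
shared_total = shared_codec_params(d_model, rank, n_matrices)

print(lora_total, shared_total, lora_total / shared_total)
```

Under these assumptions the shared encoder and decoder are paid for once, so the per-matrix cost drops from `r * (d_in + d_out)` to `r * r`.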
2022
Commonsense Knowledge Salience Evaluation with a Benchmark Dataset in E-commerce
Yincen Qu | Ningyu Zhang | Hui Chen | Zelin Dai | Chengming Wang | Xiaoyu Wang | Qiang Chen | Huajun Chen
Findings of the Association for Computational Linguistics: EMNLP 2022
In e-commerce, the salience of commonsense knowledge (CSK) is beneficial for widespread applications such as product search and recommendation. For example, when users search for “running” in e-commerce, they would like to find products highly related to running, such as “running shoes” rather than “shoes”. Nevertheless, many existing CSK collections rank statements solely by confidence scores, and there is no information about which ones are salient from a human perspective. In this work, we define the task of supervised salience evaluation, where given a CSK triple, the model is required to learn whether the triple is salient or not. In addition to formulating the new task, we also release a new Benchmark dataset of Salience Evaluation in E-commerce (BSEE) and hope to promote related research on commonsense knowledge salience evaluation. We conduct experiments on the dataset with several representative baseline models. The experimental results show that salience evaluation is a hard task where models perform poorly on our evaluation set. We further propose a simple but effective approach, PMI-tuning, which shows promise for solving this novel problem. Code is available at https://github.com/OpenBGBenchmark/OpenBG-CSK.