Cong Chen
Also published as: 聪 陈
2026
Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions
Guo Gan | Yuxuan Ding | Cong Chen | Yuwei Ren | Yin Huang | Hong Zhou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully exploring each costly emulator state. In this paper, we propose Android Coach, a novel framework that shifts the training paradigm to Single State Multiple Actions, allowing the agent to sample and utilize multiple actions for a single online state. We enable this without additional emulator overhead by learning a critic that estimates action values. To ensure the critic serves as a reliable coach, we integrate a process reward model and introduce a group-wise advantage estimator based on the averaged critic outputs. Extensive experiments demonstrate the effectiveness and efficiency of Android Coach: it achieves 7.5% and 8.3% success rate improvements on AndroidLab and AndroidWorld over UI-TARS-1.5-7B, and attains 1.4x higher training efficiency than Single State Single Action methods PPO and GRPO at matched success rates.
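The abstract does not give the formula for the group-wise advantage estimator, so the following is only a minimal sketch under an assumed reading: several actions are sampled at one emulator state, a critic scores each, and each action's advantage is its score minus the group average. The function name `group_advantages` and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def group_advantages(q_values):
    """Return each action's critic value minus the group mean (assumed baseline)."""
    baseline = mean(q_values)  # averaged critic outputs for this single state
    return [q - baseline for q in q_values]


# Hypothetical critic outputs for four actions sampled at the same state.
advantages = group_advantages([0.2, 0.5, 0.9, 0.4])
best_action_index = advantages.index(max(advantages))
```

Under this assumption, no extra emulator step is needed to compare the sampled actions, because only the critic is queried for each one.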
2024
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
Jiajia Li | Lu Yang | Mingni Tang | Cong Chen | Zuchao Li | Ping Wang | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2024
Benchmarks play a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs’ capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. By leveraging ZIQI-Eval, we conduct a comprehensive evaluation of 16 LLMs to analyze their performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs’ music-related abilities. The dataset is available at GitHub and HuggingFace.
2022
MT-Speech at SemEval-2022 Task 10: Incorporating Data Augmentation and Auxiliary Task with Cross-Lingual Pretrained Language Model for Structured Sentiment Analysis
Cong Chen | Jiansong Chen | Cao Liu | Fan Yang | Guanglu Wan | Jinxiong Xia
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Sentiment analysis is a fundamental task, and structured sentiment analysis (SSA) is an important component of sentiment analysis. However, traditional SSA suffers from two important issues: (1) a lack of interactive knowledge across different languages; (2) a small amount of annotated data, or even none. To address these problems, we incorporate data augmentation and auxiliary tasks within a cross-lingual pretrained language model into SSA. Specifically, we employ XLM-RoBERTa to enhance mutually interactive information when parallel data is available in the pretraining stage. Furthermore, we leverage two data augmentation strategies and auxiliary tasks to improve performance in few-label and zero-shot cross-lingual settings. Experiments demonstrate the effectiveness of our models. Our models rank first on the cross-lingual sub-task and second on the monolingual sub-task of SemEval-2022 Task 10.
2020
“细粒度英汉机器翻译错误分析语料库”的构建与思考(Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation and Its Implications)
Bailian Qiu (裘白莲) | Mingwen Wang (王明文) | Maoxi Li (李茂西) | Cong Chen (陈聪) | Fan Xu (徐凡)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
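Machine translation error analysis aims to identify the errors in machine translation output, including error types and error distribution, and it plays an important role in machine translation research and applications. This paper combines human post-editing with error analysis: post-editing operations are annotated with errors, using a combination of automatic and manual annotation, to construct a fine-grained English-Chinese machine translation error analysis corpus. Each annotated sample comprises the source sentence, the machine translation, the human reference translation, the post-edited translation, the word error rate, and error-type annotations. The annotated error types include added words, omitted words, wrong words, word-order errors, untranslated words, and named-entity translation errors. An annotation consistency check demonstrates the validity of the annotation, and statistical analysis of the annotated corpus can effectively guide both the development of machine translation systems and post-editing by human translators.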
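Each sample in the corpus carries a word error rate, but the abstract does not say how it is computed or which pair of sentences it compares. As a rough illustration only, the sketch below uses the standard definition (word-level edit distance divided by reference length) and assumes both sentences are already tokenized with spaces; Chinese text would need word segmentation first. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (insert/delete/substitute) over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```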