Cheng Yu
2026
E-ABSA20K: A Dataset and Propose-and-Verify for Aspect-Based Sentiment Analysis in Long E-commerce Reviews
Tong Sun | Mingyang Ma | Cheng Yu
Findings of the Association for Computational Linguistics: ACL 2026
Aspect-Based Sentiment Analysis (ABSA) is critical for extracting actionable product insights from e-commerce reviews. However, most public ABSA benchmarks are restricted to short texts and a limited range of domains, and therefore underrepresent the challenges posed by real-world reviews, where multiple aspects co-occur, colloquial and noisy expressions are common, and evidence must often be aggregated across sentences in long contexts. We introduce E-ABSA20K, a multi-domain dataset of 20K reviews from four product categories (Women’s Bags, Dresses, Cosmetics, and Furniture), annotated with review-level sentiment quads. Compared to existing benchmarks, E-ABSA20K contains substantially longer and more aspect-dense reviews, averaging 63.9 words and 6.0 quads per review. We further propose a two-stage propose-and-verify framework for review-level quadruple extraction (target, aspect, opinion, sentiment). The first stage generates high-recall candidates under strict schema constraints, while the second stage conducts explicit grounding, scope, and modality verification, followed by review-level consolidation to mitigate hallucinations and scope leakage in long reviews. Experiments across multiple Qwen3 model sizes demonstrate that our approach consistently outperforms single-stage prompting (with and without chain-of-thought) as well as competitive ABSA extraction baselines, improving quad-level micro-F1 and robustness on discourse-hard cases such as comparisons and conditionals.
AutoPKG: An Automated Framework for Dynamic E-commerce Product-Attribute Knowledge Graph Construction
Pollawat Hongwimol | Haoning Shang | Chutong Wang | Zhichao Wan | Yi Gao | Yuanming Li | Lin Gui | Wenhao Sun | Cheng Yu
Findings of the Association for Computational Linguistics: ACL 2026
Product attribute extraction in e-commerce is bottlenecked by ontologies that are inconsistent, incomplete, and costly to maintain. We present AutoPKG, a multi-agent Large Language Model (LLM) framework that automatically constructs a Product-attribute Knowledge Graph (PKG) from multimodal product content. AutoPKG induces product types and type-specific attribute keys on demand, extracts attribute values from text and images, and consolidates updates through a centralized decision agent that maintains a globally consistent canonical graph. We also propose an evaluation protocol for dynamic PKGs that measures type/key validity and consolidation quality, as well as edge-level accuracy for value assertions after canonicalization. On a large real-world marketplace catalog dataset from Lazada (Alibaba), AutoPKG achieves up to 0.953 Weighted Knowledge Efficiency (WKE) for product types, 0.724 WKE for attribute keys, and 0.531 edge-level F1 for multimodal value extraction. Across three public benchmarks, we improve edge-level exact-match F1 by 0.152 and yield a 0.208 precision gain on the attribute extraction application. Online A/B tests show that AutoPKG-derived attributes increase Gross Merchandise Value (GMV) in Badge (+3.81%), Search (+5.32%), and Recommendation (+7.89%), supporting AutoPKG’s practical value in production.
Unified Thinker: A General Reasoning Core for Image Generation
Sashuai Zhou | Qiang Zhou | Jijin Hu | Hanqing Yang | Yue Cao | Junpeng Ma | Yinchao Ma | Jun Song | Tiezheng Ge | Cheng Yu | Bo Zheng | Zhou Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning–execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting a substantial gap to current open-source models. We argue that closing this gap requires not merely better visual generators, but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.
2025
Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
Juncheng Wang | Chao Xu | Cheng Yu | Zhe Hu | Haoyu Xie | Guoqi Yu | Lei Shang | Shujun Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
While language models (LMs) paired with residual vector quantization (RVQ) tokenizers have shown promise in text-to-audio (T2A) generation, they still lag behind diffusion-based models by a non-trivial margin. We identify a critical dilemma underpinning this gap: incorporating more RVQ layers improves audio reconstruction fidelity but exceeds the generation capacity of conventional LMs. To address this, we first analyze RVQ dynamics and uncover two key limitations: 1) orthogonality of features across RVQ layers hinders effective LMs training, and 2) descending semantic richness in tokens from deeper RVQ layers exacerbates exposure bias during autoregressive decoding. Based on these insights, we propose Siren, a novel LM-based framework that employs multiple isolated transformers with causal conditioning and anti-causal alignment via reinforcement learning. Extensive experiments demonstrate that Siren outperforms both existing LM-based and diffusion-based T2A systems, achieving state-of-the-art results. By bridging the representational strengths of LMs with the fidelity demands of audio synthesis, our approach repositions LMs as competitive contenders against diffusion models in T2A tasks. Moreover, by aligning audio representations with linguistic structures, Siren opens a promising pathway toward unified multi-modal generation frameworks.
2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Yang Li | Cheng Yu | Guangzhi Sun | Hua Jiang | Fanglei Sun | Weiqin Zu | Ying Wen | Yang Yang | Jun Wang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody features generated by the TTS system to be related to the context and is more similar to how humans naturally produce prosody. The performance of CUC-VAE is evaluated via a qualitative listening test for naturalness, intelligibility and quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins.
2016
Pairwise FastText Classifier for Entity Disambiguation
Cheng Yu | Bing Chu | Rohit Ram | James Aichinger | Lizhen Qu | Hanna Suominen
Proceedings of the Australasian Language Technology Association Workshop 2016
2014
Co-authors
- James Aichinger 1
- Yue Cao 1
- Bing Chu 1
- Yi Gao 1
- Tiezheng Ge 1
- Lin Gui 1
- Pollawat Hongwimol 1
- Jijin Hu 1
- Zhe Hu 1
- Hua Jiang 1
- Yang Li 1
- Yuanming Li 1
- Chunhua Liu 1
- Junpeng Ma 1
- Mingyang Ma 1
- Yinchao Ma 1
- Lizhen Qu 1
- Qin Qu 1
- Rohit Ram 1
- Haoning Shang 1
- Lei Shang 1
- Jun Song 1
- Fanglei Sun 1
- Guangzhi Sun 1
- Tong Sun 1
- Wenhao Sun 1
- Hanna Suominen 1
- Gongbo Tang 1
- Yue Tian 1
- Zhichao Wan 1
- Chutong Wang 1
- Jun Wang 1
- Juncheng Wang 1
- Shujun Wang 1
- Ying Wen 1
- Haoyu Xie 1
- Chao Xu 1
- Hanqing Yang 1
- Yang Yang 1
- Jing Yi 1
- Dong Yu (于东) 1
- Guoqi Yu 1
- Zhou Zhao 1
- Bo Zheng 1
- Qiang Zhou (周强) 1
- Sashuai Zhou 1
- Weiqin Zu 1