Dong Yang

Also published as: D. Yang


2023

CoMave: Contrastive Pre-training with Multi-scale Masking for Attribute Value Extraction
Xinnan Guo | Wentao Deng | Yongrui Chen | Yang Li | Mengdi Zhou | Guilin Qi | Tianxing Wu | Dong Yang | Liubin Wang | Yong Pan
Findings of the Association for Computational Linguistics: ACL 2023

Attribute Value Extraction (AVE) aims to automatically obtain attribute-value pairs from product descriptions to aid e-commerce. Although existing approaches perform increasingly well on e-commerce platforms, they still face two challenges: 1) difficulty identifying values at different scales simultaneously; 2) a tendency to confuse highly similar fine-grained attributes. This paper proposes a pre-training technique for AVE to address these issues. In particular, we first improve the conventional token-level masking strategy, guiding the language model to understand multi-scale values by recovering spans at the phrase and sentence levels. Second, we apply clustering to build a challenging negative set for each example and design a contrastive-learning pre-training objective that forces the model to discriminate between similar attributes. Comprehensive experiments show that our solution provides a significant improvement over traditional pre-trained models on the AVE task and achieves state-of-the-art results on four benchmarks.

2022

GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs
Dong Yang | Peijun Qing | Yang Li | Haonan Lu | Xiaodong Lin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Embedding knowledge graphs (KGs) for multi-hop logical reasoning is a challenging problem because many KGs have massive and complicated structures. Recently, many promising works have projected entities and queries into a geometric space to find answers efficiently. However, it remains challenging to model the negation and union operators. The negation operator has no strict boundaries, which produces overlapping embeddings and leads to ambiguous answers. An additional limitation is that the union operator is not closed, which undermines the model's ability to handle a series of union operators. To address these problems, we propose a novel probabilistic embedding model, namely Gamma Embeddings (GammaE), for encoding entities and queries to answer different types of first-order logic (FOL) queries on KGs. We utilize the linear property and strong boundary support of the Gamma distribution to capture more features of entities and queries, which dramatically reduces model uncertainty. Furthermore, GammaE uses a Gamma mixture method to design a closed union operator. The performance of GammaE is validated on three large logical query datasets. Experimental results show that GammaE significantly outperforms state-of-the-art models on public benchmarks.

2010

Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm
Dong Yang | Paul Dixon | Sadaoki Furui
Proceedings of the ACL 2010 Conference Short Papers

2009

Automatic Chinese Abbreviation Generation Using Conditional Random Field
Dong Yang | Yi-cheng Pan | Sadaoki Furui
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Combining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration
Dong Yang | Paul Dixon | Yi-Cheng Pan | Tasuku Oonishi | Masanobu Nakamura | Sadaoki Furui
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2006

Monolingual Web-based Factoid Question Answering in Chinese, Swedish, English and Japanese
E.W.D. Whittaker | J. Hamonic | D. Yang | T. Klingberg | S. Furui
Proceedings of the Workshop on Multilingual Question Answering - MLQA ’06