Zhengxian Gong

Also published as: ZhengXian Gong


2021

融合零指代识别的篇章级机器翻译(Context-aware Machine Translation Integrating Zero Pronoun Recognition)
Hao Wang (汪浩) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

In Chinese and other languages that habitually drop pronouns, a pronoun is usually omitted when it can be inferred from context. Although neural machine translation models, of which the Transformer is the leading example, have achieved great success, this omission phenomenon still poses a serious challenge to them. Building on the Transformer, this paper proposes a translation model that integrates zero pronoun recognition, and introduces document-level context to enrich the anaphoric information. Specifically, the model adopts a joint learning framework: on top of the translation model, it adds a classification task that identifies which constituent of the sentence an omitted pronoun stands for, enabling the model to exploit zero-anaphora information to assist translation. Experiments on a Chinese-English dialogue dataset verify the effectiveness of the proposed method: compared with the baseline model, translation performance improves by 1.48 BLEU.
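
The joint objective described above can be approximated by weighting a standard translation loss against an auxiliary zero-pronoun classification loss. Below is a minimal PyTorch sketch of such a multi-task loss; the weight alpha, the label conventions, and the tensor shapes are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class JointZPTranslationLoss(nn.Module):
    """Joint objective sketch: translation cross-entropy plus an auxiliary
    zero-pronoun role classification loss (not the paper's exact code)."""
    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha  # auxiliary-task weight (hypothetical value)
        self.mt_loss = nn.CrossEntropyLoss(ignore_index=0)     # 0 = pad id
        self.zp_loss = nn.CrossEntropyLoss(ignore_index=-100)  # -100 = no label

    def forward(self, mt_logits, mt_targets, zp_logits, zp_labels):
        # mt_logits: (batch, tgt_len, vocab); zp_logits: (batch, src_len, n_roles)
        l_mt = self.mt_loss(mt_logits.flatten(0, 1), mt_targets.flatten())
        l_zp = self.zp_loss(zp_logits.flatten(0, 1), zp_labels.flatten())
        return l_mt + self.alpha * l_zp
```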

利用语义关联增强的跨语言预训练模型的译文质量评估(A Cross-language Pre-trained Model with Enhanced Semantic Connection for MT Quality Estimation)
Heng Ye (叶恒) | Zhengxian Gong (贡正仙)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

Machine translation quality estimation (QE) evaluates translations automatically without reference translations, but it requires manually annotated data for training. To overcome the scarcity of such annotations, neural QE usually proceeds in two stages: first learning bilingual alignments from a large-scale parallel corpus, and then building the estimation model on a small annotated dataset. A cross-lingual pre-trained model can replace the first stage, so this paper first proposes an XLM-R-based QE model that encodes the source and target languages jointly. Second, since most pre-trained models are built on multilingual collections of monolingual data, their ability to relate the semantics of a specific language pair is relatively weak. To better adapt the cross-lingual pre-trained model to the QE task, this paper proposes three pre-training strategies that strengthen its cross-lingual semantic connections. Our method achieves state-of-the-art performance on both the WMT2017 and WMT2019 English-German QE datasets.
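
As a rough illustration of the first idea, encoding source and target jointly with XLM-R and regressing a quality score, here is a sketch built on the Hugging Face transformers library; the linear regression head and the use of the <s> token representation are common choices assumed here, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import XLMRobertaModel, XLMRobertaTokenizer

class XLMRQualityEstimator(nn.Module):
    """Sketch of a QE model: encode source and MT output jointly with
    XLM-R and regress a quality score (e.g. HTER) from the <s> token."""
    def __init__(self, name: str = "xlm-roberta-base"):
        super().__init__()
        self.encoder = XLMRobertaModel.from_pretrained(name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(h.last_hidden_state[:, 0]).squeeze(-1)

tok = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
# The tokenizer's text-pair mode yields one joint source/target encoding.
batch = tok("the source sentence", "die maschinelle Übersetzung",
            return_tensors="pt")
score = XLMRQualityEstimator()(batch["input_ids"], batch["attention_mask"])
```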

基于序列到序列的中文AMR解析(Chinese AMR Parsing based on Sequence-to-Sequence Modeling)
Ziyi Huang (黄子怡) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

Abstract Meaning Representation (AMR) abstracts the semantics of a given text into a single-rooted directed acyclic graph, and AMR parsing maps an input text to its corresponding AMR graph. Compared with English AMR, research on Chinese AMR started later, and studies on Chinese AMR parsing are accordingly scarce. Based on the publicly available Chinese AMR corpus CAMR1.0, this paper studies Chinese AMR parsing with a sequence-to-sequence approach. Specifically, we first implement a Transformer-based sequence-to-sequence AMR parsing system for Chinese, and then explore and compare the application of different pre-trained models to Chinese AMR parsing. On this corpus, our best parser achieves a Smatch F1 score of 70.29. This paper is the first to report experimental results on this dataset.
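
A sequence-to-sequence parser needs the AMR graph linearized into a token sequence on the target side. The toy function below shows one common PENMAN-style linearization with variables and re-entrancies; it only illustrates the general idea, as CAMR 1.0 has its own annotation conventions.

```python
def linearize_amr(node):
    """Serialize a toy AMR graph (nested tuples) into a PENMAN-style
    string, the kind of target sequence a seq2seq parser would emit.
    A node is (variable, concept, [(role, child), ...]); a re-entrant
    child is just the variable name as a string."""
    var, concept, edges = node
    s = f"({var} / {concept}"
    for role, child in edges:
        s += f" :{role} "
        s += linearize_amr(child) if isinstance(child, tuple) else str(child)
    return s + ")"

# "the boy wants to go"; note the re-entrancy of variable b.
graph = ("w", "want-01", [("arg0", ("b", "boy", [])),
                          ("arg1", ("g", "go-01", [("arg0", "b")]))])
print(linearize_amr(graph))
# (w / want-01 :arg0 (b / boy) :arg1 (g / go-01 :arg0 b))
```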

Encouraging Lexical Translation Consistency for Document-Level Neural Machine Translation
Xinglin Lyu | Junhui Li | Zhengxian Gong | Min Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recently a number of approaches have been proposed to improve translation performance for document-level neural machine translation (NMT), but few focus on lexical translation consistency. In this paper we apply the principle of “one translation per discourse” to NMT, aiming to encourage lexical translation consistency for document-level NMT. We first obtain a word link for each source word in a document, which records the positions where that word appears. We then encourage the translations of the words within a link to be consistent in two ways: when encoding sentences within a document, we share context information among those words; in addition, we propose an auxiliary loss function to constrain their translations to be consistent. Experimental results on Chinese↔English and English→French translation tasks show that our approach not only achieves state-of-the-art BLEU scores, but also greatly improves lexical consistency in translation.
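
As a sketch of the auxiliary loss idea, one can penalize disagreement between the target-word distributions predicted at the positions of a word link, for instance with a symmetric KL term; the pairing of consecutive occurrences and the choice of divergence below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(probs, links):
    """Auxiliary loss sketch: encourage the predicted translation
    distributions at repeated occurrences of a source word (a "word
    link") to agree, via symmetric KL between consecutive occurrences.
    probs: (positions, vocab) strictly positive distributions
           (e.g. softmax outputs), so .log() is safe;
    links: list of position-index lists, one per linked source word."""
    loss = probs.new_zeros(())
    for link in links:
        for i, j in zip(link, link[1:]):
            p, q = probs[i], probs[j]
            # F.kl_div(log_q, p) computes KL(p || q) with reduction="sum".
            loss = loss + 0.5 * (F.kl_div(q.log(), p, reduction="sum")
                                 + F.kl_div(p.log(), q, reduction="sum"))
    return loss
```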

Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training
Linqing Chen | Junhui Li | Zhengxian Gong | Boxing Chen | Weihua Luo | Min Zhang | Guodong Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Context-aware neural machine translation (NMT) remains challenging due to the lack of large-scale document-level parallel corpora. To break this corpus bottleneck, we aim to improve context-aware NMT by taking advantage of both a large-scale sentence-level parallel dataset and source-side monolingual documents. To this end, we propose two pre-training tasks: one learns to translate a sentence from the source language to the target language on the sentence-level parallel dataset, while the other learns to restore a deliberately noised document to its original form on the monolingual documents. Importantly, the two pre-training tasks are learned jointly and simultaneously by the same model, which is thereafter fine-tuned on scale-limited parallel documents from both the sentence-level and document-level perspectives. Experimental results on four translation tasks show that our approach significantly improves translation performance. One nice property of our approach is that the fine-tuned model can translate both sentences and documents.
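
The cross-task setup can be pictured as a single encoder-decoder fed from two example streams under one seq2seq objective. The sketch below mixes the streams at a hypothetical 50/50 ratio; the actual sampling scheme and the document noising procedure are not specified here.

```python
import random

def mixed_batches(sent_pairs, noised_docs, steps):
    """Cross-task pre-training sketch: one model sees, in the same run,
    (a) sentence-level translation pairs (src sentence -> tgt sentence)
    and (b) document denoising pairs (noised doc -> original doc),
    both cast as seq2seq examples so a single loss applies to both."""
    for _ in range(steps):
        if random.random() < 0.5:           # hypothetical 50/50 mixing
            src, tgt = random.choice(sent_pairs)   # translation task
        else:
            src, tgt = random.choice(noised_docs)  # denoising task
        yield src, tgt  # feed to the same encoder-decoder, same objective
```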

2020

层次化结构全局上下文增强的篇章级神经机器翻译(Hierarchical Global Context Augmented Document-level Neural Machine Translation)
Linqing Chen (陈林卿) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

How to effectively exploit document-level context has long been a major challenge in document-level neural machine translation. This paper proposes to improve document-level NMT with a hierarchical global context derived from the entire document. To this end, the model captures the dependencies between each word of the current sentence and all sentences and words in the document, and combines these dependencies at different levels into a global context that carries hierarchical document information. Each word of the current source sentence thereby obtains its own context integrating both word-level and sentence-level dependencies. To fully exploit parallel sentence pairs during training, we adopt a two-step training scheme: the model is first trained on sentence-level data and then further trained on document-level data to acquire the ability to capture global context. Experiments on several benchmark datasets show that the proposed model achieves meaningful improvements in translation quality over several strong baselines, and further show that context enriched with hierarchical document information outperforms word-level context alone. In addition, we integrate the global context into the translation model in different ways, observe the effect on performance, and make a preliminary study of how global context is distributed across a document in document translation.
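
The hierarchical idea, each current-sentence word attending both to sentence-level summaries and to all document words, might be sketched as two attention views merged by a gate, as in the assumed PyTorch module below; the dimensions and the gating scheme are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class HierarchicalGlobalContext(nn.Module):
    """Sketch: each word of the current sentence attends (a) to
    sentence-level summaries of the document and (b) to all document
    words; a sigmoid gate merges the two context views."""
    def __init__(self, d: int = 512, heads: int = 8):
        super().__init__()
        self.sent_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.word_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.gate = nn.Linear(2 * d, d)

    def forward(self, cur_words, doc_words, doc_sent_reps):
        # cur_words: (B, n, d); doc_words: (B, N, d); doc_sent_reps: (B, S, d)
        c_sent, _ = self.sent_attn(cur_words, doc_sent_reps, doc_sent_reps)
        c_word, _ = self.word_attn(cur_words, doc_words, doc_words)
        g = torch.sigmoid(self.gate(torch.cat([c_sent, c_word], dim=-1)))
        return g * c_sent + (1 - g) * c_word  # per-word merged global context
```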

2016

Semi-supervised Gender Classification with Joint Textual and Social Modeling
Shoushan Li | Bin Dai | Zhengxian Gong | Guodong Zhou
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In gender classification, labeled data is often limited while unlabeled data is ample. This motivates semi-supervised learning for gender classification, which improves performance by exploiting the knowledge in both labeled and unlabeled data. In this paper, we propose a semi-supervised approach to gender classification that leverages textual features together with a specific kind of indirect link among users, which we call “same-interest” links. Specifically, we propose a factor graph, namely the Textual and Social Factor Graph (TSFG), to model both the textual information and the “same-interest” link information. Empirical studies demonstrate the effectiveness of the proposed approach to semi-supervised gender classification.
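
The paper performs joint inference in a factor graph (TSFG); the loop below only illustrates the underlying intuition of the “same-interest” links, blending each user's text-based score with the scores of linked users. It is a simplified analogue for illustration, not the paper's inference procedure.

```python
def propagate(text_scores, links, rounds=10, mix=0.5):
    """Simplified analogue of "same-interest" links: iteratively blend
    each user's text-based gender score (in [0, 1]) with the average
    score of linked users. `links` maps a user to its linked users;
    `mix` (hypothetical) controls how much the social signal counts."""
    scores = dict(text_scores)
    for _ in range(rounds):
        new = {}
        for user, base in text_scores.items():
            nbrs = links.get(user, [])
            if nbrs:
                social = sum(scores[v] for v in nbrs) / len(nbrs)
                new[user] = (1 - mix) * base + mix * social
            else:
                new[user] = base  # no links: keep the textual score
        scores = new
    return scores  # threshold at 0.5 to pick a class (assumed convention)
```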

Improving Statistical Machine Translation with Selectional Preferences
Haiqing Tang | Deyi Xiong | Min Zhang | Zhengxian Gong
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Long-distance semantic dependencies are crucial for lexical choice in statistical machine translation. In this paper, we study semantic dependencies between verbs and their arguments by modeling selectional preferences in the context of machine translation. We incorporate preferences that verbs impose on subjects and objects into translation. In addition, bilingual selectional preferences between source-side verbs and target-side arguments are also investigated. Our experiments on Chinese-to-English translation tasks with large-scale training data demonstrate that statistical machine translation using verbal selectional preferences can achieve statistically significant improvements over a state-of-the-art baseline.
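
Selectional preferences typically enter an SMT decoder as an extra log-linear feature. The sketch below scores verb-argument pairs with a probability table assumed to be estimated offline from parsed data; the table, the role inventory, and the smoothing floor are illustrative assumptions, not the paper's exact model.

```python
import math

def sp_feature(verb, args, sp_table, floor=1e-6):
    """Selectional-preference feature sketch for a log-linear SMT model:
    sum of log P(argument | verb, role) over a hypothesis's verb-argument
    pairs. `sp_table` maps (verb, role, argument) to a probability;
    unseen triples are smoothed to `floor` (hypothetical smoothing)."""
    return sum(math.log(sp_table.get((verb, role, a), floor))
               for role, a in args)

# e.g. "drink" prefers "water" over "stone" as its object.
table = {("drink", "obj", "water"): 0.12, ("drink", "obj", "stone"): 1e-5}
print(sp_feature("drink", [("obj", "water")], table))  # higher score
print(sp_feature("drink", [("obj", "stone")], table))  # lower score
```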

2015

Document-Level Machine Translation Evaluation with Gist Consistency and Text Cohesion
Zhengxian Gong | Min Zhang | Guodong Zhou
Proceedings of the Second Workshop on Discourse in Machine Translation

2012

Classifier-Based Tense Model for SMT
ZhengXian Gong | Min Zhang | ChewLim Tan | GuoDong Zhou
Proceedings of COLING 2012: Posters

Phrase-Based Evaluation for Machine Translation
Liangyou Li | Zhengxian Gong | Guodong Zhou
Proceedings of COLING 2012: Posters

N-gram-based Tense Models for Statistical Machine Translation
Zhengxian Gong | Min Zhang | Chew Lim Tan | Guodong Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Improve SMT with Source-Side “Topic-Document” Distributions
Zhengxian Gong | Guodong Zhou | Liangyou Li
Proceedings of Machine Translation Summit XIII: Papers

Cache-based Document-level Statistical Machine Translation
Zhengxian Gong | Min Zhang | Guodong Zhou
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing