Mingwen Wang

Also published as: Ming-Wei Wang, MingWen Wang


2023

Leveraging Contrastive Learning and Knowledge Distillation for Incomplete Modality Rumor Detection
Fan Xu | Pinyun Fu | Qi Huang | Bowei Zou | AiTi Aw | Mingwen Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Rumors spread rapidly through online social microblogs at a relatively low cost, causing substantial economic losses and negative consequences in our daily lives. Existing rumor detection models often neglect the underlying semantic coherence between text and image components in multimodal posts, as well as the challenges posed by incomplete modalities in single-modal posts, such as missing text or images. This paper presents CLKD-IMRD, a novel framework for Incomplete Modality Rumor Detection. CLKD-IMRD employs Contrastive Learning and Knowledge Distillation to capture the semantic consistency between text and image pairs, while also enhancing model generalization to incomplete modalities within individual posts. Extensive experimental results demonstrate that CLKD-IMRD outperforms state-of-the-art methods on two English and two Chinese benchmark datasets for rumor detection in social media.

融合词典信息的古籍命名实体识别研究(A Study on the Recognition of Named Entities of Ancient Books Using Lexical Information)
Wenjun Kang (康文军) | Jiali Zuo (左家莉) | Anquan Jie (揭安全) | Wenbin Luo (罗文兵) | Mingwen Wang (王明文)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

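Named entity recognition for ancient books is of considerable practical significance for building entity knowledge bases and corpora of ancient books. Research on this task remains limited, mainly because sufficient training data is lacking. Starting from the Zizhi Tongjian (《资治通鉴》), this paper manually constructs a named entity recognition dataset for ancient books and uses it to study the task. Because ancient texts mostly convey meaning with single characters and contain many omissions, the paper uses pretrained word embeddings as lexicon information in order to make full use of the lexical information they contain. Experiments show that this approach can effectively handle the recognition of person-name entities in ancient texts.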

结合全局对应矩阵和相对位置信息的古汉语实体关系联合抽取(Joint Extraction of Ancient Chinese Entity Relations by Combining Global Correspondence Matrix and Relative Position Information)
Yiyu Hu (胡益裕) | Jiali Zuo (左家莉) | Xueqiang Ceng (曾雪强) | Zhongying Wan (万中英) | Mingwen Wang (王明文)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

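Entity relation extraction is an important task in information extraction. Current work focuses mainly on English and modern Chinese, while research on dataset construction and methods for ancient Chinese remains scarce. To address this gap, after studying the open-source Zizhi Tongjian (《资治通鉴》) corpus, this paper manually constructs an ancient Chinese entity relation dataset and designs a joint entity relation extraction method that combines a global correspondence matrix with relative position information. Experiments on the constructed dataset demonstrate the effectiveness of the method for ancient Chinese entity relation extraction.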

2021

融合XLM词语表示的神经机器译文自动评价方法(Neural Automatic Evaluation of Machine Translation Method Combined with XLM Word Representation)
Wei Hu (胡纬) | Maoxi Li (李茂西) | Bailian Qiu (裘白莲) | Mingwen Wang (王明文)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

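Automatic evaluation of machine translation plays an important role in promoting the development and application of machine translation. It generally measures the quality of a machine translation by computing its similarity to a human reference translation. This paper uses the cross-lingual pretrained language model XLM to map the source sentence, the machine translation, and the human reference translation into the same semantic space. It combines hierarchical attention and intra-attention to extract difference features between the source sentence and the machine translation, between the machine translation and the human reference, and between the source sentence and the human reference, and integrates these features into a Bi-LSTM-based neural automatic evaluation method. Experimental results on the WMT’19 automatic translation evaluation dataset show that the method incorporating XLM word representations significantly improves correlation with human evaluation.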

基于自动识别的委婉语历时性发展变化与社会共变研究(A Study on the Diachronic Development and Social Covariance of Euphemism Based on Automatic Recognition)
Chenlin Zhang (张辰麟) | Mingwen Wang (王明文) | Yiming Tan (谭亦鸣) | Ming Yin (尹明) | Xinyi Zhang (张心怡)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

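This paper takes Chinese euphemisms as its main object of study. Drawing on a large amount of manual annotation and supervised machine learning classification, it achieves fairly high-accuracy automatic recognition of euphemisms, and on that basis carries out a quantitative statistical analysis of the diachronic development of euphemisms in the People's Daily (《人民日报》) from 1946 to 2017. From the perspective of large-scale data, it examines the diachronic development of euphemisms and the covariance between euphemisms and society, and verifies Gresham's law and the law of succession in language.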

2020

“细粒度英汉机器翻译错误分析语料库”的构建与思考(Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation and Its Implications)
Bailian Qiu (裘白莲) | Mingwen Wang (王明文) | Maoxi Li (李茂西) | Cong Chen (陈聪) | Fan Xu (徐凡)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

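Machine translation error analysis aims to identify the errors in machine translation output, including error types and error distribution, and plays an important role in machine translation research and application. This paper combines human post-editing with error analysis by annotating errors on post-editing operations. Using a combination of automatic and manual annotation, it builds a fine-grained English-Chinese machine translation error analysis corpus in which each annotated sample includes the source sentence, the machine translation, the human reference translation, the post-edited translation, the word error rate, and error type annotations. The annotated error types include added words, omitted words, wrong words, word order errors, untranslated words, and named entity translation errors. An annotation consistency check demonstrates the validity of the annotation, and statistical analysis of the annotated corpus can effectively guide both the development of machine translation systems and post-editing by human translators.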

2018

Building Parallel Monolingual Gan Chinese Dialects Corpus
Fan Xu | Mingwen Wang | Maoxi Li
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Improving Machine Translation Quality Estimation with Neural Network Features
Zhiming Chen | Yiming Tan | Chenlin Zhang | Qingyu Xiang | Lilin Zhang | Maoxi Li | Mingwen Wang
Proceedings of the Second Conference on Machine Translation

Neural Post-Editing Based on Quality Estimation
Yiming Tan | Zhiming Chen | Liu Huang | Lilin Zhang | Maoxi Li | Mingwen Wang
Proceedings of the Second Conference on Machine Translation

2016

Extract Domain-specific Paraphrase from Monolingual Corpus for Automatic Evaluation of Machine Translation
Lilin Zhang | Zhen Weng | Wenyan Xiao | Jianyi Wan | Zhiming Chen | Yiming Tan | Maoxi Li | Mingwen Wang
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

Building Monolingual Word Alignment Corpus for the Greater China Region
Fan Xu | Xiongfei Xu | Mingwen Wang | Maoxi Li
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

2013

Listwise Approach to Learning to Rank for Automatic Evaluation of Machine Translation
Maoxi Li | Aiwen Jiang | Mingwen Wang
Proceedings of Machine Translation Summit XIV: Papers

2012

Confusion Network Based System Combination for Chinese Translation Output: Word-Level or Character-Level?
Maoxi Li | MingWen Wang
Proceedings of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT

2010

Integer Linear Programming in NLP - Constrained Conditional Models
Ming-Wei Wang | Nicholas Rizzolo | Dan Roth
NAACL HLT 2010 Tutorial Abstracts