Hao Zhang


2021

pdf bib
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang | Aixin Sun | Wei Jing | Liangli Zhen | Joey Tianyi Zhou | Siow Mong Rick Goh
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
COSY: COunterfactual SYntax for Cross-Lingual Understanding
Sicheng Yu | Hao Zhang | Yulei Niu | Qianru Sun | Jing Jiang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Pre-trained multilingual language models, e.g., multilingual-BERT, are widely used in cross-lingual tasks, yielding the state-of-the-art performance. However, such models suffer from a large performance gap between source and target languages, especially in the zero-shot setting, where the models are fine-tuned only on English but tested on other languages for the same task. We tackle this issue by incorporating language-agnostic information, specifically, universal syntax such as dependency relations and POS tags, into language models, based on the observation that universal syntax is transferable across different languages. Our approach, called COunterfactual SYntax (COSY), includes the design of SYntax-aware networks as well as a COunterfactual training method to implicitly force the networks to learn not only the semantics but also the syntax. To evaluate COSY, we conduct cross-lingual experiments on natural language inference and question answering using mBERT and XLM-R as network backbones. Our results show that COSY achieves the state-of-the-art performance for both tasks, without using auxiliary training data.

pdf bib
EnsLM: Ensemble Language Model for Data Diversity by Semantic Clustering
Zhibin Duan | Hao Zhang | Chaojie Wang | Zhengjue Wang | Bo Chen | Mingyuan Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Natural language processing (NLP) often faces the problem of data diversity such as different domains, themes, styles, and so on. Therefore, a single language model (LM) is insufficient to learn all knowledge from diverse samples. To solve this problem, we firstly propose an autoencoding topic model with a mixture prior (mATM) to perform clustering for the data, where the clusters defined in semantic space describes the data diversity. Having obtained the clustering assignment for each sample, we develop the ensemble LM (EnsLM) with the technique of weight modulation. Specifically, EnsLM contains a backbone that is adjusted by a few modulated weights to fit for different sample clusters. As a result, the backbone learns the shared knowledge among all clusters while modulated weights extract the cluster-specific features. EnsLM can be trained jointly with mATM with a flexible LM backbone. We evaluate the effectiveness of both mATM and EnsLM on various tasks.

pdf bib
PhotoChat: A Human-Human Dialogue Dataset With Photo Sharing Behavior For Joint Image-Text Modeling
Xiaoxue Zang | Lijuan Liu | Maria Wang | Yang Song | Hao Zhang | Jindong Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a new human-human dialogue dataset - PhotoChat, the first dataset that casts light on the photo sharing behavior in online messaging. PhotoChat contains 12k dialogues, each of which is paired with a user photo that is shared during the conversation. Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context. In addition, for both tasks, we provide baseline models using the state-of-the-art models and report their benchmark performances. The best image retrieval model achieves 10.4% recall@1 (out of 1000 candidates) and the best photo intent prediction model achieves 58.1% F1 score, indicating that the dataset presents interesting yet challenging real-world problems. We are releasing PhotoChat to facilitate future research work among the community.

2020

pdf bib
Friendly Topic Assistant for Transformer Based Abstractive Summarization
Zhengjue Wang | Zhibin Duan | Hao Zhang | Chaojie Wang | Long Tian | Bo Chen | Mingyuan Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abstractive document summarization is a comprehensive task including document understanding and summary generation, in which area Transformer-based models have achieved the state-of-the-art performance. Compared with Transformers, topic models are better at learning explicit document semantics, and hence could be integrated into Transformers to further boost their performance. To this end, we rearrange and explore the semantics learned by a topic model, and then propose a topic assistant (TA) including three modules. TA is compatible with various Transformer-based models and user-friendly since i) TA is a plug-and-play model that does not break any structure of the original Transformer network, making users easily fine-tune Transformer+TA based on a well pre-trained model; ii) TA only introduces a small number of extra parameters. Experimental results on three datasets demonstrate that TA is able to improve the performance of several Transformer-based models.

pdf bib
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
Hao Zhang | Jae Ro | Richard Sproat
Proceedings of the 28th International Conference on Computational Linguistics

Breaking domain names such as openresearch into component words open and research is important for applications like Text-to-Speech synthesis and web search. We link this problem to the classic problem of Chinese word segmentation and show the effectiveness of a tagging model based on Recurrent Neural Networks (RNNs) using characters as input. To compensate for the lack of training data, we propose a pre-training method on concatenated entity names in a large knowledge database. Pre-training improves the model by 33% and brings the sequence accuracy to 85%.

pdf bib
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang | Aixin Sun | Wei Jing | Joey Tianyi Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Given an untrimmed video and a text query, natural language video localization (NLVL) is to locate a matching span from the video that semantically corresponds to the query. Existing solutions formulate NLVL either as a ranking task and apply multimodal matching architecture, or as a regression task to directly regress the target video span. In this work, we address NLVL task with a span-based QA approach by treating the input video as text passage. We propose a video span localizing network (VSLNet), on top of the standard span-based QA framework, to address NLVL. The proposed VSLNet tackles the differences between NLVL and span-based QA through a simple and yet effective query-guided highlighting (QGH) strategy. The QGH guides VSLNet to search for matching video span within a highlighted region. Through extensive experiments on three benchmark datasets, we show that the proposed VSLNet outperforms the state-of-the-art methods; and adopting span-based QA framework is a promising direction to solve NLVL.

pdf bib
Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering
Ming Yan | Hao Zhang | Di Jin | Joey Tianyi Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multiple-choice question answering (MCQA) is one of the most challenging tasks in machine reading comprehension since it requires more advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations. Unfortunately, most existing MCQA datasets are small in size, which increases the difficulty of model learning and generalization. To address this challenge, we propose a multi-source meta transfer (MMT) for low-resource MCQA. In this framework, we first extend meta learning by incorporating multiple training sources to learn a generalized feature representation across domains. To bridge the distribution gap between training sources and the target, we further introduce the meta transfer that can be integrated into the multi-source meta training. More importantly, the proposed MMT is independent of backbone language models. Extensive experiments demonstrate the superiority of MMT over state-of-the-arts, and continuous improvements can be achieved on different backbone networks on both supervised and unsupervised domain adaptation settings.

2019

pdf bib
Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition
Joey Tianyi Zhou | Hao Zhang | Di Jin | Hongyuan Zhu | Meng Fang | Rick Siow Mong Goh | Kenneth Kwok
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER). Specifically, two variants of DATNet, i.e., DATNet-F and DATNet-P, are investigated to explore effective feature fusion between high and low resource. To address the noisy and imbalanced training data, we propose a novel Generalized Resource-Adversarial Discriminator (GRAD). Additionally, adversarial training is adopted to boost model generalization. In experiments, we examine the effects of different components in DATNet across domains and languages and show that significant improvement can be obtained especially for low-resource data, without augmenting any additional hand-crafted features and pre-trained language model.

pdf bib
Neural Models of Text Normalization for Speech Applications
Hao Zhang | Richard Sproat | Axel H. Ng | Felix Stahlberg | Xiaochang Peng | Kyle Gorman | Brian Roark
Computational Linguistics, Volume 45, Issue 2 - June 2019

Machine learning, including neural network techniques, have been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars.We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process.These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.

2018

pdf bib
Fast and Accurate Reordering with ITG Transition RNN
Hao Zhang | Axel Ng | Richard Sproat
Proceedings of the 27th International Conference on Computational Linguistics

Attention-based sequence-to-sequence neural network models learn to jointly align and translate. The quadratic-time attention mechanism is powerful as it is capable of handling arbitrary long-distance reordering, but computationally expensive. In this paper, towards making neural translation both accurate and efficient, we follow the traditional pre-reordering approach to decouple reordering from translation. We add a reordering RNN that shares the input encoder with the decoder. The RNNs are trained jointly with a multi-task loss function and applied sequentially at inference time. The task of the reordering model is to predict the permutation of the input words following the target language word order. After reordering, the attention in the decoder becomes more peaked and monotonic. For reordering, we adopt the Inversion Transduction Grammars (ITG) and propose a transition system to parse input to trees for reordering. We harness the ITG transition system with RNN. With the modeling power of RNN, we achieve superior reordering accuracy without any feature engineering. In experiments, we apply the model to the task of text normalization. Compared to a strong baseline of attention-based RNN, our ITG RNN re-ordering model can reach the same reordering accuracy with only 1/10 of the training data and is 2.5x faster in decoding.

pdf bib
WordNet Troponymy and Extraction of “Manner-Result” Relations
Aliaksandr Huminski | Hao Zhang
Proceedings of the 9th Global Wordnet Conference

Commonsense knowledge bases need to have relations that allow to predict the consequences of specific actions (say, if John stabbed Peter, Peter might be killed) and to unfold the possible actions for the specific results (Peter was killed. It could happen because of poisoning, stabbing, shooting, etc.) This kind of causal relations are established between manner verbs and result verbs: manner-result relations. We offer a procedure on how to extract manner-result relations from WordNet through the analysis of the troponym glosses. The procedure of extraction includes three steps and the results are based on the analysis of the whole set of verbs in WordNet.

pdf bib
UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification
Andreas Hanselowski | Hao Zhang | Zile Li | Daniil Sorokin | Benjamin Schiller | Claudia Schulz | Iryna Gurevych
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

The Fact Extraction and VERification (FEVER) shared task was launched to support the development of systems able to verify claims by extracting supporting or refuting facts from raw text. The shared task organizers provide a large-scale dataset for the consecutive steps involved in claim verification, in particular, document retrieval, fact extraction, and claim classification. In this paper, we present our claim verification pipeline approach, which, according to the preliminary results, scored third in the shared task, out of 23 competing systems. For the document retrieval, we implemented a new entity linking approach. In order to be able to rank candidate facts and classify a claim on the basis of several selected facts, we introduce two extensions to the Enhanced LSTM (ESIM).

2016

pdf bib
Learning Concept Taxonomies from Multi-modal Data
Hao Zhang | Zhiting Hu | Yuntian Deng | Mrinmaya Sachan | Zhicheng Yan | Eric Xing
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
KWB: An Automated Quick News System for Chinese Readers
Yiqi Bai | Wenjing Yang | Hao Zhang | Jingwen Wang | Ming Jia | Roland Tong | Jie Wang
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

pdf bib
Enforcing Structural Diversity in Cube-pruned Dependency Parsing
Hao Zhang | Ryan McDonald
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Online Learning for Inexact Hypergraph Search
Hao Zhang | Liang Huang | Kai Zhao | Ryan McDonald
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Universal Dependency Annotation for Multilingual Parsing
Ryan McDonald | Joakim Nivre | Yvonne Quirmbach-Brundage | Yoav Goldberg | Dipanjan Das | Kuzman Ganchev | Keith Hall | Slav Petrov | Hao Zhang | Oscar Täckström | Claudia Bedini | Núria Bertomeu Castelló | Jungmee Lee
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Tong Xiao | Jingbo Zhu | Hao Zhang | Qiang Li
Proceedings of the ACL 2012 System Demonstrations

pdf bib
Generalized Higher-Order Dependency Parsing with Cube Pruning
Hao Zhang | Ryan McDonald
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Binarized Forest to String Translation
Hao Zhang | Licheng Fang | Peng Xu | Xiaoyun Wu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Document-level Consistency Verification in Machine Translation
Tong Xiao | Jingbo Zhu | Shujie Yao | Hao Zhang
Proceedings of Machine Translation Summit XIII: Papers

2010

pdf bib
An Empirical Study of Translation Rule Extraction with Multiple Parsers
Tong Xiao | Jingbo Zhu | Hao Zhang | Muhua Zhu
Coling 2010: Posters

pdf bib
NEUNLPLab Chinese Word Sense Induction System for SIGHAN Bakeoff 2010
Hao Zhang | Tong Xiao | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

pdf bib
Binarization of Synchronous Context-Free Grammars
Liang Huang | Hao Zhang | Daniel Gildea | Kevin Knight
Computational Linguistics, Volume 35, Number 4, December 2009

2008

pdf bib
Extracting Synchronous Grammar Rules From Word-Level Alignments in Linear Time
Hao Zhang | Daniel Gildea | David Chiang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang | Chris Quirk | Robert C. Moore | Daniel Gildea
Proceedings of ACL-08: HLT

pdf bib
Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
Hao Zhang | Daniel Gildea
Proceedings of ACL-08: HLT

2007

pdf bib
Factorization of Synchronous Context-Free Grammars in Linear Time
Hao Zhang | Daniel Gildea
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

2006

pdf bib
Synchronous Binarization for Machine Translation
Hao Zhang | Liang Huang | Daniel Gildea | Kevin Knight
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Efficient Search for Inversion Transduction Grammar
Hao Zhang | Daniel Gildea
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Factoring Synchronous Grammars by Sorting
Daniel Gildea | Giorgio Satta | Hao Zhang
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Inducing Word Alignments with Bilexical Synchronous Trees
Hao Zhang | Daniel Gildea
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

pdf bib
Machine Translation as Lexicalized Parsing with Hooks
Liang Huang | Hao Zhang | Daniel Gildea
Proceedings of the Ninth International Workshop on Parsing Technology

pdf bib
Stochastic Lexicalized Inversion Transduction Grammar for Alignment
Hao Zhang | Daniel Gildea
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Syntax-Based Alignment: Supervised or Unsupervised?
Hao Zhang | Daniel Gildea
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Chinese Lexical Analysis Using Hierarchical Hidden Markov Model
Hua-Ping Zhang | Qun Liu | Xue-Qi Cheng | Hao Zhang | Hong-Kui Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

2002

pdf bib
Automatic Recognition of Chinese Unknown Words Based on Roles Tagging
Kevin Zhang | Qun Liu | Hao Zhang | Xue-Qi Cheng
COLING-02: The First SIGHAN Workshop on Chinese Language Processing