Xiaofei Ma


2022

pdf
Learning Dialogue Representations from Consecutive Utterances
Zhihan Zhou | Dejiao Zhang | Wei Xiao | Nicholas Dingwall | Xiaofei Ma | Andrew Arnold | Bing Xiang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a large margin, for example, it achieves 13% average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses on the benefits and limitations of our model.

pdf
Debiasing Neural Retrieval via In-batch Balancing Regularization
Yuantong Li | Xiaokai Wei | Zijian Wang | Shen Wang | Parminder Bhatia | Xiaofei Ma | Andrew Arnold
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

People frequently interact with information retrieval (IR) systems, however, IR models exhibit biases and discrimination towards various demographics. The in-processing fair ranking methods provides a trade-offs between accuracy and fairness through adding a fairness-related regularization term in the loss function. However, there haven’t been intuitive objective functions that depend on the click probability and user engagement to directly optimize towards this.In this work, we propose the {textbf{I}n-{textbf{B}atch {textbf{B}alancing {textbf{R}egularization (IBBR) to mitigate the ranking disparity among subgroups. In particular, we develop a differentiable {textbf{normed Pairwise Ranking Fairness} (nPRF) and leverage the T-statistics on top of nPRF over subgroups as a regularization to improve fairness. Empirical results with the BERT-based neural rankers on the MS MARCO Passage Retrieval dataset with the human-annotated non-gendered queries benchmark {cite{rekabsaz2020neural} show that our {ibbr{} method with nPRF achieves significantly less bias with minimal degradation in ranking performance compared with the baseline.

pdf
Virtual Augmentation Supported Contrastive Learning of Sentence Representations
Dejiao Zhang | Wei Xiao | Henghui Zhu | Xiaofei Ma | Andrew Arnold
Findings of the Association for Computational Linguistics: ACL 2022

Despite profound successes, contrastive representation learning relies on carefully designed data augmentations using domain-specific knowledge. This challenge is magnified in natural language processing, where no general rules exist for data augmentation due to the discrete nature of natural language. We tackle this challenge by presenting a Virtual augmentation Supported Contrastive Learning of sentence representations (VaSCL). Originating from the interpretation that data augmentation essentially constructs the neighborhoods of each training instance, we, in turn, utilize the neighborhood to generate effective data augmentations. Leveraging the large training batch size of contrastive learning, we approximate the neighborhood of an instance via its K-nearest in-batch neighbors in the representation space. We then define an instance discrimination task regarding the neighborhood and generate the virtual augmentation in an adversarial training manner. We access the performance of VaSCL on a wide range of downstream tasks and set a new state-of-the-art for unsupervised sentence representation learning.

pdf
Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner
Danilo Neves Ribeiro | Shen Wang | Xiaofei Ma | Rui Dong | Xiaokai Wei | Henghui Zhu | Xinchi Chen | Peng Xu | Zhiheng Huang | Andrew Arnold | Dan Roth
Findings of the Association for Computational Linguistics: NAACL 2022

Large language models have achieved high performance on various question answering (QA) benchmarks, but the explainability of their output remains elusive. Structured explanations, called entailment trees, were recently suggested as a way to explain the reasoning behind a QA system’s answer. In order to better generate such entailment trees, we propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR). Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises. The IRGR model iteratively searches for suitable premises, constructing a single entailment step at a time. Contrary to previous approaches, our method combines generation steps and retrieval of premises, allowing the model to leverage intermediate conclusions, and mitigating the input size limit of baseline encoder-decoder models. We conduct experiments using the EntailmentBank dataset, where we outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.

2021

pdf
Contrastive Fine-tuning Improves Robustness for Neural Rankers
Xiaofei Ma | Cicero Nogueira dos Santos | Andrew O. Arnold
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Contrastive Document Representation Learning with Graph Attention Networks
Peng Xu | Xinchi Chen | Xiaofei Ma | Zhiheng Huang | Bing Xiang
Findings of the Association for Computational Linguistics: EMNLP 2021

Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representation of text. However, due to the quadratic self-attention complexity, most of the pretrained Transformers models can only handle relatively short text. It is still a challenge when it comes to modeling very long documents. In this work, we propose to use a graph attention network on top of the available pretrained Transformers model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large amount of unlabeled corpus. Empirically, we demonstrate the effectiveness of our approaches in document classification and document retrieval tasks.

2020

pdf
Beyond [CLS] through Ranking by Generation
Cicero Nogueira dos Santos | Xiaofei Ma | Ramesh Nallapati | Zhiheng Huang | Bing Xiang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Generative models for Information Retrieval, where ranking of documents is viewed as the task of generating a query from a document’s language model, were very successful in various IR tasks in the past. However, with the advent of modern deep neural networks, attention has shifted to discriminative ranking functions that model the semantic similarity of documents and queries instead. Recently, deep generative models such as GPT2 and BART have been shown to be excellent text generators, but their effectiveness as rankers have not been demonstrated yet. In this work, we revisit the generative framework for information retrieval and show that our generative approaches are as effective as state-of-the-art semantic similarity-based discriminative models for the answer selection task. Additionally, we demonstrate the effectiveness of unlikelihood losses for IR.

2019

pdf
Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering
Zhiguo Wang | Patrick Ng | Xiaofei Ma | Ramesh Nallapati | Bing Xiang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

BERT model has been successfully applied to open-domain QA tasks. However, previous work trains BERT by viewing passages corresponding to the same question as independent training instances, which may cause incomparable scores for answers from different passages. To tackle this issue, we propose a multi-passage BERT model to globally normalize answer scores across all passages of the same question, and this change enables our QA model find better answers by utilizing more passages. In addition, we find that splitting articles into passages with the length of 100 words by sliding window improves performance by 4%. By leveraging a passage ranker to select high-quality passages, multi-passage BERT gains additional 2%. Experiments on four standard benchmarks showed that our multi-passage BERT outperforms all state-of-the-art models on all benchmarks. In particular, on the OpenSQuAD dataset, our model gains 21.4% EM and 21.5% F1 over all non-BERT models, and 5.8% EM and 6.5% F1 over BERT-based models.

pdf
Domain Adaptation with BERT-based Domain Classification and Data Selection
Xiaofei Ma | Peng Xu | Zhiguo Wang | Ramesh Nallapati | Bing Xiang
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

The performance of deep neural models can deteriorate substantially when there is a domain shift between training and test data. For example, the pre-trained BERT model can be easily fine-tuned with just one additional output layer to create a state-of-the-art model for a wide range of tasks. However, the fine-tuned BERT model suffers considerably at zero-shot when applied to a different domain. In this paper, we present a novel two-step domain adaptation framework based on curriculum learning and domain-discriminative data selection. The domain adaptation is conducted in a mostly unsupervised manner using a small target domain validation set for hyper-parameter tuning. We tested the framework on four large public datasets with different domain similarities and task types. Our framework outperforms a popular discrepancy-based domain adaptation method on most transfer tasks while consuming only a fraction of the training budget.