Nan Shao


2020

Is Graph Structure Necessary for Multi-hop Question Answering?
Nan Shao | Yiming Cui | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Modeling text as a graph and processing it with graph neural networks has recently become a trend in many areas of NLP research. In this paper, we investigate whether graph structure is necessary for textual multi-hop reasoning. Our analysis is centered on HotpotQA. We construct a strong baseline model to establish that, with the proper use of pre-trained models, graph structure may not be necessary for textual multi-hop reasoning. We point out that both the graph structure and the adjacency matrix are task-related prior knowledge, and that graph-attention can be considered a special case of self-attention. Experiments demonstrate that graph-attention, or the entire graph structure, can be replaced by self-attention or Transformers.
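
The claim that graph-attention is a special case of self-attention can be illustrated with a minimal sketch. It is not the paper's model: the scaled dot-product scoring, the function names, and the toy 4-node adjacency matrix are all assumptions made for illustration. The sketch only shows that restricting self-attention with an adjacency mask gives graph-style attention, and that an all-ones adjacency matrix recovers plain self-attention.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def self_attention(h, mask=None):
    """Scaled dot-product self-attention over features h of shape (n, d).

    mask is an optional (n, n) 0/1 matrix; positions with 0 are blocked.
    The scoring function is an assumption, not the paper's formulation.
    """
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)
    if mask is not None:
        # Blocked pairs get a very negative score, so their weight is 0.
        scores = np.where(mask > 0, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ h, weights


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))  # 4 nodes (or tokens), 8 features each

# Hypothetical chain graph with self-loops: node i sees i-1, i, i+1.
adjacency = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])

# Graph-style attention: self-attention restricted to graph neighbours.
graph_out, graph_w = self_attention(h, mask=adjacency)

# A fully connected graph imposes no restriction at all.
full_out, full_w = self_attention(h, mask=np.ones((4, 4)))
plain_out, plain_w = self_attention(h)
```

In this sketch the adjacency matrix acts purely as prior knowledge about which pairs may interact; removing it leaves the model free to learn those interactions from data.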

2019

TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-Based Chatbots
Wentao Ma | Yiming Cui | Nan Shao | Su He | Wei-Nan Zhang | Ting Liu | Shijin Wang | Guoping Hu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We consider that the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose TripleNet, a model that fully models the task with the triple <context, query, response> instead of the <context, response> pair used in previous work. The heart of TripleNet is a novel attention mechanism, named triple attention, that models the relationships within the triple at four levels. The new mechanism updates the representation of each element based on its attention with the other two, concurrently and symmetrically. We match the triple <C, Q, R>, centered on the response, from the character level to the context level for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model can significantly outperform the state-of-the-art methods.
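
The idea of updating each element of the triple from its attention with the other two can be sketched roughly as follows. This is not TripleNet's actual formulation, which the abstract does not specify: the dot-product scoring, the residual sum, and the names `attend` and `triple_attention` are assumptions chosen to illustrate "concurrently" (every update reads the original inputs) and "symmetrically" (the same rule is applied to each element).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attend(x, y):
    """Let each row of x (m, d) attend over the rows of y (n, d)."""
    scores = x @ y.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ y


def triple_attention(c, q, r):
    """Hypothetical update: each element adds what it reads from the other two.

    All three updates use the original c, q, r, so they are concurrent,
    and the same rule is applied to each element, so they are symmetric.
    """
    c_new = c + attend(c, q) + attend(c, r)
    q_new = q + attend(q, c) + attend(q, r)
    r_new = r + attend(r, c) + attend(r, q)
    return c_new, q_new, r_new


rng = np.random.default_rng(0)
context = rng.normal(size=(6, 16))   # 6 context positions, 16 features
query = rng.normal(size=(3, 16))     # 3 query positions
response = rng.normal(size=(4, 16))  # 4 response positions

c_new, q_new, r_new = triple_attention(context, query, response)
```

Each output keeps the shape of its input, so a block like this could be applied once per level of granularity; how the paper actually combines the levels is not covered by this sketch.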