Feng Nie


Learning Semantic Correspondences from Noisy Data-text Pairs by Local-to-Global Alignments
Feng Nie | Jinpeng Wang | Chin-Yew Lin
Proceedings of the 28th International Conference on Computational Linguistics

Learning semantic correspondences between structured input data (e.g., slot-value pairs) and associated texts is a core problem for many downstream NLP applications, e.g., data-to-text generation. Large-scale datasets recently proposed for generation contain loosely corresponding data text pairs, where part of spans in text cannot be aligned to its incomplete paired input. To learn semantic correspondences from such datasets, we propose a two-stage local-to-global alignment (L2GA) framework. First, a local model based on multi-instance learning is applied to build alignments for texts spans that can be directly grounded to its paired structured input. Then, a novel global model built upon a memory-guided conditional random field (CRF) layer aims to infer missing alignments for text spans which not supported by paired incomplete inputs, where the memory is designed to leverage alignment clues provided by the local model to strengthen the global model. In this way, the local model and global model can work jointly to learn semantic correspondences in the same framework. Experimental results show that our proposed method can be generalized to both restaurant and computer domains and improve the alignment accuracy.

Program Enhanced Fact Verification with Verbalization and Graph Attention Network
Xiaoyu Yang | Feng Nie | Yufei Feng | Quan Liu | Zhigang Chen | Xiaodan Zhu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Performing fact verification based on structured data is important for many real-life applications and is a challenging research problem, particularly when it involves both symbolic operations and informal inference based on language understanding. In this paper, we present a Program-enhanced Verbalization and Graph Attention Network (ProgVGAT) to integrate programs and execution into textual inference models. Specifically, a verbalization with program execution model is proposed to accumulate evidences that are embedded in operations over the tables. Built on that, we construct the graph attention verification networks, which are designed to fuse different sources of evidences from verbalized program execution, program structures, and the original statements and tables, to make the final verification decision. To support the above framework, we propose a program selection module optimized with a new training strategy based on margin loss, to produce more accurate programs, which is shown to be effective in enhancing the final verification results. Experimental results show that the proposed framework achieves the new state-of-the-art performance, a 74.4% accuracy, on the benchmark dataset TABFACT.


An Encoder with non-Sequential Dependency for Neural Data-to-Text Generation
Feng Nie | Jinpeng Wang | Rong Pan | Chin-Yew Lin
Proceedings of the 12th International Conference on Natural Language Generation

Data-to-text generation aims to generate descriptions given a structured input data (i.e., a table with multiple records). Existing neural methods for encoding input data can be divided into two categories: a) pooling based encoders which ignore dependencies between input records or b) recurrent encoders which model only sequential dependencies between input records. In our investigation, although the recurrent encoder generally outperforms the pooling based encoder by learning the sequential dependencies, it is sensitive to the order of the input records (i.e., performance decreases when injecting the random shuffling noise over input data). To overcome this problem, we propose to adopt the self-attention mechanism to learn dependencies between arbitrary input records. Experimental results show the proposed method achieves comparable results and remains stable under random shuffling over input data.

A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation
Feng Nie | Jin-Ge Yao | Jinpeng Wang | Rong Pan | Chin-Yew Lin
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent neural language generation systems often hallucinate contents (i.e., producing irrelevant or contradicted facts), especially when trained on loosely corresponding pairs of the input structure and text. To mitigate this issue, we propose to integrate a language understanding module for data refinement with self-training iterations to effectively induce strong equivalence between the input data and the paired text. Experiments on the E2E challenge dataset show that our proposed framework can reduce more than 50% relative unaligned noise from the original data-text pairs. A vanilla sequence-to-sequence neural NLG model trained on the refined data has improved on content correctness compared with the current state-of-the-art ensemble generator.


Operation-guided Neural Networks for High Fidelity Data-To-Text Generation
Feng Nie | Jinpeng Wang | Jin-Ge Yao | Rong Pan | Chin-Yew Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent neural models for data-to-text generation are mostly based on data-driven end-to-end training over encoder-decoder networks. Even though the generated texts are mostly fluent and informative, they often generate descriptions that are not consistent with the input structured data. This is a critical issue especially in domains that require inference or calculations over raw data. In this paper, we attempt to improve the fidelity of neural data-to-text generation by utilizing pre-executed symbolic operations. We propose a framework called Operation-guided Attention-based sequence-to-sequence network (OpAtt), with a specifically designed gating mechanism as well as a quantization module for operation results to utilize information from pre-executed operations. Experiments on two sports datasets show our proposed method clearly improves the fidelity of the generated texts to the input structured data.

Aggregated Semantic Matching for Short Text Entity Linking
Feng Nie | Shuyan Zhou | Jing Liu | Jinpeng Wang | Chin-Yew Lin | Rong Pan
Proceedings of the 22nd Conference on Computational Natural Language Learning

The task of entity linking aims to identify concepts mentioned in a text fragments and link them to a reference knowledge base. Entity linking in long text has been well studied in previous work. However, short text entity linking is more challenging since the text are noisy and less coherent. To better utilize the local information provided in short texts, we propose a novel neural network framework, Aggregated Semantic Matching (ASM), in which two different aspects of semantic information between the local context and the candidate entity are captured via representation-based and interaction-based neural semantic matching models, and then two matching signals work jointly for disambiguation with a rank aggregation mechanism. Our evaluation shows that the proposed model outperforms the state-of-the-arts on public tweet datasets.