Ni Lao


2023

RARR: Researching and Revising What Language Models Say, Using Language Models
Luyu Gao | Zhuyun Dai | Panupong Pasupat | Anthony Chen | Arun Tejasvi Chaganty | Yicheng Fan | Vincent Zhao | Ni Lao | Hongrae Lee | Da-Cheng Juan | Kelvin Guu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Language models (LMs) now excel at many tasks such as question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model, and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.

2022

Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval
Ivan Montero | Shayne Longpre | Ni Lao | Andrew Frank | Christopher DuBois
Proceedings of the Workshop on Multilingual Information Access (MIA)

Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English. They not only suffer from the shortcomings of non-English document retrieval, but also rely on language-specific supervision for either the task or translation. We formulate a task setup that is more realistic given available resources and that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages. Assuming a strong English question answering model or database, we compare and analyze methods that pivot through English, mapping foreign queries to English and then mapping English answers back to target-language answers. Within this task setup we propose Reranked Multilingual Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking, which outperforms the strongest baselines by 2.7% on XQuAD and 6.2% on MKQA. Analysis demonstrates the particular efficacy of this strategy over state-of-the-art alternatives in challenging settings: low-resource languages, extensive distractor data, and query distribution misalignment. Our analysis also shows that, by circumventing retrieval, this approach offers rapid answer generation for many other languages off-the-shelf, without requiring additional training data in the target language.
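
The retrieve-then-rerank idea named in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`mips_top_k`, `pivot_to_english_answer`), the toy embeddings, and the `rerank_score` callable are all hypothetical, and the sketch covers only the step of matching a query embedding to an English training question, not the translation of the answer back into the target language.

```python
import numpy as np

def mips_top_k(query_vec, candidate_vecs, k):
    """Return indices and scores of the k candidates with the largest inner product."""
    scores = candidate_vecs @ query_vec
    order = np.argsort(-scores)[:k]  # descending by inner product
    return order, scores[order]

def pivot_to_english_answer(query_vec, english_vecs, english_answers, rerank_score, k=2):
    """Retrieve k English questions by inner product, rerank them, return the top answer.

    rerank_score is a hypothetical callable mapping a candidate index to a score;
    in practice it would be a stronger (and slower) model than the first-stage encoder.
    """
    idx, _ = mips_top_k(query_vec, english_vecs, k)
    best = max((int(i) for i in idx), key=rerank_score)
    return english_answers[best]

# Toy data: three English training questions with 2-d embeddings (assumed values).
english_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
english_answers = ["Paris", "Berlin", "Lyon"]
query_vec = np.array([1.0, 0.0])  # embedding of a non-English query (assumed)

top_idx, top_scores = mips_top_k(query_vec, english_vecs, k=2)
answer = pivot_to_english_answer(
    query_vec, english_vecs, english_answers,
    rerank_score=lambda i: {0: 0.1, 2: 0.9}.get(i, 0.0),
)
```

In this toy run, the first stage keeps candidates 0 and 2, and the hypothetical reranker prefers candidate 2, so the reranked answer differs from the plain inner-product winner.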

2021

Orthographic Transliteration for Kabyle Speech Recognition
Christopher Haberland | Ni Lao
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

2017

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Chen Liang | Jonathan Berant | Quoc Le | Kenneth D. Forbus | Ni Lao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge base. In this work, we introduce a Neural Symbolic Machine (NSM), which contains (a) a neural “programmer”, i.e., a sequence-to-sequence model that maps language utterances to programs and utilizes a key-variable memory to handle compositionality, and (b) a symbolic “computer”, i.e., a Lisp interpreter that performs program execution and helps find good programs by pruning the search space. We apply REINFORCE to directly optimize the task reward of this structured prediction problem. To train with weak supervision and improve the stability of REINFORCE, we augment it with an iterative maximum-likelihood training process. NSM outperforms the state-of-the-art on the WebQuestionsSP dataset when trained from question-answer pairs only, without requiring any feature engineering or domain-specific knowledge.
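
For readers unfamiliar with REINFORCE, the following is a minimal sketch of its single-sample gradient for a categorical (softmax) policy. It is a generic illustration, not the NSM training procedure: the abstract's sequence-to-sequence programmer, Lisp interpreter, and iterative maximum-likelihood step are not modeled, and the function names `softmax` and `reinforce_gradient` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-d array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(logits, action, reward):
    """Single-sample REINFORCE estimate of d E[reward] / d logits.

    For a softmax policy, the gradient of log p(action) with respect to the
    logits is one_hot(action) - probs; REINFORCE scales it by the reward.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    return reward * (one_hot - probs)

# Two equally likely actions; action 0 was sampled and earned reward 1.
grad = reinforce_gradient([0.0, 0.0], action=0, reward=1.0)
```

A gradient-ascent step along `grad` raises the logit of the rewarded action and lowers the other. With a reward of zero the estimate is zero, one reason learning from sparse rewards can be slow without additional training signal.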

2015

Learning Relational Features with Backward Random Walks
Ni Lao | Einat Minkov | William Cohen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2012

Reading The Web with Learned Syntactic-Semantic Inference Rules
Ni Lao | Amarnag Subramanya | Fernando Pereira | William W. Cohen
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Random Walk Inference and Learning in A Large Scale Knowledge Base
Ni Lao | Tom Mitchell | William W. Cohen
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing