William Cohen

Also published as: William W. Cohen


2021

Investigating the Effect of Background Knowledge on Natural Questions
Vidhisha Balachandran | Bhuwan Dhingra | Haitian Sun | Michael Collins | William Cohen
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Existing work shows the benefits of integrating KBs with textual evidence for QA only on questions that are answerable by KBs alone (Sun et al., 2019). In contrast, real world QA systems often have to deal with questions that might not be directly answerable by KBs. Here, we investigate the effect of integrating background knowledge from KBs for the Natural Questions (NQ) task. We create a subset of the NQ data, Factual Questions (FQ), where the questions have evidence in the KB in the form of paths that link question entities to answer entities but still must be answered using text, to facilitate further research into KB integration methods. We propose and analyze a simple, model-agnostic approach for incorporating KB paths into text-based QA systems and establish a strong upper bound on FQ for our method using an oracle retriever. We show that several variants of Personalized PageRank based fact retrievers lead to a low recall of answer entities and consequently fail to improve QA performance. Our results suggest that fact retrieval is a bottleneck for integrating KBs into real world QA datasets.

Adaptable and Interpretable Neural Memory Over Symbolic Knowledge
Pat Verga | Haitian Sun | Livio Baldini Soares | William Cohen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Past research has demonstrated that large neural language models (LMs) encode surprising amounts of factual information; however, augmenting or modifying this information requires modifying a corpus and retraining, which is computationally expensive. To address this problem, we develop a neural LM that includes an interpretable neuro-symbolic KB in the form of a “fact memory”. Each element of the fact memory is formed from a triple of vectors, where each vector corresponds to a KB entity or relation. Our LM improves performance on knowledge-intensive question-answering tasks, sometimes dramatically, including a 27 point increase in one setting of WebQuestionsSP over a state-of-the-art open-book model, despite using 5% of the parameters. Most interestingly, we demonstrate that the model can be modified, without any re-training, by updating the fact memory.

Differentiable Open-Ended Commonsense Reasoning
Bill Yuchen Lin | Haitian Sun | Bhuwan Dhingra | Manzil Zaheer | Xiang Ren | William Cohen
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Current commonsense reasoning research focuses on developing models that use commonsense knowledge to answer multiple-choice questions. However, systems designed to answer multiple-choice questions may not be useful in applications that do not provide a small list of candidate answers to choose from. As a step towards making commonsense reasoning research more realistic, we propose to study open-ended commonsense reasoning (OpenCSR) — the task of answering a commonsense question without any pre-defined choices — using as a resource only a corpus of commonsense facts written in natural language. OpenCSR is challenging due to a large decision space, and because many questions require implicit multi-hop reasoning. As an approach to OpenCSR, we propose DrFact, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts. To evaluate OpenCSR methods, we adapt several popular commonsense reasoning benchmarks, and collect multiple new answers for each test question via crowd-sourcing. Experiments show that DrFact outperforms strong baseline methods by a large margin.

MATE: Multi-view Attention for Table Transformer Efficiency
Julian Eisenschlos | Maharshi Gor | Thomas Müller | William Cohen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables. MATE uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. This architecture scales linearly with respect to speed and memory, and can handle documents containing more than 8000 tokens with current accelerators. MATE also has a more appropriate inductive bias for tabular data, and sets a new state-of-the-art for three table reasoning datasets. For HybridQA (Chen et al., 2020), a dataset that involves large documents containing tables, we improve the best prior result by 19 points.
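
The row-or-column attention pattern described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper `axis_mask`, the toy 2x2 table, and the dense boolean masks are assumptions made for illustration, and a dense mask does not reproduce the linear scaling the abstract reports.

```python
from typing import List


def axis_mask(ids: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when tokens i and j share the same id.

    Passing row ids gives the pattern for a hypothetical "row head";
    passing column ids gives the pattern for a "column head".
    """
    return [[a == b for b in ids] for a in ids]


# Hypothetical 2x2 table flattened row by row into four tokens.
row_ids = [0, 0, 1, 1]  # row index of each token
col_ids = [0, 1, 0, 1]  # column index of each token

row_head_mask = axis_mask(row_ids)  # tokens attend within their own row
col_head_mask = axis_mask(col_ids)  # tokens attend within their own column
```

In this toy layout the first token may attend to tokens 0 and 1 under the row mask, and to tokens 0 and 2 under the column mask.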

2019

Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra | Manaal Faruqui | Ankur Parikh | Ming-Wei Chang | Dipanjan Das | William Cohen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments when those references diverge. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. Through a large scale human evaluation study of table-to-text models for WikiBio, we show that PARENT correlates with human judgments better than existing text generation metrics. We also adapt and evaluate the information extraction based evaluation proposed by Wiseman et al. (2017), and show that PARENT has comparable correlation to it, while being easier to use. We show that PARENT is also applicable when the reference texts are elicited from humans using the data from the WebNLG challenge.

Probing Biomedical Embeddings from Language Models
Qiao Jin | Bhuwan Dhingra | William Cohen | Xinghua Lu
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

Contextualized word embeddings derived from pre-trained language models (LMs) show significant improvements on downstream NLP tasks. Pre-training on domain-specific corpora, such as biomedical articles, further improves their performance. In this paper, we conduct probing experiments to determine what additional information is carried intrinsically by the in-domain trained contextualized embeddings. For this we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to not have additional sequence modeling layers. We compare BERT (Devlin et al., 2018), ELMo (Peters et al., 2018), BioBERT (Lee et al., 2019) and BioELMo, a biomedical version of ELMo trained on 10M PubMed abstracts. Surprisingly, while fine-tuned BioBERT is better than BioELMo in biomedical NER and NLI tasks, as a fixed feature extractor BioELMo outperforms BioBERT in our probing tasks. We use visualization and nearest neighbor analysis to show that better encoding of entity-type and relational information leads to this superiority.

PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text
Haitian Sun | Tania Bedrax-Weiss | William Cohen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We consider open-domain question answering (QA) where answers are drawn from either a corpus, a knowledge base (KB), or a combination of both of these. We focus on a setting in which a corpus is supplemented with a large but incomplete KB, and on questions that require non-trivial (e.g., “multi-hop”) reasoning. We describe PullNet, an integrated framework for (1) learning what to retrieve and (2) reasoning with this heterogeneous information to find the best answer. PullNet uses an iterative process to construct a question-specific subgraph that contains information relevant to the question. In each iteration, a graph convolutional network (graph CNN) is used to identify subgraph nodes that should be expanded using retrieval (or “pull”) operations on the corpus and/or KB. After the subgraph is complete, another graph CNN is used to extract the answer from the subgraph. This retrieve-and-reason process allows us to answer multi-hop questions using large KBs and corpora. PullNet is weakly supervised, requiring question-answer pairs but not gold inference paths. Experimentally PullNet improves over the prior state-of-the-art, and in the setting where a corpus is used with an incomplete KB these improvements are often dramatic. PullNet is also often superior to prior systems in a KB-only setting or a text-only setting.

PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin | Bhuwan Dhingra | Zhengping Liu | William Cohen | Xinghua Lu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We introduce PubMedQA, a novel biomedical question answering (QA) dataset collected from PubMed abstracts. The task of PubMedQA is to answer research questions with yes/no/maybe (e.g.: Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?) using the corresponding abstracts. PubMedQA has 1k expert-annotated, 61.2k unlabeled and 211.3k artificially generated QA instances. Each PubMedQA instance is composed of (1) a question which is either an existing research article title or derived from one, (2) a context which is the corresponding abstract without its conclusion, (3) a long answer, which is the conclusion of the abstract and, presumably, answers the research question, and (4) a yes/no/maybe answer which summarizes the conclusion. PubMedQA is the first QA dataset where reasoning over biomedical research texts, especially their quantitative contents, is required to answer the questions. Our best performing model, multi-phase fine-tuning of BioBERT with long answer bag-of-word statistics as additional supervision, achieves 68.1% accuracy, compared to single human performance of 78.0% accuracy and majority-baseline of 55.2% accuracy, leaving much room for improvement. PubMedQA is publicly available at https://pubmedqa.github.io.

2018

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang | Peng Qi | Saizheng Zhang | Yoshua Bengio | William Cohen | Ruslan Salakhutdinov | Christopher D. Manning
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text
Haitian Sun | Bhuwan Dhingra | Manzil Zaheer | Kathryn Mazaitis | Ruslan Salakhutdinov | William Cohen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Open Domain Question Answering (QA) is evolving from complex pipelined systems to end-to-end deep neural networks. Specialized neural models have been developed for extracting answers from either text alone or Knowledge Bases (KBs) alone. In this paper we look at a more practical setting, namely QA over the combination of a KB and entity-linked text, which is appropriate when an incomplete KB is available with a large text corpus. Building on recent advances in graph representation learning we propose a novel model, GRAFT-Net, for extracting answers from a question-specific subgraph containing text and KB entities and relations. We construct a suite of benchmark tasks for this problem, varying the difficulty of questions, the amount of training data, and KB completeness. We show that GRAFT-Net is competitive with the state-of-the-art when tested using either KBs or text alone, and vastly outperforms existing methods in the combined setting.

Neural Models for Reasoning over Multiple Mentions Using Coreference
Bhuwan Dhingra | Qiao Jin | Zhilin Yang | William Cohen | Ruslan Salakhutdinov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets – Wikihop, LAMBADA and the bAbi AI tasks – with large gains when training data is scarce.

AttentionMeSH: Simple, Effective and Interpretable Automatic MeSH Indexer
Qiao Jin | Bhuwan Dhingra | William Cohen | Xinghua Lu
Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering

There are millions of articles in the PubMed database. To facilitate information retrieval, curators in the National Library of Medicine (NLM) assign a set of Medical Subject Headings (MeSH) to each article. MeSH is a hierarchically-organized vocabulary, containing about 28K different concepts, covering the fields from clinical medicine to information sciences. Several automatic MeSH indexing models have been developed to improve the time-consuming and financially expensive manual annotation, including the NLM official tool – Medical Text Indexer, and the winner of BioASQ Task5a challenge – DeepMeSH. However, these models are complex and not interpretable. We propose a novel end-to-end model, AttentionMeSH, which utilizes deep learning and an attention mechanism to index MeSH terms to biomedical text. The attention mechanism enables the model to associate textual evidence with annotations, thus providing interpretability at the word level. The model also uses a novel masking mechanism to enhance accuracy and speed. In the final week of BioASQ Challenge Task6a, we ranked 2nd by average MiF using an on-construction model. After the contest, we achieve close to state-of-the-art MiF performance of ∼ 0.684 using our final model. Human evaluations show AttentionMeSH also provides a high level of interpretability, retrieving about 90% of all expert-labeled relevant words given a MeSH-article pair at 20 output.

Learning to Define Terms in the Software Domain
Vidhisha Balachandran | Dheeraj Rajagopal | Rose Catherine Kanjirathinkal | William Cohen
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

One way to test a person’s knowledge of a domain is to ask them to define domain-specific terms. Here, we investigate the task of automatically generating definitions of technical terms by reading text from the technical domain. Specifically, we learn definitions of software entities from a large corpus built from the user forum Stack Overflow. To model definitions, we train a language model and incorporate additional domain-specific information like word co-occurrence and ontological category information. Our approach improves over previous baselines by 2 BLEU points for the definition generation task. Our experiments also show the additional challenges associated with the task and the shortcomings of language-model based architectures for definition generation.

2017

Semi-Supervised QA with Generative Domain-Adaptive Nets
Zhilin Yang | Junjie Hu | Ruslan Salakhutdinov | William Cohen
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the problem of semi-supervised question answering—utilizing unlabeled text to boost the performance of question answering models. We propose a novel training framework, the Generative Domain-Adaptive Nets. In this framework, we train a generative model to generate questions based on the unlabeled text, and combine model-generated questions with human-generated questions for training question answering models. We develop novel domain adaptation algorithms, based on reinforcement learning, to alleviate the discrepancy between the model-generated data distribution and the human-generated data distribution. Experiments show that our proposed framework obtains substantial improvement from unlabeled text.

Gated-Attention Readers for Text Comprehension
Bhuwan Dhingra | Hanxiao Liu | Zhilin Yang | William Cohen | Ruslan Salakhutdinov
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper we study the problem of answering cloze-style questions over documents. Our model, the Gated-Attention (GA) Reader, integrates a multi-hop architecture with a novel attention mechanism, which is based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader. This enables the reader to build query-specific representations of tokens in the document for accurate answer selection. The GA Reader obtains state-of-the-art results on three benchmarks for this task–the CNN & Daily Mail news stories and the Who Did What dataset. The effectiveness of multiplicative interaction is demonstrated by an ablation study, and by comparing to alternative compositional operators for implementing the gated-attention.
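
The multiplicative interaction mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function names are hypothetical, plain Python lists stand in for the recurrent network's states, and the token-specific query summary is assumed to come from softmax attention over the query vectors.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_attention(doc_states: List[Vector],
                    query_states: List[Vector]) -> List[Vector]:
    """Gate each document token state by a token-specific query summary."""
    gated = []
    for d in doc_states:
        # Attention weights of this document token over the query vectors.
        weights = softmax([dot(q, d) for q in query_states])
        # Weighted sum of query vectors: the token-specific query summary.
        summary = [sum(w * q[k] for w, q in zip(weights, query_states))
                   for k in range(len(d))]
        # Element-wise (multiplicative) interaction with the token state.
        gated.append([dk * sk for dk, sk in zip(d, summary)])
    return gated
```

For example, with a single query vector `[1.0, 0.0]`, the token state `[2.0, 3.0]` is gated to `[2.0, 0.0]`: the dimension the query does not activate is zeroed out.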

2016

Scalable Statistical Relational Learning for NLP
William Yang Wang | William Cohen
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Using Graphs of Classifiers to Impose Constraints on Semi-supervised Relation Extraction
Lidong Bing | William Cohen | Bhuwan Dhingra | Richard Wang
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

Tweet2Vec: Character-Based Distributed Representations for Social Media
Bhuwan Dhingra | Zhong Zhou | Dylan Fitzpatrick | Michael Muehl | William Cohen
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Joint Information Extraction and Reasoning: A Scalable Statistical Relational Learning Approach
William Yang Wang | William W. Cohen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Learning Relational Features with Backward Random Walks
Ni Lao | Einat Minkov | William Cohen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts
Dana Movshovitz-Attias | William W. Cohen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Improving Distant Supervision for Information Extraction Using Label Propagation Through Lists
Lidong Bing | Sneha Chaudhari | Richard Wang | William Cohen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Learning to Identify the Best Contexts for Knowledge-based WSD
Evgenia Wasserman Pritsker | William Cohen | Einat Minkov
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach
William Yang Wang | Lingpeng Kong | Kathryn Mazaitis | William W. Cohen
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

What’s in a Domain? Multi-Domain Learning for Multi-Attribute Data
Mahesh Joshi | Mark Dredze | William W. Cohen | Carolyn P. Rosé
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural Language Models for Predicting Programming Comments
Dana Movshovitz-Attias | William W. Cohen
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Reading The Web with Learned Syntactic-Semantic Inference Rules
Ni Lao | Amarnag Subramanya | Fernando Pereira | William W. Cohen
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Multi-Domain Learning: When Do Domains Matter?
Mahesh Joshi | Mark Dredze | William W. Cohen | Carolyn Rosé
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Crowdsourced Comprehension: Predicting Prerequisite Structure in Wikipedia
Partha Talukdar | William Cohen
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

Bootstrapping Biomedical Ontologies for Scientific Text using NELL
Dana Movshovitz-Attias | William W. Cohen
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

Alignment-HMM-based Extraction of Abbreviations from Biomedical Text
Dana Movshovitz-Attias | William W. Cohen
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

Evaluating Joint Modeling of Yeast Biology Literature and Protein-Protein Interaction Networks
Ramnath Balasubramanyan | Kathryn Rivard | William W. Cohen | Jelena Jakovljevic | John L. Woolford
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

Collectively Representing Semi-Structured Data from the Web
Bhavana Dalvi | William Cohen | Jamie Callan
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Graph Based Similarity Measures for Synonym Extraction from Parsed Text
Einat Minkov | William Cohen
Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing

2011

Random Walk Inference and Learning in A Large Scale Knowledge Base
Ni Lao | Tom Mitchell | William W. Cohen
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

What pushes their buttons? Predicting comment polarity from the content of political blog posts
Ramnath Balasubramanyan | William W. Cohen | Doug Pierce | David P. Redlawsk
Proceedings of the Workshop on Language in Social Media (LSM 2011)

Structured Databases of Named Entities from Bayesian Nonparametrics
Jacob Eisenstein | Tae Yano | William Cohen | Noah Smith | Eric Xing
Proceedings of the First workshop on Unsupervised Learning in NLP

2009

Predicting Response to Political Blog Posts with Topic Models
Tae Yano | William W. Cohen | Noah A. Smith
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Automatic Set Instance Extraction using the Web
Richard C. Wang | William W. Cohen
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Character-level Analysis of Semi-Structured Documents for Set Expansion
Richard C. Wang | William W. Cohen
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Learning Graph Walk Based Similarity Measures for Parsed Text
Einat Minkov | William W. Cohen
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Automatic Set Expansion for List Question Answering
Richard C. Wang | Nico Schlaefer | William W. Cohen | Eric Nyberg
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition
Andrew Arnold | Ramesh Nallapati | William W. Cohen
Proceedings of ACL-08: HLT

2006

NER Systems that Suit User’s Preferences: Adjusting the Recall-Precision Trade-off for Entity Extraction
Einat Minkov | Richard Wang | Anthony Tomasic | William Cohen
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

A Graph-Search Framework for GeneId Ranking
William Cohen
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

Improving “Email Speech Acts” Analysis via N-gram Selection
Vitor Carvalho | William Cohen
Proceedings of the Analyzing Conversations in Text and Speech

A Graphical Framework for Contextual Search and Name Disambiguation in Email
Einat Minkov | William Cohen | Andrew Ng
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing

2005

Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text
Einat Minkov | Richard C. Wang | William W. Cohen
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Learning to Classify Email into “Speech Acts”
William W. Cohen | Vitor R. Carvalho | Tom M. Mitchell
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2001

Issues in Extracting Information from the Web
William W. Cohen
Proceedings of the Seventh International Workshop on Parsing Technologies