Zachary Ives


2021

pdf
What is Your Article Based On? Inferring Fine-grained Provenance
Yi Zhang | Zachary Ives | Dan Roth
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

When evaluating an article and the claims it makes, a critical reader must be able to assess where the information presented comes from, and whether the various claims are mutually consistent and support the conclusion. This motivates the study of claim provenance, which seeks to trace and explain the origins of claims. In this paper, we introduce new techniques to model and reason about the provenance of multiple interacting claims, including how to capture fine-grained information about the context. Our solution hinges on first identifying the sentences that potentially contain important external information. We then develop a query generator with our novel rank-aware cross attention mechanism, which aims at generating metadata for the source article, based on the context and the signals collected from a search engine. This establishes relevant search queries, and it allows us to obtain source article candidates for each identified sentence and propose an ILP based algorithm to infer the best sources. We experiment with a newly created evaluation dataset, Politi-Prov, based on fact-checking articles from www.politifact.com; our experimental results show that our solution leads to a significant improvement over baselines.

2020

pdf
“Who said it, and Why?” Provenance for Natural Language Claims
Yi Zhang | Zachary Ives | Dan Roth
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In an era where generating content and publishing it is so easy, we are bombarded with information and are exposed to all kinds of claims, some of which do not always rank high on the truth scale. This paper suggests that the key to a longer-term, holistic, and systematic approach to navigating this information pollution is capturing the provenance of claims. To do that, we develop a formal definition of provenance graph for a given natural language claim, aiming to understand where the claim may come from and how it has evolved. To construct the graph, we model provenance inference, formulated mainly as an information extraction task and addressed via a textual entailment model. We evaluate our approach using two benchmark datasets, showing initial success in capturing the notion of provenance and its effectiveness on the application of claim verification.

2019

pdf
Evidence-based Trustworthiness
Yi Zhang | Zachary Ives | Dan Roth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The information revolution brought with it information pollution. Information retrieval and extraction help us cope with abundant information from diverse sources. But some sources are of anonymous authorship, and some are of uncertain accuracy, so how can we determine what we should actually believe? Not all information sources are equally trustworthy, and simply accepting the majority view is often wrong. This paper develops a general framework for estimating the trustworthiness of information sources in an environment where multiple sources provide claims and supporting evidence, and each claim can potentially be produced by multiple sources. We consider two settings: one in which information sources directly assert claims, and a more realistic and challenging one, in which claims are inferred from evidence provided by sources, via (possibly noisy) NLP techniques. Our key contribution is to develop a family of probabilistic models that jointly estimate the trustworthiness of sources, and the credibility of claims they assert. This is done while accounting for the (possibly noisy) NLP needed to infer claims from evidence supplied by sources. We evaluate our framework on several datasets, showing strong results and significant improvement over baselines.
Search
Co-authors
Venues