Ioana Manolescu


2023

pdf
Open Information Extraction with Entity Focused Constraints
Prajna Upadhyay | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: EACL 2023

Open Information Extraction (OIE) is the task of extracting tuples of the form (subject, predicate, object), without any knowledge of the type and lexical form of the predicate, the subject, or the object. In this work, we focus on improving OIE quality by exploiting domain knowledge about the subject and object. More precisely, knowing that the subjects and objects in sentences are oftentimes named entities, we explore how to inject constraints in the extraction through constrained inference and constraint-aware training. Our work leverages the state-of-the-art OpenIE6 platform, which we adapt to our setting. Through a carefully constructed training dataset and constrained training, we obtain a 29.17% F1-score improvement in the CaRB metric and a 24.37% F1-score improvement in the WIRe57 metric. Our technique has important applications – one of them is investigative journalism, where automatically extracting conflict-of-interest between scientists and funding organizations helps understand the type of relations companies engage with the scientists.

pdf
FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation
Kun Zhang | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: EMNLP 2023

Graph-to-text (G2T) generation takes a graph as input and aims to generate a fluent and faith- ful textual representation of the information in the graph. The task has many applications, such as dialogue generation and question an- swering. In this work, we investigate to what extent the G2T generation problem is solved for previously studied datasets, and how pro- posed metrics perform when comparing generated texts. To help address their limitations, we propose a new metric that correctly identifies factual faithfulness, i.e., given a triple (subject, predicate, object), it decides if the triple is present in a generated text. We show that our metric FactSpotter achieves the highest correlation with human annotations on data correct- ness, data coverage, and relevance. In addition, FactSpotter can be used as a plug-in feature to improve the factual faithfulness of existing models. Finally, we investigate if existing G2T datasets are still challenging for state-of-the-art models. Our code is available online: https://github.com/guihuzhang/FactSpotter.