Anton Belyy


2023

pdf
FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
Farima Fatahi Bayat | Kun Qian | Benjamin Han | Yisi Sang | Anton Belyy | Samira Khorshidi | Fei Wu | Ihab Ilyas | Yunyao Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Detecting factual errors of textual information, whether generated by large language models (LLM) or curated by humans, is crucial for making informed decisions. LLMs’ inability to attribute their claims to external knowledge and their tendency to hallucinate makes it difficult to rely on their responses. Humans, too, are prone to factual errors in their writing. Since manual detection and correction of factual er- rors is labor-intensive, developing an automatic approach can greatly reduce human effort. We present a prototype tool that automatically extracts factual claims from text, gathers evidence from external knowledge sources, evaluates the factuality of each claim, and suggests revisions for identified errors using the collected evidence. Initial empirical evaluation on fact error detection (77-85% F1) shows the potential of our tool.

2022

pdf
Guided K-best Selection for Semantic Parsing Annotation
Anton Belyy | Chieh-yang Huang | Jacob Andreas | Emmanouil Antonios Platanios | Sam Thomson | Richard Shin | Subhro Roy | Aleksandr Nisnevich | Charles Chen | Benjamin Van Durme
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Collecting data for conversational semantic parsing is a time-consuming and demanding process. In this paper we consider, given an incomplete dataset with only a small amount of data, how to build an AI-powered human-in-the-loop process to enable efficient data collection. A guided K-best selection process is proposed, which (i) generates a set of possible valid candidates; (ii) allows users to quickly traverse the set and filter incorrect parses; and (iii) asks users to select the correct parse, with minimal modification when necessary. We investigate how to best support users in efficiently traversing the candidate set and locating the correct parse, in terms of speed and accuracy. In our user study, consisting of five annotators labeling 300 instances each, we find that combining keyword searching, where keywords can be used to query relevant candidates, and keyword suggestion, where representative keywords are automatically generated, enables fast and accurate annotation.

pdf
Human Schema Curation via Causal Association Rule Mining
Noah Weber | Anton Belyy | Nils Holzenberger | Rachel Rudinger | Benjamin Van Durme
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

Event schemas are structured knowledge sources defining typical real-world scenarios (e.g., going to an airport). We present a framework for efficient human-in-the-loop construction of a schema library, based on a novel script induction system and a well-crafted interface that allows non-experts to “program” complex event structures. Associated with this work we release a schema library: a machine readable resource of 232 detailed event schemas, each of which describe a distinct typical scenario in terms of its relevant sub-event structure (what happens in the scenario), participants (who plays a role in the scenario), fine-grained typing of each participant, and the implied relational constraints between them. We make our schema library and the SchemaBlocks interface available online.

2021

pdf
InFillmore: Frame-Guided Language Generation with Bidirectional Context
Jiefu Ou | Nathaniel Weir | Anton Belyy | Felix Yu | Benjamin Van Durme
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

We propose a structured extension to bidirectional-context conditional language generation, or “infilling,” inspired by Frame Semantic theory. Guidance is provided through one of two approaches: (1) model fine-tuning, conditioning directly on observed symbolic frames, and (2) a novel extension to disjunctive lexically constrained decoding that leverages frame semantic lexical units. Automatic and human evaluations confirm that frame-guided generation allows for explicit manipulation of intended infill semantics, with minimal loss in distinguishability from human-generated text. Our methods flexibly apply to a variety of use scenarios, and we provide an interactive web demo.

2020

pdf
Script Induction as Association Rule Mining
Anton Belyy | Benjamin Van Durme
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

We show that the count-based Script Induction models of Chambers and Jurafsky (2008) and Jans et al. (2012) can be unified in a general framework of narrative chain likelihood maximization. We provide efficient algorithms based on Association Rule Mining (ARM) and weighted set cover that can discover interesting patterns in the training data and combine them in a reliable and explainable way to predict the missing event. The proposed method, unlike the prior work, does not assume full conditional independence and makes use of higher-order count statistics. We perform the ablation study and conclude that the inductive biases introduced by ARM are conducive to better performance on the narrative cloze test.

2018

pdf
Improved Evaluation Framework for Complex Plagiarism Detection
Anton Belyy | Marina Dubova | Dmitry Nekrasov
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Plagiarism is a major issue in science and education. Complex plagiarism, such as plagiarism of ideas, is hard to detect, and therefore it is especially important to track improvement of methods correctly. In this paper, we study the performance of plagdet, the main measure for plagiarim detection, on manually paraphrased datasets (such as PAN Summary). We reveal its fallibility under certain conditions and propose an evaluation framework with normalization of inner terms, which is resilient to the dataset imbalance. We conclude with the experimental justification of the proposed measure. The implementation of the new framework is made publicly available as a Github repository.