Joe A. Cecil


2024

Redacted Contextual Question Answering with Generative Large Language Models
Jacob Lichtefeld | Joe A. Cecil | Alex Hedges | Jeremy Abramson | Marjorie Freedman
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security

Many contexts, such as medicine, finance, and cybersecurity, require controlled release of private or internal information. Traditionally, manually redacting sensitive information for release is an arduous and costly process, and while generative Large Language Models (gLLM) show promise at document-based question answering and summarization, their ability to do so while redacting sensitive information has not been widely explored. To address this, we introduce a new task, called redacted contextual question answering (RC-QA). This explores a gLLM’s ability to collaborate with a trusted user in a question-answer task as a proxy for drafting a public release informed by the redaction of potentially sensitive information, presented here in the form of constraints on the answers. We introduce a sample question-answer dataset for this task using publicly available data with four sample constraints. We present evaluation results for five language models and two refined models. Our results show that most models, especially open-source models, struggle to accurately answer questions under these constraints. We hope that these preliminary results help catalyze further exploration into this topic, and to that end, we make our code and data available at https://github.com/isi-vista/redacted-contextual-question-answering.