Redacted Contextual Question Answering with Generative Large Language Models

Jacob Lichtefeld, Joe A. Cecil, Alex Hedges, Jeremy Abramson, Marjorie Freedman


Abstract
Many contexts, such as medicine, finance, and cybersecurity, require controlled release of private or internal information. Traditionally, manually redacting sensitive information for release is an arduous and costly process, and while generative Large Language Models (gLLM) show promise at document-based question answering and summarization, their ability to do so while redacting sensitive information has not been widely explored. To address this, we introduce a new task, called redacted contextual question answering (RC-QA). This explores a gLLM’s ability to collaborate with a trusted user in a question-answer task as a proxy for drafting a public release informed by the redaction of potentially sensitive information, presented here in the form of constraints on the answers. We introduce a sample question-answer dataset for this task using publicly available data with four sample constraints. We present evaluation results for five language models and two refined models. Our results show that most models, especially open-source models, struggle to accurately answer questions under these constraints. We hope that these preliminary results help catalyze further exploration into this topic, and to that end, we make our code and data available at https://github.com/isi-vista/redacted-contextual-question-answering.
Anthology ID:
2024.nlpaics-1.25
Volume:
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month:
July
Year:
2024
Address:
Lancaster, UK
Editors:
Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue:
NLPAICS
Publisher:
International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages:
230–237
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.25/
Cite (ACL):
Jacob Lichtefeld, Joe A. Cecil, Alex Hedges, Jeremy Abramson, and Marjorie Freedman. 2024. Redacted Contextual Question Answering with Generative Large Language Models. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 230–237, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal):
Redacted Contextual Question Answering with Generative Large Language Models (Lichtefeld et al., NLPAICS 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.25.pdf
Optional supplementary material:
2024.nlpaics-1.25.OptionalSupplementaryMaterial.zip