Neural-Guided Program Synthesis of Information Extraction Rules Using Self-Supervision
Enrique Noriega-Atala, Robert Vacareanu, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega
Abstract
We propose a neural approach to rule synthesis designed to help bridge the gap between the interpretability, precision, and maintainability of rule-based information extraction systems and the scalability and convenience of statistical information extraction systems. Rather than burdening domain experts with learning another specialized language, the approach asks them to provide a small set of examples in the form of highlighted spans of text. We introduce a transformer-based architecture that drives a rule synthesis system. The system leverages a self-supervised approach for pre-training a large-scale language model, complemented by an analysis of different loss functions and aggregation mechanisms for variable-length sequences of user-annotated spans of text. The results are encouraging and point to different desirable properties, such as speed and quality, depending on the choice of loss function and aggregation method.
- Anthology ID:
- 2022.pandl-1.10
- Volume:
- Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Laura Chiticariu, Yoav Goldberg, Gus Hahn-Powell, Clayton T. Morrison, Aakanksha Naik, Rebecca Sharp, Mihai Surdeanu, Marco Valenzuela-Escárcega, Enrique Noriega-Atala
- Venue:
- PANDL
- Publisher:
- International Conference on Computational Linguistics
- Pages:
- 85–93
- URL:
- https://aclanthology.org/2022.pandl-1.10
- Cite (ACL):
- Enrique Noriega-Atala, Robert Vacareanu, Gus Hahn-Powell, and Marco A. Valenzuela-Escárcega. 2022. Neural-Guided Program Synthesis of Information Extraction Rules Using Self-Supervision. In Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning, pages 85–93, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
- Cite (Informal):
- Neural-Guided Program Synthesis of Information Extraction Rules Using Self-Supervision (Noriega-Atala et al., PANDL 2022)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2022.pandl-1.10.pdf