Describing a Knowledge Base

Qingyun Wang, Xiaoman Pan, Lifu Huang, Boliang Zhang, Zhiying Jiang, Heng Ji, Kevin Knight


Abstract
We aim to automatically generate natural language descriptions of an input structured knowledge base (KB). We build our generation framework on a pointer network that can copy facts from the input KB, and add two attention mechanisms: (i) slot-aware attention to capture the association between a slot type and its corresponding slot value; and (ii) a new table position self-attention to capture the inter-dependencies among related slots. For evaluation, besides standard metrics including BLEU, METEOR, and ROUGE, we propose a KB-reconstruction-based metric that extracts a KB from the generation output and compares it with the input KB. We also create a new data set which includes 106,216 pairs of structured KBs and their corresponding natural language descriptions for two distinct entity types. Experiments show that our approach significantly outperforms state-of-the-art methods. The reconstructed KB achieves an F-score of 68.8% to 72.6%.
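The KB reconstruction metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes a KB is a collection of (slot type, slot value) pairs and that two facts match only when both fields agree after lowercasing and trimming whitespace. The abstract does not specify the matching rule or the extraction step, so both are assumptions here. The function name kb_f_score and the example facts are hypothetical and are not taken from the paper or its code release.

```python
from typing import Dict, Iterable, Set, Tuple

Fact = Tuple[str, str]  # (slot type, slot value); an assumed representation


def _normalize(kb: Iterable[Fact]) -> Set[Fact]:
    # Assumption: facts are compared by exact match after lowercasing and trimming.
    return {(slot.strip().lower(), value.strip().lower()) for slot, value in kb}


def kb_f_score(input_kb: Iterable[Fact], extracted_kb: Iterable[Fact]) -> Dict[str, float]:
    """Compare a KB extracted from generated text against the input KB."""
    gold = _normalize(input_kb)
    pred = _normalize(extracted_kb)
    overlap = len(gold & pred)

    # Guard against empty sets so the ratios are always defined.
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Made-up example: the input KB has three facts, and the KB extracted from a
# generated description recovers one of them and adds one wrong fact.
example_input = [
    ("name", "Ada Lovelace"),
    ("occupation", "mathematician"),
    ("birth_place", "London"),
]
example_extracted = [
    ("name", "ada lovelace"),
    ("occupation", "writer"),
]
example_scores = kb_f_score(example_input, example_extracted)
```

Extracting the KB from the generated description is the harder part of the metric and is not shown; this sketch only covers the comparison once both KBs are available.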
Anthology ID:
W18-6502
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
10–21
URL:
https://aclanthology.org/W18-6502
DOI:
10.18653/v1/W18-6502
Cite (ACL):
Qingyun Wang, Xiaoman Pan, Lifu Huang, Boliang Zhang, Zhiying Jiang, Heng Ji, and Kevin Knight. 2018. Describing a Knowledge Base. In Proceedings of the 11th International Conference on Natural Language Generation, pages 10–21, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
Describing a Knowledge Base (Wang et al., INLG 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/W18-6502.pdf
Code:
EagleW/Describing_a_Knowledge_Base
Data:
Wikipedia Person and Animal Dataset