Kelvin Han


2022

Generating Questions from Wikidata Triples
Kelvin Han | Thiago Castro Ferreira | Claire Gardent
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Question generation from knowledge bases (or knowledge base question generation, KBQG) is the task of generating questions from structured database information, typically in the form of triples representing facts. To handle rare entities and generalize to unseen properties, previous work on KBQG resorted to extensive, often ad hoc pre- and post-processing of the input triple. We revisit KBQG – using pre-training, a new (triple, question) dataset and taking question type into account – and show that our approach outperforms previous work in both a standard and a zero-shot setting. We also show that the extended KBQG dataset we provide (also helpful for knowledge base question answering) allows not only for better coverage of knowledge base (KB) properties but also for increased output variability, in that it permits the generation of multiple questions from the same KB triple.

2020

Comparing PTB and UD information for PDTB discourse connective identification
Kelvin Han | Phyllicia Leavitt | Srilakshmi Balard
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 3 : Rencontre des Étudiants Chercheurs en Informatique pour le TAL

Our work on the automatic detection of English discourse connectives in the Penn Discourse Treebank (PDTB) shows that syntactic information from the Universal Dependencies (UD) framework is a viable alternative to that from the Penn Treebank (PTB) framework. In fact, we found minor performance increases when using automatically parsed UD information instead of gold-standard PTB part-of-speech (POS) tag information. PTB information has traditionally been used for the task, but there are now many more UD corpora, in many more languages, than are available in the PTB framework. As such, this finding is promising for areas of discourse parsing such as multilingual and production settings, where gold-standard PTB information may be scarce.