2022
Chinese Movie Dialogue Question Answering Dataset
Shang-Bao Luo | Cheng-Chung Fan | Kuan-Yu Chen | Yu Tsao | Hsin-Min Wang | Keh-Yih Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
This paper presents CMDQA, a Chinese dialogue-based information-seeking question answering dataset designed mainly for scenarios in which users look up information about Chinese movies. It contains 10K QA dialogs (40K turns in total). All questions and background documents are compiled from Wikipedia via a web crawler. The answer to each question is obtained by extracting the corresponding answer span from the related text passage. In addition to requiring retrieval of related documents, CMDQA adds pronouns to the questions to better mimic real dialog scenarios. The dataset can therefore be used to test the individual performance of the information retrieval, question answering, and question rewriting modules. This paper also provides a baseline system and reports its performance on this dataset. The experiments show that a large gap remains between the baseline and human performance, so the dataset offers a substantial challenge for further research.
2021
A Flexible and Extensible Framework for Multiple Answer Modes Question Answering
Cheng-Chung Fan | Chia-Chih Kuo | Shang-Bao Luo | Pei-Jun Liao | Kuang-Yu Chang | Chiao-Wei Hsu | Meng-Tse Wu | Shih-Hong Tsai | Tzu-Man Wu | Aleksandra Smolka | Chao-Chun Liang | Hsin-Min Wang | Kuan-Yu Chen | Yu Tsao | Keh-Yih Su
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
This paper presents a framework for answering questions that require various kinds of inference mechanisms (such as Extraction, Entailment-Judgement, and Summarization). Most previous approaches adopt a rigid framework that handles only one inference mechanism. Only a few adopt several answer generation modules to provide different mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules or are too complicated to be implemented with neural networks. To alleviate these problems, we propose a divide-and-conquer framework consisting of a set of answer generation modules, a dispatch module, and an aggregation module. The answer generation modules provide different inference mechanisms, the dispatch module selects a few appropriate answer generation modules to generate answer candidates, and the aggregation module selects the final answer. We test our framework on the 2020 Formosa Grand Challenge Contest dataset. Experiments show that the proposed framework outperforms the state-of-the-art RoBERTa-large model by about 11.4%.
2019
基於特徵粒度之訓練策略於中文口語問答系統之應用 (A Feature-granularity Training Strategy for Chinese Spoken Question Answering)
Shang-Bao Luo | Kuan-Yu Chen
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)
基於特徵粒度之訓練策略於中文口語問答系統之應用 (A Feature-granularity Training Strategy for Chinese Spoken Question Answering)
Shang-Bao Luo | Kuan-Yu Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 24, Number 2, December 2019
2018
未登錄詞之向量表示法模型於中文機器閱讀理解之應用 (An OOV Word Embedding Framework for Chinese Machine Reading Comprehension)
Shang-Bao Luo | Ching-Hsien Lee | Jia-Jang Tu | Kuan-Yu Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 23, Number 2, December 2018
未登錄詞之向量表示法模型於中文機器閱讀理解之應用 (An OOV Word Embedding Framework for Chinese Machine Reading Comprehension) [In Chinese]
Shang-Bao Luo | Ching-Hsien Lee | Kuan-Yu Chen
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018)