Tianyong Hao


A Self-supervised Joint Training Framework for Document Reranking
Xiaozhi Zhu | Tianyong Hao | Sijie Cheng | Fu Lee Wang | Hai Liu
Findings of the Association for Computational Linguistics: NAACL 2022

Pretrained language models such as BERT have been successfully applied to a wide range of natural language processing tasks and have also achieved impressive performance on document reranking tasks. Recent work indicates that further pretraining language models on task-specific datasets before fine-tuning helps improve reranking performance. However, pre-training tasks such as masked language modeling and next sentence prediction are based on the context of documents rather than encouraging the model to understand the content of queries in the document reranking task. In this paper, we propose a new self-supervised joint training framework (SJTF) with a self-supervised method called Masked Query Prediction (MQP) to establish semantic relations between given queries and positive documents. The framework randomly masks a token of the query, encodes the masked query paired with positive documents, and uses a linear layer as a decoder to predict the masked token. In addition, MQP is used to jointly optimize the models with the supervised ranking objective during the fine-tuning stage, without an extra further pre-training stage. Extensive experiments on the MS MARCO passage ranking and TREC Robust datasets show that models trained with our framework obtain significant improvements over the original models.

中文糖尿病问题分类体系及标注语料库构建研究(The Construction of Question Taxonomy and An Annotated Chinese Corpus for Diabetes Question Classification)
Xiaobo Qian (钱晓波) | Wenxiu Xie (谢文秀) | Shaopei Long (龙绍沛) | Murong Lan (兰牧融) | Yuanyuan Mu (慕媛媛) | Tianyong Hao (郝天永)
Proceedings of the 21st Chinese National Conference on Computational Linguistics



T-Know: a Knowledge Graph-based Question Answering and Information Retrieval System for Traditional Chinese Medicine
Ziqing Liu | Enwei Peng | Shixing Yan | Guozheng Li | Tianyong Hao
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

T-Know is a knowledge service system based on a constructed knowledge graph of Traditional Chinese Medicine (TCM). Using authorized and anonymized clinical records, medicine clinical guidelines, teaching materials, classic medical books, academic publications, etc., as data resources, the system extracts triples from free texts to build a TCM knowledge graph using our natural language processing methods. On the basis of the knowledge graph, a deep learning algorithm is implemented for single-round question understanding and multi-round dialogue. In addition, the TCM knowledge graph is also used to support human-computer interactive knowledge retrieval by normalizing search keywords to medical terminology.

Annotating Measurable Quantitative Information in Language: for an ISO Standard
Tianyong Hao | Haotai Wang | Xinyu Cao | Kiyong Lee
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation


The representation and extraction of quantitative information
Tianyong Hao | Yunyan We | Jiaqi Qiang | Haitao Wang | Kiyong Lee
Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13)


Towards Automatic Question Answering over Social Media by Learning Question Equivalence Patterns
Tianyong Hao | Wenyin Liu | Eugene Agichtein
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media