Xiaoying Gao


2021

pdf
基于迭代信息传递和滑动窗口注意力的问题生成模型研究(Question Generation Model Based on Iterative Message Passing and Sliding Windows Hierarchical Attention)
Qian Chen (陈千) | Xiaoying Gao (高晓影) | Suge Wang (王素格) | Xin Guo (郭鑫)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

知识图谱问题生成任务是从给定的知识图谱中生成与其相关的问题。目前,知识图谱问题生成模型主要使用基于RNN或Transformer对知识图谱子图进行编码,但这种方式丢失了显式的图结构化信息,在解码器中忽视了局部信息对节点的重要性。本文提出迭代信息传递图编码器来编码子图,获取子图显式的图结构化信息,此外,我们还使用滑动窗口注意力机制提高RNN解码器,提升子图局部信息对节点的重要度。从WQ和PQ数据集上的实验结果看,我们提出的模型比KTG模型在BLEU4指标上分别高出2.16和15.44,证明了该模型的有效性。

2020

pdf
In Data We Trust: A Critical Analysis of Hate Speech Detection Datasets
Kosisochukwu Madukwe | Xiaoying Gao | Bing Xue
Proceedings of the Fourth Workshop on Online Abuse and Harms

Recently, a few studies have discussed the limitations of datasets collected for the task of detecting hate speech from different viewpoints. We intend to contribute to the conversation by providing a consolidated overview of these issues pertaining to the data that debilitate research in this area. Specifically, we discuss how the varying pre-processing steps and the format for making data publicly available result in highly varying datasets that make an objective comparison between studies difficult and unfair. There is currently no study (to the best of our knowledge) focused on comparing the attributes of existing datasets for hate speech detection, outlining their limitations and recommending approaches for future research. This work intends to fill that gap and become the one-stop shop for information regarding hate speech datasets.