Conversational style reflects attributes of the speaker, such as emotion, gender, and educational background. In a dialogue system, understanding a user's conversational style enables better user modeling; likewise, when facing users from different backgrounds, a dialogue agent should adopt different language styles to communicate with them. Style is an intrinsic property of text, yet most existing work on text style transfer focuses on English, and research on Chinese remains limited. This paper constructs three datasets for Chinese text style transfer and applies several existing style transfer methods to them. We further propose a style transfer model based on the DeepStyle algorithm and the Transformer: pre-training yields latent representations of different styles, and a Transformer-based generator is built on top of them. During decoding, the model preserves the content of the generated text by reconstructing the source text, and incorporates the embedding of the opposite style so that it can generate text in different styles. Experimental results show that the proposed model outperforms existing models on all of the constructed Chinese datasets.
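A minimal sketch of the generation step described above, assuming a learned style-embedding table whose vector is prepended to the Transformer encoder input; the class names, dimensions, and injection strategy are illustrative assumptions rather than the paper's released implementation:

```python
# Hypothetical sketch: prepend a learned style embedding to the source sequence
# so the decoder can either reconstruct the source (same style) or rewrite it
# with the opposite style. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class StyleTransferSketch(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_styles=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.style_emb = nn.Embedding(n_styles, d_model)       # one vector per style
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, style_id):
        src = self.tok_emb(src_ids)                             # (B, S, d)
        tgt = self.tok_emb(tgt_ids)                             # (B, T, d)
        style = self.style_emb(style_id).unsqueeze(1)           # (B, 1, d)
        # Condition generation on the chosen style by prepending its embedding:
        # the source style id drives reconstruction of the input, while the
        # opposite style id drives the style-transferred rewrite.
        src = torch.cat([style, src], dim=1)
        hid = self.transformer(src, tgt)
        return self.out(hid)                                    # (B, T, vocab)

# Usage: reconstruction with the source style, transfer with the opposite style.
model = StyleTransferSketch()
src = torch.randint(0, 30000, (4, 20))
tgt = torch.randint(0, 30000, (4, 20))
logits_recon = model(src, tgt, style_id=torch.zeros(4, dtype=torch.long))
logits_transfer = model(src, tgt, style_id=torch.ones(4, dtype=torch.long))
```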
We cast a suite of information extraction tasks into a text-to-triple translation framework. Instead of solving each task with task-specific datasets and models, we formalize each task as a translation between task-specific input text and output triples. By taking the task-specific input, we enable task-agnostic translation by leveraging the latent knowledge that a pre-trained language model has about the task. We further demonstrate that a simple pre-training task of predicting which relational information corresponds to which input text is an effective way to produce task-specific outputs. This enables zero-shot transfer of our framework to downstream tasks. We study the zero-shot performance of this framework on open information extraction (OIE2016, NYT, WEB, PENN), relation classification (FewRel and TACRED), and factual probing (Google-RE and T-REx). The model transfers non-trivially to most tasks and is often competitive with fully supervised methods without any task-specific training. For instance, it significantly outperforms the F1 score of a supervised open information extraction system without using its training set.
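To make the framing concrete, here is a minimal sketch of treating extraction as sequence-to-sequence generation from input text to a linearized triple; the prompt template, the "(subject; relation; object)" output format, and the choice of t5-small are illustrative assumptions, not the paper's actual model or templates:

```python
# Hypothetical sketch of the text-to-triple framing: the task-specific input is
# given as plain text and the output is decoded as a linearized
# (subject; relation; object) triple. Prompt format and model are assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def text_to_triple(sentence: str) -> str:
    prompt = f"extract triple: {sentence}"           # assumed linearization of the task input
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# e.g. text_to_triple("Barack Obama was born in Hawaii.") would ideally decode
# to something like "(Barack Obama; born in; Hawaii)".
```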
Dependency context-based word embedding jointly learns the representations of words and dependency contexts, and has proven effective for aspect term extraction. In this paper, we design a positional dependency-based word embedding (PoD) that considers both dependency context and positional context for aspect term extraction. Specifically, the positional context is modeled via relative position encoding. In addition, we enhance the dependency context by integrating richer lexical information (e.g., POS tags) along dependency paths. Experiments on the SemEval 2014/2015/2016 datasets show that our approach outperforms other embedding methods in aspect term extraction.
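As a rough illustration of the two context types, the sketch below builds (target word, context feature) pairs that mix dependency arcs (enriched with the head's POS tag) with clipped relative positions; the feature formats, the spaCy pipeline, and the clipping window are assumptions for illustration, not the paper's training procedure:

```python
# Hypothetical sketch of extracting mixed dependency and positional contexts
# for each word, in the spirit of the abstract. Feature formats are assumptions.
import spacy

nlp = spacy.load("en_core_web_sm")

def pod_contexts(sentence: str, max_dist: int = 3):
    doc = nlp(sentence)
    pairs = []
    for tok in doc:
        # Dependency context enriched with the head's POS tag.
        if tok.head is not tok:
            pairs.append((tok.text, f"{tok.dep_}:{tok.head.pos_}:{tok.head.text}"))
        # Positional context via clipped relative positions.
        for other in doc:
            dist = other.i - tok.i
            if other is not tok and abs(dist) <= max_dist:
                pairs.append((tok.text, f"pos{dist:+d}:{other.text}"))
    return pairs

# pod_contexts("The battery life is great") would yield pairs such as
# ("battery", "compound:NOUN:life") and ("battery", "pos+1:life").
```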
Crowdsourcing has proven to be an effective method for generating labeled data for a range of NLP tasks. However, multiple recent attempts to use crowdsourcing to generate gold-labeled training data for semantic role labeling (SRL) reported only modest results, suggesting that SRL may be too difficult a task to crowdsource effectively. In this paper, we postulate that while producing SRL annotation does in general require expert involvement, a large subset of SRL labeling tasks is in fact appropriate for the crowd. We present a novel workflow in which a classifier identifies difficult annotation tasks and routes each task either to experts or to crowd workers according to its difficulty. Our experimental evaluation shows that the proposed approach reduces the workload of experts by over two-thirds, and thus significantly reduces the cost of producing SRL annotation with little loss in quality.
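A toy sketch of the routing step, assuming a binary difficulty classifier over simple task features; the features, the training data, and the probability threshold are all made up for illustration and are not the paper's actual classifier:

```python
# Hypothetical sketch of routing: a binary classifier scores each SRL annotation
# task as "hard" or "easy", and the task is sent to experts or crowd workers
# accordingly. Features, labels, and threshold are toy assumptions.
from sklearn.linear_model import LogisticRegression
import numpy as np

# Toy features per task, e.g. [sentence length, # of predicate arguments, # of clauses]
X_train = np.array([[8, 2, 1], [35, 6, 4], [12, 3, 1], [40, 7, 5]])
y_train = np.array([0, 1, 0, 1])            # 0 = easy (crowd), 1 = hard (expert)

clf = LogisticRegression().fit(X_train, y_train)

def route(task_features, threshold=0.5):
    p_hard = clf.predict_proba([task_features])[0, 1]
    return "expert" if p_hard >= threshold else "crowd"

# With this toy data, short, simple sentences tend to be routed to the crowd
# and long, argument-heavy ones to experts.
```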