Donghong Ji

Also published as: Dong Hong Ji, Dong-Hong Ji, DongHong Ji

2021

pdf bib
Better Combine Them Together! Integrating Syntactic Constituency and Dependency Representations for Semantic Role Labeling
Hao Fei | Shengqiong Wu | Yafeng Ren | Fei Li | Donghong Ji
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs
A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition
Fei Li | ZhiChao Lin | Meishan Zhang | Donghong Ji
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Research on overlapped and discontinuous named entity recognition (NER) has received increasing attention. The majority of previous work focuses on either overlapped or discontinuous entities. In this paper, we propose a novel span-based model that can recognize both overlapped and discontinuous entities jointly. The model includes two major steps. First, entity fragments are recognized by traversing over all possible text spans, thus, overlapped entities can be recognized. Second, we perform relation classification to judge whether a given pair of entity fragments to be overlapping or succession. In this way, we can recognize not only discontinuous entities, and meanwhile doubly check the overlapped entities. As a whole, our model can be regarded as a relation extraction paradigm essentially. Experimental results on multiple benchmark datasets (i.e., CLEF, GENIA and ACE05) show that our model is highly competitive for overlapped and discontinuous NER.

2020

pdf bib abs
High-order Refining for End-to-end Chinese Semantic Role Labeling
Hao Fei | Yafeng Ren | Donghong Ji
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Current end-to-end semantic role labeling is mostly accomplished via graph-based neural models. However, these all are first-order models, where each decision for detecting any predicate-argument pair is made in isolation with local features. In this paper, we present a high-order refining mechanism to perform interaction between all predicate-argument pairs. Based on the baseline graph model, our high-order refining module learns higher-order features between all candidate pairs via attention calculation, which are later used to update the original token representations. After several iterations of refinement, the underlying token representations can be enriched with globally interacted features. Our high-order model achieves state-of-the-art results on Chinese SRL data, including CoNLL09 and Universal Proposition Bank, meanwhile relieving the long-range dependency issues.

pdf bib abs
Retrofitting Structure-aware Transformer Language Model for End Tasks
Hao Fei | Yafeng Ren | Donghong Ji
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We consider retrofitting structure-aware Transformer language model for facilitating end tasks by proposing to exploit syntactic distance to encode both the phrasal constituency and dependency connection into the language model. A middle-layer structural learning strategy is leveraged for structure integration, accomplished with main semantic task training under multi-task learning scheme. Experimental results show that the retrofitted structure-aware Transformer language model achieves improved perplexity, meanwhile inducing accurate syntactic phrases. By performing structure-aware fine-tuning, our model achieves significant improvements for both semantic- and syntactic-dependent tasks.

pdf bib abs
Improving Text Understanding via Deep Syntax-Semantics Communication
Hao Fei | Yafeng Ren | Donghong Ji
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies show that integrating syntactic tree models with sequential semantic models can bring improved task performance, while these methods mostly employ shallow integration of syntax and semantics. In this paper, we propose a deep neural communication model between syntax and semantics to improve the performance of text understanding. Local communication is performed between syntactic tree encoder and sequential semantic encoder for mutual learning of information exchange. Global communication can further ensure comprehensive information propagation. Results on multiple syntax-dependent tasks show that our model outperforms strong baselines by a large margin. In-depth analysis indicates that our method is highly effective in composing sentence semantics.

pdf bib abs
Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP
Hao Fei | Yafeng Ren | Donghong Ji
Findings of the Association for Computational Linguistics: EMNLP 2020

Syntax has been shown useful for various NLP tasks, while existing work mostly encodes singleton syntactic tree using one hierarchical neural network. In this paper, we investigate a simple and effective method, Knowledge Distillation, to integrate heterogeneous structure knowledge into a unified sequential LSTM encoder. Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structure syntax, meanwhile reducing error propagation, and also outperforms ensemble methods, in terms of both the efficiency and accuracy.

pdf bib abs
Modeling Local Contexts for Joint Dialogue Act Recognition and Sentiment Classification with Bi-channel Dynamic Convolutions
Jingye Li | Hao Fei | Donghong Ji
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we target improving the joint dialogue act recognition (DAR) and sentiment classification (SC) tasks by fully modeling the local contexts of utterances. First, we employ the dynamic convolution network (DCN) as the utterance encoder to capture the dialogue contexts. Further, we propose a novel context-aware dynamic convolution network (CDCN) to better leverage the local contexts when dynamically generating kernels. We extended our frameworks into bi-channel version (i.e., BDCN and BCDCN) under multi-task learning to achieve the joint DAR and SC. Two channels can learn their own feature representations for DAR and SC, respectively, but with latent interaction. Besides, we suggest enhancing the tasks by employing the DiaBERT language model. Our frameworks obtain state-of-the-art performances against all baselines on two benchmark datasets, demonstrating the importance of modeling the local contexts.

pdf bib abs
End to End Chinese Lexical Fusion Recognition with Sememe Knowledge
Yijiang Liu | Meishan Zhang | Donghong Ji
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we present Chinese lexical fusion recognition, a new task which could be regarded as one kind of coreference recognition. First, we introduce the task in detail, showing the relationship with coreference recognition and differences from the existing tasks. Second, we propose an end-to-end model for the task, handling mentions as well as coreference relationship jointly. The model exploits the state-of-the-art contextualized BERT representations as an encoder, and is further enhanced with the sememe knowledge from HowNet by graph attention networks. We manually annotate a benchmark dataset for the task and then conduct experiments on it. Results demonstrate that our final model is effective and competitive for the task. Detailed analysis is offered for comprehensively understanding the new task and our proposed model.

pdf bib abs
HiTrans: A Transformer-Based Context- and Speaker-Sensitive Model for Emotion Detection in Conversations
Jingye Li | Donghong Ji | Fei Li | Meishan Zhang | Yijiang Liu
Proceedings of the 28th International Conference on Computational Linguistics

Emotion detection in conversations (EDC) is to detect the emotion for each utterance in conversations that have multiple speakers. Different from the traditional non-conversational emotion detection, the model for EDC should be context-sensitive (e.g., understanding the whole conversation rather than one utterance) and speaker-sensitive (e.g., understanding which utterance belongs to which speaker). In this paper, we propose a transformer-based context- and speaker-sensitive model for EDC, namely HiTrans, which consists of two hierarchical transformers. We utilize BERT as the low-level transformer to generate local utterance representations, and feed them into another high-level transformer so that utterance representations could be sensitive to the global context of the conversation. Moreover, we exploit an auxiliary task to make our model speaker-sensitive, called pairwise utterance speaker verification (PUSV), which aims to classify whether two utterances belong to the same speaker. We evaluate our model on three benchmark datasets, namely EmoryNLP, MELD and IEMOCAP. Results show that our model outperforms previous state-of-the-art models.

pdf bib abs
AMR Parsing with Latent Structural Information
Qiji Zhou | Yue Zhang | Donghong Ji | Hao Tang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Abstract Meaning Representations (AMRs) capture sentence-level semantics structural representations to broad-coverage natural sentences. We investigate parsing AMR with explicit dependency structures and interpretable latent structures. We generate the latent soft structure without additional annotations, and fuse both dependency and latent structure via an extended graph neural networks. The fused structural information helps our experiments results to achieve the best reported results on both AMR 2.0 (77.5% Smatch F1 on LDC2017T10) and AMR 1.0 ((71.8% Smatch F1 on LDC2014T12).

pdf bib abs
Dependency Graph Enhanced Dual-transformer Structure for Aspect-based Sentiment Classification
Hao Tang | Donghong Ji | Chenliang Li | Qiji Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Aspect-based sentiment classification is a popular task aimed at identifying the corresponding emotion of a specific aspect. One sentence may contain various sentiments for different aspects. Many sophisticated methods such as attention mechanism and Convolutional Neural Networks (CNN) have been widely employed for handling this challenge. Recently, semantic dependency tree implemented by Graph Convolutional Networks (GCN) is introduced to describe the inner connection between aspects and the associated emotion words. But the improvement is limited due to the noise and instability of dependency trees. To this end, we propose a dependency graph enhanced dual-transformer network (named DGEDT) by jointly considering the flat representations learnt from Transformer and graph-based representations learnt from the corresponding dependency graph in an iterative interaction manner. Specifically, a dual-transformer structure is devised in DGEDT to support mutual reinforcement between the flat representation learning and graph-based representation learning. The idea is to allow the dependency graph to guide the representation learning of the transformer encoder and vice versa. The results on five datasets demonstrate that the proposed DGEDT outperforms all state-of-the-art alternatives with a large margin.

pdf bib abs
Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus
Hao Fei | Meishan Zhang | Donghong Ji
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many efforts of research are devoted to semantic role labeling (SRL) which is crucial for natural language understanding. Supervised approaches have achieved impressing performances when large-scale corpora are available for resource-rich languages such as English. While for the low-resource languages with no annotated SRL dataset, it is still challenging to obtain competitive performances. Cross-lingual SRL is one promising way to address the problem, which has achieved great advances with the help of model transferring and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on Universal Proposition Bank show that the translation-based method is highly effective, and the automatic pseudo datasets can improve the target-language SRL performances significantly.

2019

pdf bib abs
Bacteria Biotope Relation Extraction via Lexical Chains and Dependency Graphs
Wuti Xiong | Fei Li | Ming Cheng | Hong Yu | Donghong Ji
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

abstract In this article, we describe our approach for the Bacteria Biotopes relation extraction (BB-rel) subtask in the BioNLP Shared Task 2019. This task aims to promote the development of text mining systems that extract relationships between Microorganism, Habitat and Phenotype entities. In this paper, we propose a novel approach for dependency graph construction based on lexical chains, so one dependency graph can represent one or multiple sentences. After that, we propose a neural network model which consists of the bidirectional long short-term memories and an attention graph convolution neural network to learn relation extraction features from the graph. Our approach is able to extract both intra- and inter-sentence relations, and meanwhile utilize syntax information. The results show that our approach achieved the best F1 (66.3%) in the official evaluation participated by 7 teams.

2016

pdf bib abs
Distance Metric Learning for Aspect Phrase Grouping
Shufeng Xiong | Yue Zhang | Donghong Ji | Yinxia Lou
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Aspect phrase grouping is an important task in aspect-level sentiment analysis. It is a challenging problem due to polysemy and context dependency. We propose an Attention-based Deep Distance Metric Learning (ADDML) method, by considering aspect phrase representation as well as context representation. First, leveraging the characteristics of the review text, we automatically generate aspect phrase sample pairs for distant supervision. Second, we feed word embeddings of aspect phrases and their contexts into an attention-based neural network to learn feature representation of contexts. Both aspect phrase embedding and context embedding are used to learn a deep feature subspace for measure the distances between aspect phrases for K-means clustering. Experiments on four review datasets show that the proposed method outperforms state-of-the-art strong baseline methods.

pdf bib abs
Multi-prototype Chinese Character Embedding
Yanan Lu | Yue Zhang | Donghong Ji
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous in forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings to Chinese NLP tasks. Evaluation on character similarity shows that multi-prototype embeddings are significantly better than a single-prototype baseline. In addition, used as features in the Chinese NER task, the embeddings result in a 1.74% F-score improvement over a state-of-the-art baseline.

pdf bib
WHUNlp at SemEval-2016 Task DiMSUM: A Pilot Study in Detecting Minimal Semantic Units and their Meanings using Supervised Models
Xin Tang | Fei Li | Donghong Ji
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Catchwords refer to popular words or phrases within certain area in certain period of time. In this paper, we propose a novel approach for automatic Chinese catchwords extraction. At the beginning, we discuss the linguistic definition of catchwords and analyze the features of catchwords by manual evaluation. According to those features of catchwords, we define three aspects to describe Popular Degree of catchwords. To extract terms with maximum meaning, we adopt an effective ATE algorithm for multi-character words and long phrases. Then we use conic fitting in Time Series Analysis to build Popular Degree Curves of extracted terms. To calculate Popular Degree Values of catchwords, a formula is proposed which includes values of Popular Trend, Peak Value and Popular Keeping. Finally, a ranking list of catchword candidates is built according to Popular Degree Values. Experiments show that automatic Chinese catchword extraction is effective and objective in comparison with manual evaluation.

pdf bib
Automatic Chinese Catchword Extraction Based on Time Series Analysis
Han Ren | Donghong Ji | Jing Wan | Lei Han
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib
Sentence Ordering based on Cluster Adjacency in Multi-Document Summarization
Donghong Ji | Yu Nie
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II