Junfeng Hu


Incorporate Semantic Structures into Machine Translation Evaluation via UCCA
Jin Xu | Yinuo Guo | Junfeng Hu
Proceedings of the Fifth Conference on Machine Translation

The copying mechanism has been commonly used in neural paraphrasing networks and other text generation tasks, in which some important words in the input sequence are preserved in the output sequence. Similarly, in machine translation, we notice that certain words or phrases appear in all good translations of a source text, and these words tend to convey important semantic information. Therefore, in this work, we define words carrying important semantic meanings in sentences as semantic core words. Moreover, we propose an MT evaluation approach named Semantically Weighted Sentence Similarity (SWSS). It leverages the power of UCCA to identify semantic core words, and then calculates sentence similarity scores based on the overlap of semantic core words. Experimental results show that SWSS can consistently improve the performance of popular MT evaluation metrics that are based on lexical similarity.


Variational Semi-Supervised Aspect-Term Sentiment Analysis via Transformer
Xingyi Cheng | Weidi Xu | Taifeng Wang | Wei Chu | Weipeng Huang | Kunlong Chen | Junfeng Hu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Aspect-term sentiment analysis (ATSA) is a long-standing challenge in natural language processing. It requires fine-grained semantic reasoning about a target entity appearing in the text. As manual annotation over the aspects is laborious and time-consuming, the amount of labeled data is limited for supervised learning. This paper proposes a semi-supervised method for the ATSA problem by using a Variational Autoencoder based on the Transformer. The model learns the latent distribution via variational inference. By disentangling the latent representation into the aspect-specific sentiment and the lexical context, our method induces the underlying sentiment prediction for the unlabeled data, which then benefits the ATSA classifier. Our method is classifier-agnostic, i.e., the classifier is an independent module and various supervised models can be integrated. Experimental results are obtained on SemEval 2014 Task 4 and show that our method is effective with five different classifiers and outperforms these models by a significant margin.

Meteor++ 2.0: Adopt Syntactic Level Paraphrase Knowledge into Machine Translation Evaluation
Yinuo Guo | Junfeng Hu
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes Meteor++ 2.0, our submission to the WMT19 Metric Shared Task. The well-known Meteor metric improves machine translation evaluation by introducing paraphrase knowledge. However, it only focuses on the lexical level and utilizes consecutive n-gram paraphrases. In this work, we take into consideration syntactic-level paraphrase knowledge, which may sometimes take the form of skip-grams. We describe how such knowledge can be extracted from the Paraphrase Database (PPDB) and integrated into Meteor-based metrics. Experiments on the WMT15 and WMT17 evaluation datasets show that the newly proposed metric outperforms all previous versions of Meteor.


Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense
Haoyue Shi | Xihao Wang | Yuqi Sun | Junfeng Hu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Implicit Subjective and Sentimental Usages in Multi-sense Word Embeddings
Yuqi Sun | Haoyue Shi | Junfeng Hu
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In multi-sense word embeddings, contextual variations in a corpus may cause a univocal word to be embedded into different sense vectors. Shi et al. (2016) show that this kind of pseudo multi-sense can be eliminated by linear transformations. In this paper, we show that pseudo multi-senses, though seemingly redundant, may come from a uniform and meaningful phenomenon such as subjective and sentimental usage. We present an unsupervised algorithm to find a linear transformation that minimizes the transformed distance of a group of sense pairs. The major shrinking direction of this transformation is found to be related to subjective shift. Therefore, we can not only eliminate pseudo multi-senses in multi-sense embeddings, but also identify these subjective senses and automatically tag the subjective and sentimental usage of words in the corpus.

Meteor++: Incorporating Copy Knowledge into Machine Translation Evaluation
Yinuo Guo | Chong Ruan | Junfeng Hu
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

In machine translation evaluation, a good candidate translation can be regarded as a paraphrase of the reference. We notice that some words are always copied during paraphrasing, which we call copy knowledge. Considering the stability of such knowledge, a good candidate translation should contain all such words that appear in the reference sentence. Therefore, in our participation in the WMT’2018 metrics shared task, we introduce a simple statistical method for copy knowledge extraction and incorporate it into the Meteor metric, resulting in a new machine translation metric, Meteor++. Our experiments show that Meteor++ can integrate copy knowledge well and improve performance significantly on the WMT17 and WMT15 evaluation sets.


Domain Ontology Learning Enhanced by Optimized Relation Instance in DBpedia
Liumingjing Xiao | Chong Ruan | An Yang | Junhao Zhang | Junfeng Hu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Ontologies are powerful tools for supporting semantic-based applications and intelligent systems. However, ontology learning is challenging because of the bottleneck in handcrafting structured knowledge sources and training data. To address this difficulty, many researchers turn to ontology enrichment and population using external knowledge sources such as DBpedia. In this paper, we propose a method that uses DBpedia in a different manner. We utilize relation instances in DBpedia to supervise the ontology learning procedure from unstructured text, rather than populating the ontology structure as a post-processing step. We construct three language resources in the area of computer science: an enriched Wikipedia concept tree, a domain ontology, and a gold standard from the NSFC taxonomy. Experiments show that the result of ontology learning from a computer science corpus can be improved via the relation instances extracted from DBpedia in the same field. Furthermore, distinguishing between the relation instances and applying a proper weighting scheme in the learning procedure leads to even better results.

Real Multi-Sense or Pseudo Multi-Sense: An Approach to Improve Word Representation
Haoyue Shi | Caihua Li | Junfeng Hu
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Previous research has shown that learning multiple representations for polysemous words can improve the performance of word embeddings on many tasks. However, this leads to another problem: several vectors of a word may actually point to the same meaning, a phenomenon we call pseudo multi-sense. In this paper, we introduce the concept of pseudo multi-sense, and then propose an algorithm to detect such cases. Taking the detected pseudo multi-sense cases into consideration, we refine the existing word embeddings to eliminate the influence of pseudo multi-sense. Moreover, we apply our algorithm to previously released multi-sense word embeddings and test it on artificial word similarity tasks and the analogy task. The results of the experiments show that diminishing pseudo multi-sense can improve the quality of word representations. Thus, our method is an efficient way to reduce linguistic complexity.


Construction of Diachronic Ontologies from People’s Daily of Fifty Years
Shaoda He | Xiaojun Zou | Liumingjing Xiao | Junfeng Hu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents an Ontology Learning From Text (OLFT) method that follows the well-known OLFT layer cake framework. Based on distributional similarity, the proposed method generates multi-level ontologies from comparatively small corpora with the aid of the HITS algorithm. Currently, this method covers term extraction, synonym recognition, concept discovery, and hierarchical concept clustering. Among them, both concept discovery and hierarchical concept clustering are aided by the HITS authority, which is obtained from the HITS algorithm through iterative recommendation. With this method, a set of diachronic ontologies is constructed, one for each year, based on People’s Daily corpora of fifty years (i.e., from 1947 to 1996). Preliminary experiments show that our algorithm outperforms Google’s RNN and K-means based algorithm in both concept discovery and hierarchical concept clustering.


Human-Computer Interactive Chinese Word Segmentation: An Adaptive Dirichlet Process Mixture Model Approach
Tongfei Chen | Xiaojun Zou | Weimeng Zhu | Junfeng Hu
Proceedings of the Sixth International Joint Conference on Natural Language Processing


Clustering Technique in Multi-Document Personal Name Disambiguation
Chen Chen | Junfeng Hu | Houfeng Wang
Proceedings of the ACL-IJCNLP 2009 Student Research Workshop


From Text to Exhibitions: A New Approach for E-Learning on Language and Literature based on Text Mining
Qiaozhu Mei | Junfeng Hu
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning


The Multi-layer Language Knowledge Base of Chinese NLP
Junfeng Hu | Shiwen Yu
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)