Renfen Hu


2021

pdf
古汉语词义标注语料库的构建及应用研究(The Construction and Application of Ancient Chinese Corpus with Word Sense Annotation)
Lei Shu (舒蕾) | Yiluan Guo (郭懿鸾) | Huiping Wang (王慧萍) | Xuetao Zhang (张学涛) | Renfen Hu (胡韧奋)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

古汉语以单音节词为主,其一词多义现象十分突出,这为现代人理解古文含义带来了一定的挑战。为了更好地实现古汉语词义的分析和判别,本研究基于传统辞书和语料库反映的语言事实,设计了针对古汉语多义词的词义划分原则,并对常用古汉语单音节词进行词义级别的知识整理,据此对包含多义词的语料开展词义标注。现有的语料库包含3.87万条标注数据,规模超过117.6万字,丰富了古代汉语领域的语言资源。实验显示,基于该语料库和BERT语言模型,词义判别算法准确率达到80%左右。进一步地,本文以词义历时演变分析和义族归纳为案例,初步探索了语料库与词义消歧技术在语言本体研究和词典编撰等领域的应用。

2019

pdf
An Intelligent Testing Strategy for Vocabulary Assessment of Chinese Second Language Learners
Wei Zhou | Renfen Hu | Feipeng Sun | Ronghuai Huang
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Vocabulary is one of the most important parts of language competence. Testing of vocabulary knowledge is central to research on reading and language. However, it usually costs a large amount of time and human labor to build an item bank and to test large number of students. In this paper, we propose a novel testing strategy by combining automatic item generation (AIG) and computerized adaptive testing (CAT) in vocabulary assessment for Chinese L2 learners. Firstly, we generate three types of vocabulary questions by modeling both the vocabulary knowledge and learners’ writing error data. After evaluation and calibration, we construct a balanced item pool with automatically generated items, and implement a three-parameter computerized adaptive test. We conduct manual item evaluation and online student tests in the experiments. The results show that the combination of AIG and CAT can construct test items efficiently and reduce test cost significantly. Also, the test result of CAT can provide valuable feedback to AIG algorithms.

pdf
Diachronic Sense Modeling with Deep Contextualized Word Embeddings: An Ecological View
Renfen Hu | Shen Li | Shichen Liang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Diachronic word embeddings have been widely used in detecting temporal changes. However, existing methods face the meaning conflation deficiency by representing a word as a single vector at each time period. To address this issue, this paper proposes a sense representation and tracking framework based on deep contextualized embeddings, aiming at answering not only what and when, but also how the word meaning changes. The experiments show that our framework is effective in representing fine-grained word senses, and it brings a significant improvement in word change detection task. Furthermore, we model the word change from an ecological viewpoint, and sketch two interesting sense behaviors in the process of language evolution, i.e. sense competition and sense cooperation.

2018

pdf
From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information
Hengru Xu | Shen Li | Renfen Hu | Si Li | Sheng Gao
Proceedings of the 22nd Conference on Computational Natural Language Learning

Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method integrating with global information to improve neural networks for text classification. Unlike the traditional dropout method in which the units are dropped randomly according to the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GI-Dropout, the model is supposed to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of the dropout with global information on seven text classification tasks, including sentiment analysis and topic classification.

pdf
Analogical Reasoning on Chinese Morphological and Semantic Relations
Shen Li | Zhe Zhao | Renfen Hu | Wensi Li | Tao Liu | Xiaoyong Du
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Analogical reasoning is effective in capturing linguistic regularities. This paper proposes an analogical reasoning task on Chinese. After delving into Chinese lexical knowledge, we sketch 68 implicit morphological relations and 28 explicit semantic relations. A big and balanced dataset CA8 is then built for this task, including 17813 questions. Furthermore, we systematically explore the influences of vector representations, context features, and corpora on analogical reasoning. With the experiments, CA8 is proved to be a reliable benchmark for evaluating Chinese word embeddings.

2017

pdf
Initializing Convolutional Filters with Semantic Features for Text Classification
Shen Li | Zhe Zhao | Tao Liu | Renfen Hu | Xiaoyong Du
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Convolutional Neural Networks (CNNs) are widely used in NLP tasks. This paper presents a novel weight initialization method to improve the CNNs for text classification. Instead of randomly initializing the convolutional filters, we encode semantic features into them, which helps the model focus on learning useful features at the beginning of the training. Experiments demonstrate the effectiveness of the initialization technique on seven text classification tasks, including sentiment analysis and topic classification.

2016

pdf
The Construction of a Chinese Collocational Knowledge Resource and Its Application for Second Language Acquisition
Renfen Hu | Jiayong Chen | Kuang-hua Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The appropriate use of collocations is a challenge for second language acquisition. However, high quality and easily accessible Chinese collocation resources are not available for both teachers and students. This paper presents the design and construction of a large scale resource of Chinese collocational knowledge, and a web-based application (OCCA, Online Chinese Collocation Assistant) which offers free and convenient collocation search service to end users. We define and classify collocations based on practical language acquisition needs and utilize a syntax based method to extract nine types of collocations. Totally 37 extraction rules are compiled with word, POS and dependency relation features, 1,750,000 collocations are extracted from a corpus for L2 learning and complementary Wikipedia data, and OCCA is implemented based on these extracted collocations. By comparing OCCA with two traditional collocation dictionaries, we find OCCA has higher entry coverage and collocation quantity, and our method achieves quite low error rate at less than 5%. We also discuss how to apply collocational knowledge to grammatical error detection and demonstrate comparable performance to the best results in 2015 NLP-TEA CGED shared task. The preliminary experiment shows that the collocation knowledge is helpful in detecting all the four types of grammatical errors.

2015

pdf
A hybrid system for Chinese-English patent machine translation
Hongzheng Li | Kai Zhao | Renfen Hu | Yun Zhu | Yaohong Jin
Proceedings of the 6th Workshop on Patent and Scientific Literature Translation

2014

pdf
Pre-reordering Model of Chinese Special Sentences for Patent Machine Translation
Renfen Hu | Zhiying Liu | Lijiao Yang | Yaohong Jin
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language