Mingyu Wan

2022

pdf bib abs
When Cantonese NLP Meets Pre-training: Progress and Challenges
Rong Xiang | Hanzhuo Tan | Jing Li | Mingyu Wan | Kam-Fai Wong
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Tutorial Abstracts

Cantonese is an influential Chinese variant with a large population of speakers worldwide. However, it is under-resourced in terms of the data scale and diversity, excluding Cantonese Natural Language Processing (NLP) from the stateof-the-art (SOTA) “pre-training and fine-tuning” paradigm. This tutorial will start with a substantially review of the linguistics and NLP progress for shaping language specificity, resources, and methodologies. It will be followed by an introduction to the trendy transformerbased pre-training methods, which have been largely advancing the SOTA performance of a wide range of downstream NLP tasks in numerous majority languages (e.g., English and Chinese). Based on the above, we will present the main challenges for Cantonese NLP in relation to Cantonese language idiosyncrasies of colloquialism and multilingualism, followed by the future directions to line NLP for Cantonese and other low-resource languages up to the cutting-edge pre-training practice.

pdf bib
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference
Mingyu Wan | Chu-Ren Huang
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

2020

pdf bib abs
Sina Mandarin Alphabetical Words:A Web-driven Code-mixing Lexical Resource
Rong Xiang | Mingyu Wan | Qi Su | Chu-Ren Huang | Qin Lu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Mandarin Alphabetical Word (MAW) is one indispensable component of Modern Chinese that demonstrates unique code-mixing idiosyncrasies influenced by language exchanges. Yet, this interesting phenomenon has not been properly addressed and is mostly excluded from the Chinese language system. This paper addresses the core problem of MAW identification and proposes to construct a large collection of MAWs from Sina Weibo (SMAW) using an automatic web-based technique which includes rule-based identification, informatics-based extraction, as well as Baidu search engine validation. A collection of 16,207 qualified SMAWs are obtained using this technique along with an annotated corpus of more than 200,000 sentences for linguistic research and applicable inquiries.

pdf bib abs
Modality Enriched Neural Network for Metaphor Detection
Mingyu Wan | Baixi Xing
Proceedings of the 28th International Conference on Computational Linguistics

Metaphor as a cognitive mechanism in human’s conceptual system manifests itself an effective way for language communication. Although being intuitively sensible for human, metaphor detection is still a challenging task due to the subtle ontological differences between metaphorical and non-metaphorical expressions. This work proposes a modality enriched deep learning model for tackling this unsolved issue. It provides a new perspective for understanding metaphor as a modality shift, as in ‘sweet voice’. It also attempts to enhance metaphor detection by combining deep learning with effective linguistic insight. Extending the work at Wan et al. (2020), we concatenate word sensorimotor scores (Lynott et al., 2019) with word vectors as the input of attention-based Bi-LSTM using a benchmark dataset–the VUA corpus. The experimental results show great F1 improvement (above 0.5%) of the proposed model over other methods in record, demonstrating the usefulness of leveraging modality norms for metaphor detection.

This paper reports a linguistically-enriched method of detecting token-level metaphors for the second shared task on Metaphor Detection. We participate in all four phases of competition with both datasets, i.e. Verbs and AllPOS on the VUA and the TOFEL datasets. We use the modality exclusivity and embodiment norms for constructing a conceptual representation of the nodes and the context. Our system obtains an F-score of 0.652 for the VUA Verbs track, which is 5% higher than the strong baselines. The experimental results across models and datasets indicate the salient contribution of using modality exclusivity and modality shift information for predicting metaphoricity.

Deep neural network models have played a critical role in sentiment analysis with promising results in the recent decade. One of the essential challenges, however, is how external sentiment knowledge can be effectively utilized. In this work, we propose a novel affection-driven approach to incorporating affective knowledge into neural network models. The affective knowledge is obtained in the form of a lexicon under the Affect Control Theory (ACT), which is represented by vectors of three-dimensional attributes in Evaluation, Potency, and Activity (EPA). The EPA vectors are mapped to an affective influence value and then integrated into Long Short-term Memory (LSTM) models to highlight affective terms. Experimental results show a consistent improvement of our approach over conventional LSTM models by 1.0% to 1.5% in accuracy on three large benchmark datasets. Evaluations across a variety of algorithms have also proven the effectiveness of leveraging affective terms for deep model enhancement.

pdf bib
Sensorimotor Enhanced Neural Network for Metaphor Detection
Mingyu Wan | Baixi Xing | Qi Su | Pengyuan Liu | Chu-Ren Huang
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

2019

Mingyu Wan

Fixing paper assignments

2022

2020

2019

2018

Co-authors

Venues