Yu-Hsiang Tseng


2021

pdf bib
What confuses BERT? Linguistic Evaluation of Sentiment Analysis on Telecom Customer Opinion
Cing-Fang Shih | Yu-Hsiang Tseng | Ching-Wen Yang | Pin-Er Chen | Hsin-Yu Chou | Lian-Hui Tan | Tzu-Ju Lin | Chun-Wei Wang | Shu-Kai Hsieh
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Ever-expanding evaluative texts on online forums have become an important source of sentiment analysis. This paper proposes an aspect-based annotated dataset consisting of telecom reviews on social media. We introduce a category, implicit evaluative texts, impevals for short, to investigate how the deep learning model works on these implicit reviews. We first compare two models, BertSimple and BertImpvl, and find that while both models are competent to learn simple evaluative texts, they are confused when classifying impevals. To investigate the factors underlying the correctness of the model’s predictions, we conduct a series of analyses, including qualitative error analysis and quantitative analysis of linguistic features with logistic regressions. The results show that local features that affect the overall sentential sentiment confuse the model: multiple target entities, transitional words, sarcasm, and rhetorical questions. Crucially, these linguistic features are independent of the model’s confidence measured by the classifier’s softmax probabilities. Interestingly, the sentence complexity indicated by syntax-tree depth is not correlated with the model’s correctness. In sum, this paper sheds light on the characteristics of the modern deep learning model and when it might need more supervision through linguistic evaluations.

2020

pdf bib
Computational Modeling of Affixoid Behavior in Chinese Morphology
Yu-Hsiang Tseng | Shu-Kai Hsieh | Pei-Yi Chen | Sara Court
Proceedings of the 28th International Conference on Computational Linguistics

The morphological status of affixes in Chinese has long been a matter of debate. How one might apply the conventional criteria of free/bound and content/function features to distinguish word-forming affixes from bound roots in Chinese is still far from clear. Issues involving polysemy and diachronic dynamics further blur the boundaries. In this paper, we propose three quantitative features in a computational model of affixoid behavior in Mandarin Chinese. The results show that, except for in a very few cases, there are no clear criteria that can be used to identify an affix’s status in an isolating language like Chinese. A diachronic check using contextualized embeddings with the WordNet Sense Inventory also demonstrates the possible role of the polysemy of lexical roots across diachronic settings.

pdf bib
From Sense to Action: A Word-Action Disambiguation Task in NLP
Shu-Kai Hsieh | Yu-Hsiang Tseng | Chiung-Yu Chiang | Richard Lian | Yong-fu Liao | Mao-Chang Ku | Ching-Fang Shih
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

2019

pdf bib
Augmenting Chinese WordNet semantic relations with contextualized embeddings
Yu-Hsiang Tseng | Shu-Kai Hsieh
Proceedings of the 10th Global Wordnet Conference

Constructing semantic relations in WordNet has been a labour-intensive task, especially in a dynamic and fast-changing language environment. Combined with recent advancements of contextualized embeddings, this paper proposes the concept of morphology-guided sense vectors, which can be used to semi-automatically augment semantic relations in Chinese Wordnet (CWN). This paper (1) built sense vectors with pre-trained contextualized embedding models; (2) demonstrated the sense vectors computed were consistent with the sense distinctions made in CWN; and (3) predicted the potential semantically-related sense pairs with high accuracy by sense vectors model.

pdf bib
Eigencharacter: An Embedding of Chinese Character Orthography
Yu-Hsiang Tseng | Shu-Kai Hsieh
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Chinese characters are unique in its logographic nature, which inherently encodes world knowledge through thousands of years evolution. This paper proposes an embedding approach, namely eigencharacter (EC) space, which helps NLP application easily access the knowledge encoded in Chinese orthography. These EC representations are automatically extracted, encode both structural and radical information, and easily integrate with other computational models. We built EC representations of 5,000 Chinese characters, investigated orthography knowledge encoded in ECs, and demonstrated how these ECs identified visually similar characters with both structural and radical information.

2018

pdf bib
Fluid Annotation: A Granularity-aware Annotation Tool for Chinese Word Fluidity
Shu-Kai Hsieh | Yu-Hsiang Tseng | Chih-Yao Lee | Chiung-Yu Chiang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)