Shu-Kai Hsieh

Also published as: Shu-kai Hsieh, ShuKai Hsieh


2022

Analyzing discourse functions with acoustic features and phone embeddings: non-lexical items in Taiwan Mandarin
Pin-Er Chen | Yu-Hsiang Tseng | Chi-Wei Wang | Fang-Chi Yeh | Shu-Kai Hsieh
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

Non-lexical items are expressive devices used in conversations that are not words but are nevertheless meaningful. These items play crucial roles, such as signaling turn-taking or marking stances in interactions. However, as the non-lexical items do not stably correspond to written or phonological forms, past studies tend to focus on studying their acoustic properties, such as pitches and durations. In this paper, we investigate the discourse functions of non-lexical items through their acoustic properties and the phone embeddings extracted from a deep learning model. Firstly, we create a non-lexical item dataset based on the interpellation video clips from Taiwan’s Legislative Yuan. Then, we manually identify the non-lexical items and their discourse functions in the videos. Next, we analyze the acoustic properties of those items through statistical modeling and building classifiers based on phone embeddings extracted from a phone recognition model. We show that (1) the discourse functions have significant effects on the acoustic features; and (2) the classifiers built on phone embeddings perform better than the ones on conventional acoustic properties. These results suggest that phone embeddings may reflect the phonetic variations crucial in differentiating the discourse functions of non-lexical items.

CxLM: A Construction and Context-aware Language Model
Yu-Hsiang Tseng | Cing-Fang Shih | Pin-Er Chen | Hsin-Yu Chou | Mao-Chang Ku | Shu-Kai Hsieh
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Constructions are direct form-meaning pairs with possible schematic slots. These slots are simultaneously constrained by the embedded construction itself and the sentential context. We propose that the constraint could be described by a conditional probability distribution. However, as this conditional probability is inevitably complex, we utilize language models to capture this distribution. Therefore, we build CxLM, a deep learning-based masked language model explicitly tuned to constructions’ schematic slots. We first compile a construction dataset consisting of over ten thousand constructions in Taiwan Mandarin. Next, an experiment is conducted on the dataset to examine to what extent a pretrained masked language model is aware of the constructions. We then fine-tune the model specifically to perform a cloze task on the opening slots. We find that the fine-tuned model predicts masked slots more accurately than baselines and generates both structurally and semantically plausible word samples. Finally, we release CxLM and its dataset as publicly available resources and hope they will serve as new quantitative tools in studying construction grammar.

Character Jacobian: Modeling Chinese Character Meanings with Deep Learning Model
Yu-Hsiang Tseng | Shu-Kai Hsieh
Proceedings of the 29th International Conference on Computational Linguistics

Compounding, a prevalent word-formation process, presents an interesting challenge for computational models. Indeed, the relations between compounds and their constituents are often complicated. It is particularly so in Chinese morphology, where each character is almost simultaneously bound and free when treated as a morpheme. To model such a word-formation process, we propose the Notch (NOnlinear Transformation of CHaracter embeddings) model and the character Jacobians. The Notch model first learns the non-linear relations between the constituents and words, and the character Jacobians further describe the character’s role in each word. In a series of experiments, we show that the Notch model predicts the embeddings of the real words from their constituents but helps account for the behavioral data of the pseudowords. Moreover, we also demonstrate that character Jacobians reflect the characters’ meanings. Taken together, the Notch model and character Jacobians may provide a new perspective on studying the word-formation process and morphology with modern deep learning.

2021

Exploring sentiment constructions: connecting deep learning models with linguistic construction
Shu-Kai Hsieh | Yu-Hsiang Tseng
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

Examine persuasion strategies in Chinese on social media
Yu-Yun Chang | Po-Ya Angela Wang | Han-Tang Hung | Ka-Sîng Khóo | Shu-Kai Hsieh
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

What confuses BERT? Linguistic Evaluation of Sentiment Analysis on Telecom Customer Opinion
Cing-Fang Shih | Yu-Hsiang Tseng | Ching-Wen Yang | Pin-Er Chen | Hsin-Yu Chou | Lian-Hui Tan | Tzu-Ju Lin | Chun-Wei Wang | Shu-Kai Hsieh
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Ever-expanding evaluative texts on online forums have become an important source for sentiment analysis. This paper proposes an aspect-based annotated dataset consisting of telecom reviews on social media. We introduce a category, implicit evaluative texts, impevals for short, to investigate how the deep learning model works on these implicit reviews. We first compare two models, BertSimple and BertImpvl, and find that while both models are competent to learn simple evaluative texts, they are confused when classifying impevals. To investigate the factors underlying the correctness of the model’s predictions, we conduct a series of analyses, including qualitative error analysis and quantitative analysis of linguistic features with logistic regressions. The results show that local features that affect the overall sentential sentiment confuse the model: multiple target entities, transitional words, sarcasm, and rhetorical questions. Crucially, these linguistic features are independent of the model’s confidence measured by the classifier’s softmax probabilities. Interestingly, the sentence complexity indicated by syntax-tree depth is not correlated with the model’s correctness. In sum, this paper sheds light on the characteristics of the modern deep learning model and when it might need more supervision through linguistic evaluations.

Keyword-centered Collocating Topic Analysis
Yu-Lin Chang | Yongfu Liao | Po-Ya Angela Wang | Mao-Chang Ku | Shu-Kai Hsieh
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

The rapid flow of information and the abundance of text data on the Internet have brought about an urgent demand for the construction of monitoring resources and techniques used for various purposes. Extracting facets of information useful for particular domains from such large and dynamically growing corpora requires unsupervised yet transparent ways of analyzing the textual data. This paper proposes a hybrid collocation analysis as a potential method to retrieve and summarize Taiwan-related topics posted on Weibo and PTT. By grouping collocates of 臺灣 ‘Taiwan’ into clusters of topics via either word embedding clustering or Latent Dirichlet allocation, lists of collocates can be converted to probability distributions such that distances and similarities can be defined and computed. With this method, we conduct a diachronic analysis of the similarity between Weibo and PTT, providing a way to pinpoint when and how the topic similarity between the two rises or falls. We also attempt a fine-grained view of the grammatical behavior and political implications. This study thus sheds light on alternative explainable routes for future social media listening methods aimed at understanding the cross-strait relationship.

2020

Mitigating Impacts of Word Segmentation Errors on Collocation Extraction in Chinese
Yongfu Liao | Shu-Kai Hsieh
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

Lectal Variation of the Two Chinese Causative Auxiliaries
Cing-Fang Shih | Mao-Chang Ku | Shu-Kai Hsieh
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

An Analysis of Multimodal Document Intent in Instagram Posts
Ying-Yu Chen | Shu-Kai Hsieh
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

From Sense to Action: A Word-Action Disambiguation Task in NLP
Shu-Kai Hsieh | Yu-Hsiang Tseng | Chiung-Yu Chiang | Richard Lian | Yong-fu Liao | Mao-Chang Ku | Ching-Fang Shih
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Exploring Discourse on Same-sex Marriage in Taiwan: A Case Study of Near-Synonym of HOMOSEXUAL in Opposing Stances
Han-Tang Hung | Shu-Kai Hsieh
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Do You Believe It Happened? Assessing Chinese Readers’ Veridicality Judgments
Yu-Yun Chang | Shu-Kai Hsieh
Proceedings of the Twelfth Language Resources and Evaluation Conference

This work collects and studies Chinese readers’ veridicality judgments of news events (whether an event is viewed as happening or not). For instance, in “The FBI alleged in court documents that Zazi had admitted having a handwritten recipe for explosives on his computer”, do people believe that Zazi had a handwritten recipe for explosives? The goal is to observe the pragmatic behaviors of linguistic features in context that affect readers in making veridicality judgments. Exploring the datasets, we find that features such as event-selecting predicates (ESP), modality markers, adverbs, temporal information, and statistics have an impact on readers’ veridicality judgments. We further find that modality markers with high certainty do not necessarily lead readers to have high confidence that an event happened. Additionally, the source of information introduced by an ESP has little effect on veridicality judgments, even when an event is attributed to an authority (e.g. “The FBI”). A corpus annotated with Chinese readers’ veridicality judgments is released as the Chinese PragBank for further analysis.

Computational Modeling of Affixoid Behavior in Chinese Morphology
Yu-Hsiang Tseng | Shu-Kai Hsieh | Pei-Yi Chen | Sara Court
Proceedings of the 28th International Conference on Computational Linguistics

The morphological status of affixes in Chinese has long been a matter of debate. How one might apply the conventional criteria of free/bound and content/function features to distinguish word-forming affixes from bound roots in Chinese is still far from clear. Issues involving polysemy and diachronic dynamics further blur the boundaries. In this paper, we propose three quantitative features in a computational model of affixoid behavior in Mandarin Chinese. The results show that, except in a very few cases, there are no clear criteria that can be used to identify an affix’s status in an isolating language like Chinese. A diachronic check using contextualized embeddings with the WordNet Sense Inventory also demonstrates the possible role of the polysemy of lexical roots across diachronic settings.

2019

Eigencharacter: An Embedding of Chinese Character Orthography
Yu-Hsiang Tseng | Shu-Kai Hsieh
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Chinese characters are unique in their logographic nature, which inherently encodes world knowledge through thousands of years of evolution. This paper proposes an embedding approach, namely eigencharacter (EC) space, which helps NLP applications easily access the knowledge encoded in Chinese orthography. These EC representations are automatically extracted, encode both structural and radical information, and easily integrate with other computational models. We built EC representations of 5,000 Chinese characters, investigated the orthographic knowledge encoded in ECs, and demonstrated how these ECs identified visually similar characters with both structural and radical information.

Augmenting Chinese WordNet semantic relations with contextualized embeddings
Yu-Hsiang Tseng | Shu-Kai Hsieh
Proceedings of the 10th Global Wordnet Conference

Constructing semantic relations in WordNet has been a labour-intensive task, especially in a dynamic and fast-changing language environment. Building on recent advancements in contextualized embeddings, this paper proposes the concept of morphology-guided sense vectors, which can be used to semi-automatically augment semantic relations in Chinese Wordnet (CWN). This paper (1) built sense vectors with pre-trained contextualized embedding models; (2) demonstrated that the computed sense vectors were consistent with the sense distinctions made in CWN; and (3) predicted potential semantically related sense pairs with high accuracy using the sense vector model.

Extracting Semantic Representations of Sexual Biases from Word Vectors
Ying-Yu Chen | Shu-Kai Hsieh
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2018

Sinitic Wordnet: Laying the Groundwork with Chinese Varieties Written in Traditional Characters
Chih-Yao Lee | Shu-Kai Hsieh
Proceedings of the 9th Global Wordnet Conference

The present work seeks to make the logographic nature of Chinese script a relevant research ground in wordnet studies. While wordnets are not so much about words as about the concepts represented in words, synset formation inevitably involves the use of orthographic and/or phonetic representations to serve as headword for a given concept. For wordnets of Chinese languages, if their synsets are mapped with each other, the connection from logographic forms to lexicalized concepts can be explored backwards to, for instance, help trace the development of cognates in different varieties of Chinese. The Sinitic Wordnet project is an attempt to construct such an integrated wordnet that aggregates three Chinese varieties that are widely spoken in Taiwan and all written in traditional Chinese characters.

Fluid Annotation: A Granularity-aware Annotation Tool for Chinese Word Fluidity
Shu-Kai Hsieh | Yu-Hsiang Tseng | Chih-Yao Lee | Chiung-Yu Chiang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Exploring Lavender Tongue from Social Media Texts [In Chinese]
Hsiao-Han Wu | Shu-Kai Hsieh
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

ClassifierGuesser: A Context-based Classifier Prediction System for Chinese Language Learners
Nicole Peinelt | Maria Liakata | Shu-Kai Hsieh
Proceedings of the IJCNLP 2017, System Demonstrations

Classifiers are function words that are used to express quantities in Chinese and are especially difficult for language learners. In contrast to previous studies, we argue that the choice of classifiers is highly contextual and train context-aware machine learning models based on a novel publicly available dataset, outperforming previous baselines. We further present use cases for our database and models in an interactive demo system.

2016

CogALex-V Shared Task: LOPE
Kanan Luce | Jiaxing Yu | Shu-Kai Hsieh
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

Automatic discovery of semantically related words is one of the most important NLP tasks, and has a great impact on the theoretical psycholinguistic modeling of the mental lexicon. In this shared task, we employ a word embedding model to test two assumptions explicitly or implicitly made by the NLP community: (1) word embedding models can reflect syntagmatic similarities in usage between words as distances in the projected vector space; and (2) word embedding models can reflect paradigmatic relationships between words.

Evaluative Pattern Extraction for Automated Text Generation
Chia-Chen Lee | Shu-Kai Hsieh
Proceedings of the 9th International Natural Language Generation conference

Crowdsourcing Experiment Designs for Chinese Word Sense Annotation
Tzu-Yun Huang | Hsiao-Han Wu | Chia-Chen Lee | Shao-Man Lee | Guan-Wei Li | Shu-Kai Hsieh
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

Sarcasm Detection in Chinese Using a Crowdsourced Corpus
Shih-Kai Lin | Shu-Kai Hsieh
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

2015

An Arguing Lexicon for Stance Classification on Short Text Comments in Chinese
Ju-han Chuang | Shu-Kai Hsieh
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

Linguistic Linked Data in Chinese: The Case of Chinese Wordnet
Chih-Yao Lee | Shu-Kai Hsieh
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

2014

Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics
Shu-Kai Hsieh
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper aims to examine and evaluate the current development of the Web-as-Corpus (WaC) paradigm in Chinese corpus linguistics. I will argue that the unstable notion of wordhood in Chinese and the resulting diverse approaches to implementing word segmentation systems have posed great challenges for those who are keen on building web-scale corpus data. Two lexical measures are proposed to illustrate the issues, and methodological discussions are provided.

Public Opinion Toward CSSTA: A Text Mining Approach
Yi-An Wu | Shu-Kai Hsieh
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing (ROCLING 2014)

Sketching the Dependency Relations of Words in Chinese
Meng-Hsien Shih | Shu-Kai Hsieh
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing (ROCLING 2014)

Public Opinion Toward CSSTA: A Text Mining Approach
Yi-An Wu | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 4, December 2014 - Special Issue on Selected Papers from ROCLING XXVI

Skillex: a graph-based lexical score for measuring the semantic efficiency of used verbs by human subjects describing actions
Bruno Gaume | Karine Duvignau | Emmanuel Navarro | Yann Desalle | Hintat Cheung | Shu-Kai Hsieh | Pierre Magistry | Laurent Prévot
Traitement Automatique des Langues, Volume 55, Numéro 3 : Traitement automatique du langage naturel et sciences cognitives [Natural Language Processing and Cognitive Sciences]

Leveraging Morpho-semantics for the Discovery of Relations in Chinese Wordnet
Shu-Kai Hsieh | Yu-Yun Chang
Proceedings of the Seventh Global Wordnet Conference

2013

Causing Emotion in Collocation: An Exploratory Data Analysis
Pei-Yu Lu | Yu-Yun Chang | Shu-Kai Hsieh
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

Observing Features of PTT Neologisms: A Corpus-driven Study with N-gram Model
Tsun-Jui Liu | Shu-Kai Hsieh | Laurent Prevot
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 2, June 2013-Special Issue on Chinese Lexical Resources: Theories and Applications
Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 2, June 2013-Special Issue on Chinese Lexical Resources: Theories and Applications

Back to the Basic: Exploring Base Concepts from the Wordnet Glosses
Chan-Chia Hsu | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 2, June 2013-Special Issue on Chinese Lexical Resources: Theories and Applications

To Coerce or Not to Coerce: A Corpus-based Exploration of Some Complement Coercion Verbs in Chinese
Chan-Chia Hsu | Shu-Kai Hsieh
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

Features of Verb Complements in Co-composition: A case study of Chinese baking verb using Weibo corpus
Yu-Yun Chang | Shu-Kai Hsieh
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

2012

Measuring Individual Differences in Word Recognition: The Role of Individual Lexical Behaviors
Hsin-Ni Lin | Shu-Kai Hsieh | Shiao-Hui Chan
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

Frequency, Collocation, and Statistical Modeling of Lexical Items: A Case Study of Temporal Expressions in Two Conversational Corpora
Sheng-Fu Wang | Jing-Chen Yang | Yu-Yun Chang | Yu-Wen Liu | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 2, June 2012—Special Issue on Selected Papers from ROCLING XXIII

International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV
Liang-Chih Yu | Richard Tzong-Han Tsai | Chia-Ping Chen | Cheng-Zen Yang | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV

Chinese Sentiments on the Clouds: A Preliminary Experiment on Corpus Processing and Exploration on Cloud Service
Shu-Kai Hsieh | Yu-Yun Chang | Meng-Xian Shih
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

2011

Frequency, Collocation, and Statistical Modeling of Lexical Items: A Case Study of Temporal Expressions in an Elderly Speaker Corpus
Sheng-Fu Wang | Jing-Chen Yang | Yu-Yun Chang | Yu-Wen Liu | Shu-Kai Hsieh
Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing (ROCLING 2011)

2010

SemEval-2010 Task 17: All-Words Word Sense Disambiguation on a Specific Domain
Eneko Agirre | Oier Lopez de Lacalle | Christiane Fellbaum | Shu-Kai Hsieh | Maurizio Tesconi | Monica Monachini | Piek Vossen | Roxanne Segers
Proceedings of the 5th International Workshop on Semantic Evaluation

Kyoto: An Integrated System for Specific Domain WSD
Aitor Soroa | Eneko Agirre | Oier Lopez de Lacalle | Wauter Bosma | Piek Vossen | Monica Monachini | Jessie Lo | Shu-Kai Hsieh
Proceedings of the 5th International Workshop on Semantic Evaluation

Word Space Modeling for Measuring Semantic Specificity in Chinese
Ching-Fen Pan | Shu-Kai Hsieh
Coling 2010: Posters

PyCWN: a Python Module for Chinese Wordnet
Yueh-Cheng Wu | Shu-Kai Hsieh
Coling 2010: Demonstrations

Towards an Automatic Measurement of Verbal Lexicon Acquisition: The Case for a Young Children-versus-Adults Classification in French and Mandarin
Yann Desalle | Shu-Kai Hsieh | Bruno Gaume | Hintat Cheung
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Graph Representation of Synonymy and Translation Resources for Crosslinguistic Modelisation of Meaning
Benoît Gaillard | Yannick Chudy | Pierre Magistry | Shu-Kai Hsieh | Emmanuel Navarro
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Classifying mood in plurks
Mei-Yu Chen | Hsin-Ni Lin | Chang-An Shih | Yen-Ching Hsu | Pei-Yu Hsu | Shu-Kai Hsieh
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)

Qualia Modification in Noun-Noun Compounds: A Cross-Language Survey
Chih-yao Lee | Chia-hao Chang | Wei-chieh Hsu | Shu-kai Hsieh
ROCLING 2010 Poster Papers

2009

Bridging the Gap between Graph Modeling and Developmental Psycholinguistics: An Experiment on Measuring Lexical Proximity in Chinese Semantic Space
Shu-Kai Hsieh | Chun-Han Chang | Ivy Kuo | Hintat Cheung | Chu-Ren Huang | Bruno Gaume
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

Wiktionary for Natural Language Processing: Methodology and Limitations
Emmanuel Navarro | Franck Sajous | Bruno Gaume | Laurent Prévot | ShuKai Hsieh | Ivy Kuo | Pierre Magistry | Chu-Ren Huang
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

CWN-LMF: Chinese WordNet in the Lexical Markup Framework
Lung-Hao Lee | Shu-Kai Hsieh | Chu-Ren Huang
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

Query Expansion using LMF-Compliant Lexical Resources
Takenobu Tokunaga | Dain Kaplan | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Virach Sornlertlamvanich | Thatsanee Charoenporn | Yingju Xia | Chu-Ren Huang | Shu-Kai Hsieh | Kiyoaki Shirai
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

Assessing Text Readability Using Hierarchical Lexical Relations Retrieved from WordNet
Shu-Yen Lin | Cheng-Chao Su | Yu-Da Lai | Li-Chin Yang | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 14, Number 1, March 2009

2008

KYOTO: a System for Mining, Structuring and Distributing Knowledge across Languages and Cultures
Piek Vossen | Eneko Agirre | Nicoletta Calzolari | Christiane Fellbaum | Shu-kai Hsieh | Chu-Ren Huang | Hitoshi Isahara | Kyoko Kanzaki | Andrea Marchetti | Monica Monachini | Federico Neri | Remo Raffaelli | German Rigau | Maurizio Tesconi | Joop VanGent
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We outline work performed within the framework of a current EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology/biodiversity) anchored in a language-independent ontology that is linked to wordnets in seven languages. For each language, information extraction and identification of lexicalized concepts with ontological entries is carried out by text miners (“Kybots”). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project enables long-range knowledge sharing and transfer across many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the specific domains addressed here.

Extracting Concrete Senses of Lexicon through Measurement of Conceptual Similarity in Ontologies
Siaw-Fong Chung | Laurent Prévot | Mingwei Xu | Kathleen Ahrens | Shu-Kai Hsieh | Chu-Ren Huang
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The measurement of conceptual similarity in a hierarchical structure has been proposed by studies such as Wu and Palmer (1994), which have been summarized and evaluated in Budanitsky and Hirst (2006). The present study applies the measurement of conceptual similarity to conceptual metaphor research by comparing the concreteness of ontological resource nodes to several prototypical concrete nodes selected by human subjects. Here, the purpose of comparing conceptual similarity between nodes is to select a concrete sense for a word which is used metaphorically. By using a WordNet-SUMO interface such as SinicaBow (Huang, Chang and Lee, 2004), concrete senses of a lexicon will be selected once its SUMO nodes have been compared in terms of conceptual similarity with the prototypical concrete nodes. This study has strong implications for the interaction of psycholinguistic and computational linguistic fields in conceptual metaphor research.

Adapting International Standard for Asian Language Technologies
Takenobu Tokunaga | Dain Kaplan | Chu-Ren Huang | Shu-Kai Hsieh | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Kiyoaki Shirai | Virach Sornlertlamvanich | Thatsanee Charoenporn | YingJu Xia
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Corpus-based approaches and statistical approaches have been the mainstream of natural language processing research for the past two decades. Language resources play a key role in such approaches, but there is an insufficient amount of language resources in many Asian languages. In this situation, standardisation of language resources would be of great help in developing resources in new languages. This paper presents the latest development efforts of our project, which aims at creating a common standard for Asian language resources that is compatible with an international standard. In particular, the paper focuses on i) lexical specification and data categories relevant for building multilingual lexical resources for Asian languages; ii) a core upper-layer ontology needed for ensuring multilingual interoperability; and iii) the evaluation platform used to test the entire architectural framework.

Constructing Taxonomy of Numerative Classifiers for Asian Languages
Kiyoaki Shirai | Takenobu Tokunaga | Chu-Ren Huang | Shu-Kai Hsieh | Tzu-Yi Kuo | Virach Sornlertlamvanich | Thatsanee Charoenporn
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Measuring Text Readability by Lexical Relations Retrieved from Wordnet
Shu-yen Lin | Cheng-chao Su | Yu-da Lai | Li-chin Yang | Shu-kai Hsieh
Proceedings of the 20th Conference on Computational Linguistics and Speech Processing

A Realistic and Robust Model for Chinese Word Segmentation
Chu-Ren Huang | Ting-Shuo Yo | Petr Šimon | Shu-Kai Hsieh
Proceedings of the 20th Conference on Computational Linguistics and Speech Processing

Automatic labeling of troponymy for Chinese verbs
Chiao-Shan Lo | Yi-Rung Chen | Chih-Yu Lin | Shu-Kai Hsieh
ROCLING 2008 Poster Papers

2007

Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification
Chu-Ren Huang | Petr Šimon | Shu-Kai Hsieh | Laurent Prévot
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Automatic Discovery of Named Entity Variants: Grammar-driven Approaches to Non-Alphabetical Transliterations
Chu-Ren Huang | Petr Šimon | Shu-Kai Hsieh
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

When Conset Meets Synset: A Preliminary Survey of an Ontological Lexical Resource Based on Chinese Characters
Shu-Kai Hsieh | Chu-Ren Huang
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

大規模詞彙語意關係自動標示之初步研究: 以中文詞網(Chinese Wordnet)為例 (A Preliminary Study on Large-scale Automatic Labeling of Lexical Semantic Relations: A Case study of Chinese Wordnet) [In Chinese]
Shu-Kai Hsieh | Petr Šimon | Chu-Ren Huang
Proceedings of the 18th Conference on Computational Linguistics and Speech Processing

2005

Word Meaning Inducing via Character Ontology: A Survey on the Semantic Prediction of Chinese Two-Character Words
Shu-Kai Hsieh
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
