Conference on Computational Linguistics and Speech Processing (2022)


pdf (full)
bib (full)
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

pdf bib
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Yung-Chun Chang | Yi-Chin Huang

pdf bib
Language Model Based Chinese Handwriting Address Recognition
Chieh-Jen Wang | Yung-Ping Tien | Yun-Wei Hung

Chinese handwritten address recognition on consignment notes is an important challenge in smart logistics automation, and the detection and recognition of Chinese handwritten characters are the key technologies for this application. Since handwritten characters are written in more complex and diverse ways than printed characters, they are easily misrecognized. Moreover, the address text occupies only a small portion of the consignment-note image and is closely spaced, which makes detection difficult. Therefore, accurately detecting the address text on the consignment note is a focus of this paper. The proposed automatic consignment-note address detection and recognition system detects and recognizes address characters, reduces the probability of misrecognizing Chinese handwriting through a language model, and improves accuracy.

pdf bib
Chinese Movie Dialogue Question Answering Dataset
Shang-Bao Luo | Cheng-Chung Fan | Kuan-Yu Chen | Yu Tsao | Hsin-Min Wang | Keh-Yih Su

This paper constructs CMDQA, a Chinese dialogue-based information-seeking question answering dataset, which mainly targets the scenario of obtaining Chinese movie-related information. It contains 10K QA dialogs (40K turns in total). All questions and background documents are compiled from Wikipedia via a web crawler. The answers to the questions are obtained by extracting the corresponding answer spans from the related text passages. In CMDQA, in addition to searching related documents, pronouns are added to the questions to better mimic real dialog scenarios. The dataset can test the individual performance of the information retrieval, question answering, and question re-writing modules. This paper also provides a baseline system and reports its performance on the dataset. The experiments show that a large gap to human performance remains, so the dataset provides ample challenge for further research.

pdf
Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement
Yen-Hao Huang | Hsiao-Yen Lan | Yi-Shin Chen

Unsupervised extractive summarization has recently gained importance since it does not require labeled data. Among unsupervised methods, graph-based approaches have achieved outstanding results. These methods represent each document by a graph, with sentences as nodes and word-level similarity among sentences as edges. Common words can easily lead to a strong connection between sentence nodes. Thus, sentences with many common words can be misinterpreted as salient sentences for a summary. This work addresses the common word issue with a phrase-level graph that (1) focuses on the noun phrases of a document based on grammar dependencies and (2) initializes edge weights by term-frequency within the target document and inverse document frequency over the entire corpus. The importance scores of noun phrases extracted from the graph are then used to select the most salient sentences. To preserve summary coherence, the order of the selected sentences is re-arranged by a flow-aware orderBERT. The results reveal that our unsupervised framework outperformed other extractive methods on ROUGE as well as two human evaluations for semantic similarity and summary coherence.
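The graph construction described above can be illustrated briefly. The following is a minimal sketch only, assuming spaCy for dependency-based noun-phrase extraction and networkx for the graph; the co-occurrence edge weighting and the PageRank scoring are illustrative stand-ins for the paper's actual importance-scoring step.

```python
# Sketch: phrase-level graph for unsupervised extractive summarization.
# Assumes `pip install spacy networkx` and an English spaCy model.
import math
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def phrase_graph_scores(document, corpus):
    """Score noun phrases of `document` with a TF-IDF-weighted graph."""
    doc = nlp(document)
    phrases = [chunk.text.lower() for chunk in doc.noun_chunks]  # dependency-based noun phrases
    if not phrases:
        return {}
    tf = {p: phrases.count(p) / len(phrases) for p in set(phrases)}
    # Crude document-frequency count by substring match over the corpus (illustrative only).
    idf = {p: math.log(len(corpus) / (1 + sum(p in d.lower() for d in corpus)))
           for p in set(phrases)}
    graph = nx.Graph()
    for p in set(phrases):
        graph.add_node(p, weight=tf[p] * idf[p])
    # Connect phrases that co-occur in the same sentence; edge weight derived from TF-IDF.
    for sent in doc.sents:
        in_sent = [c.text.lower() for c in sent.noun_chunks]
        for i, a in enumerate(in_sent):
            for b in in_sent[i + 1:]:
                w = tf[a] * idf[a] + tf[b] * idf[b]
                prev = graph.get_edge_data(a, b, {"weight": 0.0})["weight"]
                graph.add_edge(a, b, weight=prev + w)
    return nx.pagerank(graph, weight="weight")  # importance score per noun phrase
```

Sentences would then be ranked by the scores of the noun phrases they contain, and the selected sentences re-ordered by the flow-aware orderBERT described above.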

pdf
Enhancing Chinese Multi-Label Text Classification Performance with Response-based Knowledge Distillation
Szu-Chi Huang | Cheng-Fu Cao | Po-Hsun Liao | Lung-Hao Lee | Po-Lei Lee | Kuo-Kai Shyu

It is difficult to optimize individual label performance in multi-label text classification, especially on imbalanced data with long-tailed labels. This study therefore proposes a response-based knowledge distillation mechanism comprising a teacher model that optimizes binary classifiers for the corresponding labels and a student model, a standalone multi-label classifier, that learns from the knowledge distilled by the teacher. A total of 2,724 Chinese healthcare texts were collected and manually annotated across nine defined labels, resulting in 8,731 label annotations, an average of 3.2 labels per text. We used 5-fold cross-validation to compare the performance of several multi-label models, including TextRNN, TextCNN, HAN, and GRU-att. Experimental results indicate that the proposed knowledge distillation mechanism effectively improved performance regardless of the model used, by about 2-3% in micro-F1, 4-6% in macro-F1, 3-4% in weighted-F1, and 1-2% in subset accuracy.
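As an illustration of response-based distillation in a multi-label setting, here is a minimal PyTorch sketch; the loss weighting, temperature, and the use of per-label sigmoid outputs are assumptions, not the authors' exact formulation.

```python
# Sketch: the student matches the teacher's per-label sigmoid responses
# in addition to the gold multi-hot labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, T=2.0):
    """student_logits, teacher_logits: (batch, num_labels); targets: multi-hot (batch, num_labels)."""
    hard = F.binary_cross_entropy_with_logits(student_logits, targets.float())
    soft = F.binary_cross_entropy(torch.sigmoid(student_logits / T),
                                  torch.sigmoid(teacher_logits / T).detach())
    return alpha * hard + (1.0 - alpha) * soft
```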

pdf
Combining Word Vector Technique and Clustering Algorithm for Credit Card Merchant Detection
Fang-Ju Lee | Ying-Chun Lo | Jheng-Long Wu

Extracting relevant user behaviors from customers' transaction descriptions is one way to collect customer information. In the current text-mining field, most research focuses on text classification, and only a few studies address text clustering. This study finds the relationships between characters and words in unstructured transaction descriptions and uses word embeddings and text-mining techniques to overcome the limitation that classes must be defined in advance, establishing an automatic identification and analysis method and improving grouping accuracy. We use Jieba to segment the Chinese words in the credit card transaction descriptions, extract features with Word2Vec, and run cross-combination experiments with Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Hierarchical Agglomerative Clustering. The predictions reach an average F1-score of 67.58% over the MUC, B3, and CEAF metrics.
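A minimal sketch of such a pipeline, assuming gensim for Word2Vec and scikit-learn for the clustering algorithms; all hyperparameters are illustrative, and descriptions are assumed to be pre-segmented with Jieba (e.g., via jieba.lcut).

```python
# Sketch: Word2Vec document vectors clustered with DBSCAN or agglomerative clustering.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import DBSCAN, AgglomerativeClustering

def cluster_descriptions(segmented_docs, use_dbscan=True):
    """segmented_docs: list of token lists, e.g. [jieba.lcut(text) for text in descriptions]."""
    w2v = Word2Vec(sentences=segmented_docs, vector_size=100, window=5, min_count=1)
    # Represent each description by the mean of its word vectors.
    doc_vecs = np.array([np.mean([w2v.wv[t] for t in doc], axis=0) for doc in segmented_docs])
    if use_dbscan:
        model = DBSCAN(eps=0.5, min_samples=3, metric="cosine")
    else:
        model = AgglomerativeClustering(n_clusters=20)
    return model.fit_predict(doc_vecs)   # cluster label per transaction description
```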

pdf
Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System
Chia-Hsuan Lin | Jian-Peng Liao | Cho-Chun Hsieh | Kai-Chun Liao | Chun-Hsin Wu

This paper proposes a multi-speaker talking-face synthesis system. The system incorporates voice cloning and lip-syncing technology to achieve text-to-talking-face generation by acquiring audio and video clips of any speaker and using zero-shot transfer learning. In addition, we used open-source corpora to train several Taiwanese-accented models and proposed using Mandarin Phonetic Symbols (Bopomofo) as the character embedding of the synthesizer to improve the system’s ability to synthesize Chinese-English code-switched sentences. Through our system, users can create rich applications. Also, the research on this technology is novel in the audiovisual speech synthesis field.

pdf
Is Character Trigram Overlapping Ratio Still the Best Similarity Measure for Aligning Sentences in a Paraphrased Corpus?
Aleksandra Smolka | Hsin-Min Wang | Jason S. Chang | Keh-Yih Su

Sentence alignment is an essential step in studying the mapping among different language expressions, and the character trigram overlapping ratio was reported to be the most effective similarity measure for aligning sentences in a text simplification dataset. However, the appropriateness of each similarity measure depends on the characteristics of the corpus to be aligned. This paper studies whether the character trigram is still a suitable similarity measure for aligning sentences in a paragraph paraphrasing corpus. We compare several embedding-based and non-embedding-based model-agnostic similarity measures, including some that have not been studied previously. The evaluation is conducted on parallel paragraphs sampled from the Webis-CPC-11 corpus, a paragraph paraphrasing dataset. Our results show that modern BERT-based measures such as Sentence-BERT or BERTScore can lead to significant improvement on this task.
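For reference, a character trigram overlapping ratio can be computed as in the short sketch below; the exact normalization used in the cited work may differ.

```python
# Sketch: character-trigram overlapping ratio between two sentences.
def char_trigrams(text):
    text = text.replace(" ", "")
    return {text[i:i + 3] for i in range(len(text) - 2)}

def trigram_overlap_ratio(sent_a, sent_b):
    a, b = char_trigrams(sent_a), char_trigrams(sent_b)
    if not a or not b:
        return 0.0
    # Shared trigrams normalized by the smaller trigram set.
    return len(a & b) / min(len(a), len(b))
```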

pdf
RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model
Ming-Hsiang Su | Chin-Wei Lee | Chi-Lun Hsu | Ruei-Cyuan Su

In this study, a named entity recognition model was constructed and applied to the identification of Chinese medicine names and disease names. The results can be further used in a human-machine dialogue system to provide people with correct Chinese medicine medication reminders. First, this study uses web crawlers to organize web resources into a Chinese medicine named entity corpus, collecting 1,097 articles, 1,412 disease names, and 38,714 Chinese medicine names. Then, we annotated the TCM names in each article with the BIO tagging scheme. Finally, this study trains and evaluates BERT, ALBERT, RoBERTa, and GPT2 combined with BiLSTM and CRF. The experimental results show that the RoBERTa-based NER system combined with BiLSTM and CRF achieves the best performance, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.

pdf
A Study on Using Different Audio Lengths in Transfer Learning for Improving Chainsaw Sound Recognition
Jia-Wei Chang | Zhong-Yun Hu

Chainsaw sound recognition is a challenging task because of the complexity of the sound and the excessive noise in mountain environments. This study discusses the influence of different audio lengths on the accuracy of model training. We used LeNet, a simple model with few parameters, and adopted average pooling so that the proposed models can receive audio of any length. In the performance comparison, we mainly compared the influence of different audio lengths and further tested transfer learning from short-to-long and long-to-short audio. In the experiments, we used the ESC-10 dataset for training and validated performance on a self-collected chainsaw-audio dataset. The experimental results show that (a) the models trained with different audio lengths (1s, 3s, and 5s) achieve accuracies of 74%-78%, 74%-77%, and 79%-83% on the self-collected dataset; (b) transfer learning significantly improves the generalization of these models, which reach accuracies of 85.28%, 88.67%, and 91.8%; and (c) in transfer learning, models transferred from short to long audio achieve better results than those transferred from long to short audio, with a difference of up to 14% in accuracy on 5s chainsaw audio.
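The average-pooling design that lets one model accept audio of any length can be sketched as follows; the channel sizes and layer counts are illustrative, not the authors' exact LeNet configuration.

```python
# Sketch: a LeNet-style CNN made length-agnostic with global average pooling.
import torch.nn as nn

class LeNetAvgPool(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)   # collapses any time/frequency size to 1x1
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, x):                      # x: (batch, 1, n_mels, time) with variable time
        x = self.pool(self.features(x)).flatten(1)
        return self.classifier(x)
```

Because the pooled representation has a fixed size, the same weights accept 1 s, 3 s, or 5 s clips, which is what makes the short-to-long and long-to-short transfer experiments straightforward.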

pdf
Using Grammatical and Semantic Correction Model to Improve Chinese-to-Taiwanese Machine Translation Fluency
Yuan-Han Li | Chung-Ping Young | Wen-Hsiang Lu

Currently, there are three major issues to tackle in Chinese-to-Taiwanese machine translation: multi-pronunciation Taiwanese words, unknown words, and Chinese-to-Taiwanese grammatical and semantic transformation. Recent studies have mostly focused on the issues of multi-pronunciation Taiwanese words and unknown words, while very few research papers focus on grammatical and semantic transformation. However, there exist grammatical rules exclusive to Taiwanese that, if not translated properly, would cause the result to feel unnatural to native speakers and potentially twist the original meaning of the sentence, even with the right words and pronunciations. Therefore, this study collects and organizes a few common Taiwanese sentence structures and grammar rules, then creates a grammar and semantic correction model for Chinese-to-Taiwanese machine translation, which would detect and correct grammatical and semantic discrepancies between the two languages, thus improving translation fluency.

pdf
Investigation of feature processing modules and attention mechanisms in speaker verification system
Ting-Wei Chen | Wei-Ting Lin | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan | Yu-Han Cheng | Hsiang-Feng Chuang | Wei-Yu Chen

In this paper, we use several combinations of feature front-end modules and attention mechanisms to improve the performance of our speaker verification system. An updated version of ECAPA-TDNN is chosen as the baseline. We replace and integrate different feature front-end and attention mechanism modules to compare and find the most effective model design, which becomes our final system. We use the VoxCeleb 2 dataset as our training set and test the performance of our models on several test sets. With our final proposed model, we improve performance by 16% over the baseline on the VoxSRC2022 validation set, achieving better results for our speaker verification system.

pdf
A Preliminary Study of the Application of Discrete Wavelet Transform Features in Conv-TasNet Speech Enhancement Model
Yan-Tong Chen | Zong-Tai Wu | Jeih-Weih Hung

Nowadays, time-domain features have been widely used in speech enhancement (SE) networks, like frequency-domain features, to achieve excellent performance in eliminating noise from input utterances. This study primarily investigates how to extract information from time-domain utterances to create more effective features for speech enhancement. We propose employing sub-signals that reside in multiple acoustic frequency bands in the time domain and integrating them into a unified feature set. We use the discrete wavelet transform (DWT) to decompose each input frame signal into sub-band signals, and a projection fusion process is performed on these signals to create the ultimate features. The corresponding fusion strategy is bi-projection fusion (BPF). In short, BPF exploits the sigmoid function to create ratio masks for two feature sources. The concatenation of the fused DWT features and the time features serves as the encoder output of a celebrated SE framework, the fully-convolutional time-domain audio separation network (Conv-TasNet), to estimate the mask and then produce the enhanced time-domain utterances. The evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks. The experimental results reveal that the proposed method achieves higher speech quality and intelligibility than the original Conv-TasNet that uses time features only, indicating that the fusion of DWT features created from the input utterances can help time features learn a superior Conv-TasNet for speech enhancement.
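A minimal sketch of the two ingredients named above, DWT sub-band decomposition (here via PyWavelets) and a sigmoid-based bi-projection fusion of two feature sources; the projection sizes and fusion details are assumptions rather than the paper's exact design.

```python
# Sketch: DWT sub-band extraction and a sigmoid ratio-mask fusion of two feature streams.
import pywt
import torch
import torch.nn as nn

def dwt_subbands(frame, wavelet="db4", level=2):
    """Decompose a 1-D frame signal into approximation/detail sub-band coefficients."""
    return pywt.wavedec(frame, wavelet, level=level)   # [cA_level, cD_level, ..., cD1]

class BiProjectionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)

    def forward(self, feat_a, feat_b):
        # Sigmoid ratio mask decides how much of each source enters the fused feature.
        mask = torch.sigmoid(self.proj_a(feat_a) + self.proj_b(feat_b))
        return mask * feat_a + (1.0 - mask) * feat_b
```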

pdf
Exploiting the compressed spectral loss for the learning of the DEMUCS speech enhancement network
Chi-En Dai | Qi-Wei Hong | Jeih-Weih Hung

This study aims to improve a highly effective speech enhancement technique, DEMUCS, by revising its loss function. DEMUCS, developed by the Facebook team, is built on Wave-UNet and consists of convolutional encoding and decoding blocks with an LSTM layer in between. Although DEMUCS processes the input speech utterance purely in the time (wave) domain, the applied loss function consists of a wave-domain L1 distance and a multi-scale short-time Fourier transform (STFT) loss. That is, both time- and frequency-domain features are taken into consideration in the learning of DEMUCS. In this study, we propose revising the STFT loss in DEMUCS by employing the compressed magnitude spectrogram. The compression is done either by a power-law operation with a positive exponent less than one or by a logarithmic operation. We evaluate the presented framework on the VoiceBank-DEMAND database and task. The preliminary experimental results suggest that DEMUCS with the power-law compressed magnitude spectral loss outperforms the original DEMUCS by providing the test utterances with higher objective quality and intelligibility scores (PESQ and STOI). In contrast, the logarithmically compressed magnitude spectral loss does not benefit DEMUCS. We thereby show that DEMUCS can be further improved by properly revising the STFT terms of its loss function.
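A compressed magnitude spectral loss of the kind described can be sketched as below for a single STFT resolution; the real DEMUCS objective additionally sums several STFT resolutions and a wave-domain L1 term, and the exponent here is only an example.

```python
# Sketch: power-law compressed magnitude spectral loss (single STFT resolution).
import torch

def compressed_spectral_loss(enhanced, clean, n_fft=512, hop=128, power=0.3):
    win = torch.hann_window(n_fft, device=enhanced.device)
    mag_e = torch.stft(enhanced, n_fft, hop, window=win, return_complex=True).abs()
    mag_c = torch.stft(clean, n_fft, hop, window=win, return_complex=True).abs()
    # Power-law compression (0 < power < 1) emphasizes low-energy spectrogram regions.
    return torch.mean(torch.abs(mag_e.pow(power) - mag_c.pow(power)))
```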

pdf
Using Machine Learning and Pattern-Based Methods for Identifying Elements in Chinese Judgment Documents of Civil Cases
Hong-Ren Lin | Wei-Zhi Liu | Chao-Lin Liu | Chieh Yang

Providing structural information about civil cases for judgement prediction systems or recommendation systems can enhance the efficiency of the inference procedures and the justifiability of produced results. In this research, we focus on the civil cases about alimony, which is a relatively uncommon choice in current applications of artificial intelligence in law. We attempt to identify the statements for four types of legal functions in judgement documents, i.e., the pleadings of the applicants, the responses of the opposite parties, the opinions of the courts, and uses of laws to reach the final decisions. In addition, we also try to identify the conflicting issues between the plaintiffs and the defendants in the judgement documents.

pdf
Development of Mandarin-English code-switching speech synthesis system
Hsin-Jou Lien | Li-Yu Huang | Chia-Ping Chen

In this paper, a Mandarin-English code-switching speech synthesis system is proposed. To focus on learning the content information shared between the two languages, the training dataset is a multilingual artificial dataset with a unified speaker style. Adding a language embedding to the system helps it adapt to the multilingual dataset. In addition, language-dependent text preprocessing is applied. Word segmentation and text-to-pinyin conversion are the preprocessing steps for Mandarin, which not only improve fluency but also reduce learning complexity. Number normalization decides whether the Arabic numerals in a sentence need to be expanded into digits. The preprocessing for English is acronym conversion, which decides the pronunciation of acronyms.

pdf
Predicting Judgments and Grants for Civil Cases of Alimony for the Elderly
Wei-Zhi Liu | Po-Hsien Wu | Hong-Ren Lin | Chao-Lin Liu

The need for mediation is increasing rapidly along with the growing number of cases concerning alimony for the elderly in recent years. Offering a mechanism for predicting the outcomes of prospective lawsuits may alleviate the workload of the mediation courts. This research aims to predict the judgments and the granted alimony for the plaintiffs of such civil cases in Chinese, based on our analysis of the results of past lawsuits. We hope that the results can be helpful for both the involved parties and the courts. To build the current system, we segment and vectorize the texts of the judgement documents, and apply logistic regression and model tree models for predicting the judgments and for estimating the granted alimony of the cases, respectively.
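A minimal sketch of the two predictors described above, using scikit-learn; a decision-tree regressor stands in for the model tree, and all variable names are illustrative.

```python
# Sketch: TF-IDF vectorization of segmented judgment texts, a logistic-regression
# judgment classifier, and a tree regressor for the granted alimony amount.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor

def train_models(train_texts, outcomes, granted_texts, granted_amounts):
    """train_texts: whitespace-joined segmented documents; outcomes: 0/1 grant labels;
    granted_texts/granted_amounts: the granted subset and the amounts awarded."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(train_texts)
    clf = LogisticRegression(max_iter=1000).fit(X, outcomes)                      # judgment
    reg = DecisionTreeRegressor(max_depth=6).fit(vec.transform(granted_texts),
                                                 granted_amounts)                 # amount
    return vec, clf, reg
```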

pdf
Lightweight Sound Event Detection Model with RepVGG Architecture
Chia-Chuan Liu | Sung-Jen Huang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan | Yu-Han Cheng | Hsiang-Feng Chuang | Wei-Yu Chen

In this paper, we propose RepVGGRNN, a lightweight sound event detection model. We use RepVGG convolution blocks in the convolution part to improve performance and re-parameterize the RepVGG blocks after the model is trained to reduce the parameters of the convolution layers. To further improve the accuracy of the model, we incorporate both the mean teacher method and knowledge distillation to train the lightweight model. The proposed system achieves PSDS (polyphonic sound event detection score) values of 40.8% and 67.7% on scenarios 1 and 2, outperforming the baseline system's 34.4% and 57.2% on the DCASE 2022 Task 4 validation dataset. The proposed system has about 49.6K parameters, only 44.6% of the baseline system's.

pdf
Analyzing discourse functions with acoustic features and phone embeddings: non-lexical items in Taiwan Mandarin
Pin-Er Chen | Yu-Hsiang Tseng | Chi-Wei Wang | Fang-Chi Yeh | Shu-Kai Hsieh

Non-lexical items are expressive devices used in conversations that are not words but are nevertheless meaningful. These items play crucial roles, such as signaling turn-taking or marking stances in interactions. However, as the non-lexical items do not stably correspond to written or phonological forms, past studies tend to focus on studying their acoustic properties, such as pitches and durations. In this paper, we investigate the discourse functions of non-lexical items through their acoustic properties and the phone embeddings extracted from a deep learning model. Firstly, we create a non-lexical item dataset based on the interpellation video clips from Taiwan’s Legislative Yuan. Then, we manually identify the non-lexical items and their discourse functions in the videos. Next, we analyze the acoustic properties of those items through statistical modeling and building classifiers based on phone embeddings extracted from a phone recognition model. We show that (1) the discourse functions have significant effects on the acoustic features; and (2) the classifiers built on phone embeddings perform better than the ones on conventional acoustic properties. These results suggest that phone embeddings may reflect the phonetic variations crucial in differentiating the discourse functions of non-lexical items.

pdf
A Dimensional Valence-Arousal-Irony Dataset for Chinese Sentence and Context
Sheng-Wei Huang | Wei-Yi Chung | Yu-Hsuan Wu | Chen-Chia Yu | Jheng-Long Wu

Chinese multi-dimensional sentiment detection is a challenging task with a considerable impact on semantic understanding. Past irony datasets annotate only the sentiment type of whole ironic sentences; they do not provide the corresponding intensities of valence and arousal for the sentences and their context. However, an ironic statement is defined as a statement whose apparent meaning is the opposite of its actual meaning, which means that contextual information is needed to understand the actual meaning of a sentence. Therefore, the dimensional sentiment intensities of ironic sentences and their context are important issues in the natural language processing field. This paper creates an extended NTU irony corpus that includes valence, arousal, and irony intensities at the sentence level, and valence and arousal intensities at the context level, called the Chinese Dimensional Valence-Arousal-Irony (CDVAI) dataset. This paper then analyzes the annotation differences among the human annotators and uses a deep learning model such as BERT to evaluate prediction performance on the CDVAI corpus.

pdf
Intelligent Future Recreation Harbor Application Service: Taking Kaohsiung Asia New Bay as an Example to Construct a Composite Recreational Knowledge Graph
Dian-Zhi Wu | Yu-De Lu | Chia-Ming Tung | Bo-Yang Huang | Hsun-Hui Huang | Chien-Der Lin | Wen-Hsiang Lu

In view of the lack of specialized overall design services for harbour recreation in Taiwan, various marine recreational activities and marine scenic spots have not yet been planned and developed as services integrated around the city and the harbour. Few state-of-the-art products and application services exist, and Taiwan's harbour leisure service industries are facing the challenge of digital transformation. The Institute for Information Industry therefore proposed an innovative "Smart Future Recreational Harbour Application Service" project, taking the Kaohsiung Asia New Bay Area as the main demonstration field. The project uses multi-source knowledge graph integration and inference technology to recommend appropriate recreational service information, so that tourists can enjoy the best virtual-reality, intelligent human-machine interactive service experience during their trip.

pdf
HanTrans: An Empirical Study on Cross-Era Transferability of Chinese Pre-trained Language Model
Chin-Tung Lin | Wei-Yun Ma

Pre-trained language models have recently dominated most downstream tasks in NLP. In particular, Bidirectional Encoder Representations from Transformers (BERT) is the most iconic pre-trained language model for NLP tasks, and its masked language modeling (MLM) objective is an indispensable part of existing pre-trained language models. Models that excel on downstream tasks benefit directly from the large training corpus used in the pre-training stage. However, the training corpus for modern Traditional Chinese is small, and ancient Chinese is still absent from the pre-training stage. We therefore address this problem by transforming annotated ancient Chinese data into a BERT-style training corpus, and we propose a pre-trained Oldhan Chinese BERT model for the NLP community. Our proposed model outperforms the original BERT model by significantly reducing perplexity in masked language modeling (MLM). Our fine-tuned models also improve F1 scores on word segmentation and part-of-speech tagging tasks. We then comprehensively study the zero-shot cross-era ability of the BERT model. Finally, we visualize and investigate personal pronouns in the embedding space of ancient Chinese records from four eras. We have released our code at https://github.com/ckiplab/han-transformers.

pdf
A Preliminary Study on Automated Speaking Assessment of English as a Second Language (ESL) Students
Tzu-I Wu | Tien-Hong Lo | Fu-An Chao | Yao-Ting Sung | Berlin Chen

Due to the surge in global demand for English as a second language (ESL), the development of automated methods for grading speaking proficiency has gained considerable attention. This paper presents a computerized regime for grading the spontaneous spoken language of ESL learners. Based on a speech corpus of ESL learners recently collected in Taiwan, we first extract multi-view features (e.g., pronunciation, fluency, and prosody features) from either automatic speech recognition (ASR) transcriptions or audio signals. These extracted features are, in turn, fed into a tree-based classifier to produce a new set of indicative features as the input of the automated assessment system, viz. the grader. Finally, we use different machine learning models to predict each ESL learner's speaking proficiency and map the result to the corresponding CEFR level. The experimental results and analysis conducted on the speech corpus of ESL learners in Taiwan show that our approach holds great potential for use in automated speaking assessment, while offering more reliable predictive results than human experts.

pdf
Clustering Issues in Civil Judgments for Recommending Similar Cases
Yi-Fan Liu | Chao-Lin Liu | Chieh Yang

Searching for similar judgments is an important task in legal practice, from which valuable legal insights can be obtained. Issues are disputes between the parties in civil litigation and represent the core topics to be considered in the trials. Many studies calculate the similarity between judgments with different perspectives and methods. We first cluster the issues in the judgments, and then encode each judgment as a vector indicating whether it contains issues in the corresponding clusters. The similarity between judgments is evaluated based on these encoded vectors. We verify the effectiveness of the system with a human scoring process by an assistant with a legal background, while comparing the effects of several combinations of preprocessing steps and clustering strategies.

pdf
Multifaceted Assessments of Traditional Chinese Word Segmentation Tool on Large Corpora
Wen-Chao Yeh | Yu-Lun Hsieh | Yung-Chun Chang | Wen-Lian Hsu

This study evaluates the three most popular word segmentation tools for large Traditional Chinese corpora in terms of their efficiency, resource consumption, and cost. Specifically, we compare the performance of Jieba, CKIP, and MONPA on word segmentation, part-of-speech tagging, and named entity recognition through extensive experiments. Experimental results show that MONPA using GPU for batch segmentation can greatly reduce the processing time of massive datasets. In addition, its word segmentation, part-of-speech tagging, and named entity recognition features are beneficial to downstream applications.

pdf
Mandarin-English Code-Switching Speech Recognition System for Specific Domain
Chung-Pu Chiou | Hou-An Lin | Chia-Ping Chen

This paper introduces the use of Automatic Speech Recognition (ASR) technology to process speech content in a specific domain. We use the Conformer end-to-end model as the system architecture and first train it on pure Chinese data. Next, we apply transfer learning to fine-tune the system with Mandarin-English code-switching data. Finally, Mandarin-English code-switching data from a specific domain is used for the last round of fine-tuning so that the model performs well on speech recognition in that domain. Experiments with different fine-tuning methods reduce the final error rate from 82.0% to 34.8%.

pdf
Legal Case Winning Party Prediction With Domain Specific Auxiliary Models
Sahan Jayasinghe | Lakith Rambukkanage | Ashan Silva | Nisansa de Silva | Amal Shehan Perera

Sifting through hundreds of old case documents to obtain information pertinent to the case at hand has been a major part of the legal profession for centuries. However, with the expansion of court systems and the compounding nature of case law, this task has become more and more intractable under time and resource constraints. Thus automation by Natural Language Processing presents itself as a viable solution. In this paper, we discuss a novel approach for predicting the winning party of a current court case by training an analytical model on a corpus of prior court cases, which is then run on the prepared text of the current court case. This allows legal professionals to efficiently and precisely prepare their cases to maximize the chance of victory. The model is built and experimented with using legal domain specific sub-models to provide more visibility into the final model, along with other variations. We show that our model with critical sentence annotation and a transformer encoder using RoBERTa-based sentence embeddings is able to obtain an accuracy of 75.75%, outperforming other models.

pdf
Early Speech Production in Infants and Toddlers Later Diagnosed with Cerebral Palsy: A Retrospective Study
Chien Ju Chan | Li-Mei Chen | Li-Wen Chen

In this retrospective study, we compared early speech development between infants with cerebral palsy (CP) and typically developing (TD) infants. Recordings of utterances were collected from two CP infants and two TD infants at approximately 8 and 24 months of age. The data were analyzed in terms of volubility, consonant emergence, canonical babbling ratio (CBR), and mean babbling level (MBL). The major findings show that, compared with the TD group, the CP group has the following characteristics: 1) lower volubility; 2) an utterance-based CBR below 0.15 at 2 years old; 3) an MBL score below 2 at the age of 2, with more than 95% of utterances at level 1; and 4) consonants produced mainly at two oral places (bilabials and velars) and with three manners of articulation (nasal, fricative, and stop) at 2 years old.

pdf
Automatic Generation of Abstracts for Research Papers
Dushan Kumarasinghe | Nisansa de Silva

Summarizing has always been an important utility for reading long documents. Research papers are unique in this regard, as they have a compulsory summary in the form of the abstract in the beginning of the document which gives the gist of the entire study often within a set upper limit for the word count. Writing the abstract to be sufficiently succinct while being descriptive enough is a hard task even for native English speakers. This study is the first step in generating abstracts for research papers in the computational linguistics domain automatically using the domain-specific abstractive summarization power of the GPT-Neo model.

pdf
Speech Timing in Typically Developing Mandarin-Speaking Children From Ages 3 To 4
Jeng Man Lew | Li-Mei Chen | Yu Ching Lin

This study aims to develop a better understanding of speech timing development in Mandarin-speaking children from 3 to 4 years of age. Data were selected from two typically developing children. Four 50-minute recordings were collected at ages 3 and 4, based on natural conversation among the observers, the participants, and the parents, and on a picture-naming task. Speech timing was measured with Praat, including speaking rate, articulation rate, mean length of utterance (MLU), mean utterance duration, mean word duration, pause ratio, and volubility. The major findings of the current study are: 1) five measurements (speaking rate, mean length of utterance, mean utterance duration, mean word duration, and volubility) decreased with age in both children; 2) the articulation rate of both children increased with age; 3) compared with findings from previous studies, the pause ratio of both children slightly increased with age. These findings not only contribute more comprehensive data for assessment but can also serve as a reference for speech intervention.

pdf
Right-Dominant Tones in Zhangzhou: On and Through Phonetic Surface
Yishan Huang

This study conducts a systematic acoustic exploration into the phonetic nature of rightmost tones in a right-dominant tone sandhi system based on empirical data from 21 native speakers of Zhangzhou Southern Min, which presents eight tonal contrasts at the underlying level. The results reveal that (a) the F0 contour shape realisation of rightmost tones in Zhangzhou appears not to be categorically affected by their preceding tones. (b) Seven out of eight rightmost tones have two statistically significantly different variants in their F0 onset realisation, indicating their regressive sensitivity to the offset phonetics of preceding tones. (c) The forms of rightmost tones are not straightforwardly related to their counterparts in citation. Instead, two versions of the F0 system can be identified, with the unmarked forms resembling their citation values and the marked forms occurring as a consequence of the phonetic impact of their preceding tones and the F0-declining effect of utterance-final position. (d) The phonetic variation of rightmost tones reflects the cross-linguistic tendency of tonal articulation in connected speech but contradicts the default principle for identifying the right dominance of tone sandhi in Sinitic languages.

pdf
Web-API-Based Chatbot Generation with Analysis and Expansion for Training Sentences
Sheng-Kai Wang | Wan-Lin You | Shang-Pin Ma

With Web API technology becoming increasingly mature, how to integrate Web APIs and chatbot technology has become an issue of great interest. This study builds a semi-automatic method and tool, BOTEN, that allows application developers to quickly build chatbot interfaces for specified Web APIs. To ensure that the chatbot has sufficient natural language understanding (NLU) capability, this research evaluates the training sentences written by the developer through TF-IDF, WordNet, and SpaCy techniques, and suggests that the developer revise low-quality training sentences. The same technique can also be used to automatically increase the number of training sentences to improve intent recognition.

pdf
The Design and Development of a System for Chinese Character Difficulty and Features
Jung-En Haung | Hou-Chiang Tseng | Li-Yun Chang | Hsueh-Chih Chen | Yao-Ting Sung

Feature analysis of Chinese characters plays a prominent role in "character-based" education. However, there is an urgent need for a text analysis system that processes the difficulty of characters' composing components, based primarily on Chinese learners' performance. To meet this need, this research provides such a system by adopting a data-driven approach. Based on Chen et al.'s (2011) Chinese Orthography Database, this research has designed and developed a system, Character Difficulty - Research on Multi-features (CD-ROM), that provides three functions: (1) analyzing a text and reporting its difficulty in terms of Chinese characters; (2) decomposing characters into components and calculating component frequencies based on the analyzed text; and (3) providing component-derived characters based on the analyzed text, with downloadable images as teaching materials. With these functions highlighting multi-level features of characters, the system has the potential to benefit Chinese character instruction, Chinese orthographic learning, and Chinese natural language processing.

pdf
Image Caption Generation for Low-Resource Assamese Language
Prachurya Nath | Prottay Kumar Adhikary | Pankaj Dadure | Partha Pakray | Riyanka Manna | Sivaji Bandyopadhyay

Image captioning is a prominent Artificial Intelligence (AI) research area that deals with visual recognition and the linguistic description of images. It is an interdisciplinary field concerning how computers can see and understand digital images and videos, and describe them in a language known to humans. Constructing a meaningful sentence needs both structural and semantic information about the language. This paper highlights a contribution to image caption generation for the Assamese language. The unavailability of an image caption generation system for Assamese is an open problem for AI-NLP researchers, and this work is an early stage of that research. To achieve our objective, we have used the encoder-decoder framework, which combines Convolutional Neural Networks and Recurrent Neural Networks. The experiments were conducted on the Flickr30k and COCO Captions datasets, which are originally in English; we translated these datasets into Assamese using a state-of-the-art Machine Translation (MT) system for the designed work.

pdf
Building an Enhanced Autoregressive Document Retriever Leveraging Supervised Contrastive Learning
Yi-Cheng Wang | Tzu-Ting Yang | Hsin-Wei Wang | Yung-Chang Hsu | Berlin Chen

The goal of an information retrieval system is to retrieve documents that are most relevant to a given user query from a huge collection of documents, which usually requires time-consuming multiple comparisons between the query and candidate documents so as to find the most relevant ones. Recently, a novel retrieval modeling approach, dubbed Differentiable Search Index (DSI), has been proposed. DSI dramatically simplifies the whole retrieval process by encoding all information about the document collection into the parameter space of a single Transformer model, on top of which DSI can in turn generate the relevant document identities (IDs) in an autoregressive manner in response to a user query. Although DSI addresses the shortcomings of traditional retrieval systems, previous studies have pointed out that DSI might fail to retrieve relevant documents because DSI uses the document IDs as the pivotal mechanism to establish the relationship between queries and documents, whereas not every document in the document collection has its corresponding relevant and irrelevant queries for the training purpose. In view of this, we propose leveraging supervised contrastive learning to better render the relationship between queries and documents in the latent semantic space. Furthermore, an approximate nearest neighbor search strategy is employed at retrieval time to further assist the Transformer model in generating document IDs relevant to a posed query more efficiently. A series of experiments conducted on the Natural Questions benchmark dataset confirm the effectiveness and practical feasibility of our approach in relation to some strong baseline systems.
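The supervised contrastive idea can be illustrated with a generic in-batch contrastive loss between query and document embeddings; this is a sketch of the general technique, not necessarily the authors' exact formulation.

```python
# Sketch: in-batch contrastive loss where query i and document i form the positive pair
# and the other documents in the batch serve as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (batch, dim); row i of each is a relevant query-document pair."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.t() / temperature                   # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, labels)
```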

pdf
A Quantitative Analysis of Comparison of Emoji Sentiment: Taiwan Mandarin Users and English Users
Fang-Yu Chang

Emojis have become essential components of our digital communication. Emojis, especially smiley-face emojis and heart emojis, are considered the ones conveying more emotion. In this paper, two functions of emoji usage are discussed across two languages, Taiwanese Mandarin and English: sentiment enhancement and sentiment modification. A multilingual language model is adopted to obtain the probability distribution of text sentiment, and relative entropy is used to quantify the degree of change. The results support previous research showing that emojis are used more frequently in positive contexts and that smileys tend to be used for expressing emotions, and they confirm the language-independent nature of emojis.
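A minimal sketch of measuring the sentiment-distribution shift introduced by an emoji via relative entropy; the multilingual sentiment model named here is only an example, not necessarily the one used in the paper.

```python
# Sketch: KL divergence between sentiment distributions with and without an appended emoji.
from scipy.stats import entropy
from transformers import pipeline

sentiment = pipeline("text-classification",
                     model="nlptown/bert-base-multilingual-uncased-sentiment",
                     top_k=None)   # return scores for all sentiment classes

def kl_shift(text, emoji):
    p = [s["score"] for s in sorted(sentiment(text)[0], key=lambda s: s["label"])]
    q = [s["score"] for s in sorted(sentiment(text + emoji)[0], key=lambda s: s["label"])]
    return entropy(q, p)   # D_KL(with emoji || without emoji)
```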

pdf
Applying Information Extraction to Storybook Question and Answer Generation
Kai-Yen Kao | Chia-Hui Chang

For educators, generating high-quality question-answer pairs from story text is a time-consuming and labor-intensive task. The purpose is not to make students unable to answer, but to ensure that students understand the story text through the generated question-answer pairs. In this paper, we improve the FairyTaleQA question generation method by incorporating the question type and its definition into the input for fine-tuning the BART (Lewis et al., 2020) model. Furthermore, we make use of the entity and relation extraction from (Zhong and Chen, 2021) as an element of template-based question generation.

pdf
Improving Response Diversity through Commonsense-Aware Empathetic Response Generation
Tzu-Hsien Huang | Chia-Hui Chang

Due to the lack of conversation practice, the main challenge for second-language learners is speaking. Our goal is to develop a chatbot that encourages individuals to reflect on, describe, analyse, and communicate what they read, as well as improve students' English expression skills. In this paper, we exploit COMET, an inferential commonsense knowledge generator, as the background knowledge to improve generation diversity. We consider two approaches to increase the diversity of empathetic response generation. For non-pretrained models, we apply AdaLabel (Wang et al., 2021) to the Commonsense-aware Empathetic model (Sabour et al., 2022) and improve the Distinct-2 score from 2.99 to 4.08 on EMPATHETIC DIALOGUES (ED). Furthermore, we augment the pretrained BART model with various commonsense knowledge to generate more informative empathetic responses. Not only does the automatic evaluation of Distinct-2 scores improve from 9.11 to 11.21, but the manual case study also shows that CE-BART significantly outperforms CEM-AdaLabel.

pdf
A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data
Yi-Hsiang Hung | Yi-Chin Huang

In this study, we implemented a machine translation system using a Convolutional Neural Network with an attention mechanism for translating Mandarin to Sixian-accent Hakka. Specifically, to cope with the different idioms and terms used between the Northern and Southern Sixian accents, we analyzed the corpus differences and lexicon definitions, and then separated the various word usages to train an exclusive model for each accent. Besides, since the collected Hakka corpora are relatively limited, unseen words frequently occur during real-world translation. In our system, we selected suitable thresholds for each model based on model validation to reject unsuitable translated words. Then, by applying the proposed algorithm, which adopts forced Hakka idiom/term segmentation and common Mandarin word substitution, the resulting translated sentences become more intelligible. The proposed system thus achieved promising results using small-sized data. This system could be used for Hakka language teaching and also as the front end of Mandarin and Hakka code-switching speech synthesis systems.

pdf
NCU1415 at ROCLING 2022 Shared Task: A light-weight transformer-based approach for Biomedical Name Entity Recognition
Zhi-Quan Feng | Po-Kai Chen | Jia-Ching Wang

Named Entity Recognition (NER) is a very important and fundamental task in NLP. In the biomedical field, NER has been widely used in products developed by various manufacturers, including parsing, QA systems, key information extraction or replacement in dialogue systems, and practical applications of knowledge parsing. In different fields, including biomedicine, communication technology, and e-commerce, NER technology is needed to identify drugs, diseases, commodities, and other objects. This implementation focuses on the NER task in the biomedical field of the ROCLING 2022 shared task (Lee et al., 2022), with some tuning and experimentation based on language models.

pdf
CrowNER at Rocling 2022 Shared Task: NER using MacBERT and Adversarial Training
Qiu-Xia Zhang | Te-Yu Chi | Te-Lun Yang | Jyh-Shing Roger Jang

This study uses the training and validation data from the "ROCLING 2022 Chinese Health Care Named Entity Recognition Task" for modeling. The modeling process adopts techniques such as data augmentation and data post-processing, and uses the MacBERT pre-trained model to build a dedicated NER recognizer for the Chinese medical field. During fine-tuning, we also added adversarial training methods such as FGM and PGD, and the results of the final tuned model were close to those of the best team in the task evaluation. In addition, by introducing mixed-precision training, we also greatly reduce the time cost of training.

pdf
SCU-MESCLab at ROCLING-2022 Shared Task: Named Entity Recognition Using BERT Classifier
Tsung-Hsien Yang | Ruei-Cyuan Su | Tzu-En Su | Sing-Seong Chong | Ming-Hsiang Su

In this study, named entity recognition is constructed and applied in the medical domain. The data are labeled in BIO format: for example, "muscle" would be labeled "B-BODY" and "I-BODY", and "cough" would be "B-SYMP" and "I-SYMP"; all words outside the categories are marked with "O". The Chinese HealthNER Corpus contains 30,692 sentences, of which 2,531 sentences are set aside as the validation set (dev) for this evaluation, and the organizers finally provide another 3,204 sentences as the test set (test). We use BLSTM_CRF, RoBERTa+BLSTM_CRF, and a BERT classifier to submit three prediction results, respectively. The BERT classifier system submitted as RUN3 achieved the best prediction performance, with a precision of 80.18%, a recall of 78.3%, and an F1-score of 79.23%.

pdf
YNU-HPCC at ROCLING 2022 Shared Task: A Transformer-based Model with Focal Loss and Regularization Dropout for Chinese Healthcare Named Entity Recognition
Xiang Luo | Jin Wang | Xuejie Zhang

Named Entity Recognition (NER) is a fundamental task in information extraction that locates mentions of named entities and classifies them in unstructured texts. Previous studies typically used hidden Markov models (HMM) and conditional random fields (CRF) for NER. To learn long-distance dependencies in text, recurrent neural networks such as LSTM and GRU can extract semantic features for each token in a sequential manner. Based on Transformers, this paper describes our contribution to the ROCLING-2022 shared task. We adopt a transformer-based model with focal loss and regularization dropout. The focal loss overcomes the uneven distribution of the labels, and regularization dropout (R-Drop) addresses the problem of vocabulary and descriptions that are too domain-specific. Ensemble learning is used to improve model performance. Comparative experiments were conducted on the dev set to select the best-performing model for submission. The BERT model with BiLSTM-CRF, focal loss, and R-Drop achieved the best F1-score of 0.7768 and ranked 4th.
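A standard multi-class focal loss for token classification, of the kind referred to above, can be sketched as follows; the gamma value is illustrative.

```python
# Sketch: focal loss that down-weights easy, well-classified tokens to counter label imbalance.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, ignore_index=-100):
    """logits: (N, num_tags); targets: (N,) tag indices, with padding marked by ignore_index."""
    ce = F.cross_entropy(logits, targets, reduction="none", ignore_index=ignore_index)
    pt = torch.exp(-ce)                     # probability assigned to the gold tag
    loss = ((1.0 - pt) ** gamma) * ce       # hard tokens keep most of their weight
    return loss[targets != ignore_index].mean()
```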

pdf
NERVE at ROCLING 2022 Shared Task: A Comparison of Three Named Entity Recognition Frameworks Based on Language Model and Lexicon Approach
Bo-Shau Lin | Jian-He Chen | Tao-Hsing Chang

The ROCLING 2022 shared task is to design a method that can tag medical entities in sentences and then classify them into categories. This paper proposes three models for this NER task. The first is a BERT model combined with a classifier. The second is a two-stage model, where the first stage uses a BERT model combined with a classifier to detect whether medical entities exist in a sentence, and the second stage classifies the entities into categories. The third approach combines the first two models with a lexicon-based model, integrating the outputs of the three models to make predictions. The prediction results for the validation and test datasets show little difference among the three models, with the best F1 score, 0.7569, achieved by the first model.

pdf
SCU-NLP at ROCLING 2022 Shared Task: Experiment and Error Analysis of Biomedical Entity Detection Model
Sung-Ting Chiou | Sheng-Wei Huang | Ying-Chun Lo | Yu-Hsuan Wu | Jheng-Long Wu

Named entity recognition generally refers to identifying entities with specific meanings in unstructured text, including names of people, places, organizations, dates, times, quantities, proper nouns, and other terms. In the medical field, these may be drug names, organ names, test items, nutritional supplements, and so on. The purpose of named entity recognition in this study is to find such items in unstructured input text. Taking healthcare as the research domain, we predict named entity boundaries and categories in sentences based on ten entity types, and explore multiple fundamental NER approaches to solve this task, including Hidden Markov Models, Conditional Random Fields, a Random Forest classifier, and BERT. The CRF model achieves the best F-score among these approaches.

pdf
MIGBaseline at ROCLING 2022 Shared Task: Report on Named Entity Recognition Using Chinese Healthcare Datasets
Hsing-Yuan Ma | Wei-Jie Li | Chao-Lin Liu

Named Entity Recognition (NER) tools have been in development for years, yet few have been aimed at medical documents. The increasing need for analyzing medical data makes it crucial to build a sophisticated NER model for this missing area. In this paper, W2NER, a state-of-the-art NER model that has excelled in English and Chinese tasks, is run with selected inputs, several pretrained language models, and training strategies. The objective was to build an NER model suitable for healthcare corpora in Chinese. The best model achieved an F1 score of 81.93%, which ranked first in the ROCLING 2022 shared task.

pdf
Overview of the ROCLING 2022 Shared Task for Chinese Healthcare Named Entity Recognition
Lung-Hao Lee | Chao-Yi Chen | Liang-Chih Yu | Yuen-Hsien Tseng

This paper describes the ROCLING-2022 shared task for Chinese healthcare named entity recognition, including task description, data preparation, performance metrics, and evaluation results. Among ten registered teams, seven participating teams submitted a total of 20 runs. This shared task reveals present NLP techniques for dealing with Chinese named entity recognition in the healthcare domain. All data sets with gold standards and evaluation scripts used in this shared task are publicly available for future research.