Conference on Computational Linguistics and Speech Processing (2021)
Modern approaches to constituency parsing are monolingual supervised approaches that require large amounts of labelled training data, limiting their utility to only a handful of high-resource languages. To address this issue of data sparsity for low-resource languages, we propose Universal Recurrent Neural Network Grammars (UniRNNG), a multilingual variant of the popular Recurrent Neural Network Grammars (RNNG) model for constituency parsing. UniRNNG performs cross-lingual transfer learning for the constituency parsing task. The architecture of UniRNNG is inspired by the Principles and Parameters theory proposed by Noam Chomsky. UniRNNG utilises the linguistic typology knowledge available as feature values in the WALS database to generalize over multiple languages. Once trained on a sufficiently diverse polyglot corpus, UniRNNG can be applied to any natural language, making it a language-agnostic constituency parser. Experiments reveal that our proposed UniRNNG outperforms state-of-the-art baseline approaches for most of the target languages on which it is tested.
The explosive growth of music libraries has made music information retrieval and recommendation a critical issue. Recommendation systems based on music emotion recognition are gradually gaining attention. Most studies build music emotion classification models from audio data rather than lyrics. In addition, because of the richness of English language resources, most existing studies focus on English lyrics and rarely on Chinese. For this reason, we propose an approach that uses the pre-trained BERT model and transfer learning to improve the emotion classification of Chinese lyrics. Without any training specific to the Chinese lyrics emotion classification task: (a) using BERT alone reaches only 50% classification accuracy; (b) using BERT with transfer learning from the CVAW, CVAP, and CVAT datasets achieves 71% classification accuracy.
This study presents a novel QA-based sequence labeling (QASL) approach that naturally tackles both flat and nested Named Entity Recognition (NER) tasks on a Chinese Electronic Health Records (CEHR) dataset. The proposed QASL approach asks, in parallel, a corresponding natural language question for each specific named entity type, and then identifies the associated NEs of that type with the BIO tagging scheme. Nested NEs are then formed by overlapping the results of the various types. In comparison with pure sequence-labeling (SL) approaches, since the given question includes significant prior knowledge about the specified entity type, and the approach retains the capability of extracting NEs of different types, performance on the nested NER task is improved, reaching an F1-score of 90.70%. Moreover, in comparison with the pure QA-based approach, our proposed approach retains the SL features, so it can extract multiple NEs of the same type without knowing the exact number of NEs in the passage in advance. Finally, experiments on our CEHR dataset demonstrate that QASL-based models greatly outperform SL-based models, by 6.12% to 7.14% in F1-score.
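For concreteness, here is a minimal sketch of the QA-based sequence labeling idea described above, not the authors' implementation: a type-specific question is paired with the passage, and a token classifier emits BIO tags for that entity type. The model name, label set, and example questions are illustrative assumptions.

```python
# Minimal QASL sketch: question + passage as a sentence pair into a
# token classifier that outputs BIO tags for one entity type at a time.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "bert-base-chinese"  # assumed fine-tuned for 3 labels: O, B, I
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=3)
id2label = {0: "O", 1: "B", 2: "I"}

def qasl_tag(question: str, passage: str):
    # Encode question and passage as a pair, as in QA-style inputs.
    inputs = tokenizer(question, passage, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits          # (1, seq_len, num_labels)
    pred_ids = logits.argmax(dim=-1).squeeze(0).tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze(0))
    return [(tok, id2label[i]) for tok, i in zip(tokens, pred_ids)]

# One question per entity type; nested NEs come from overlapping the
# per-type results.
for question in ["文中提到了哪些疾病?", "文中提到了哪些药物?"]:
    print(qasl_tag(question, "病人因糖尿病长期服用二甲双胍。"))
```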
Information extraction is a core technology of natural language processing that extracts meaningful phrases and clauses relevant to a particular topic from unstructured or semi-structured content; it underlies many language technologies and applications. This paper introduces the AI Clerk Platform, which aims to accelerate and simplify the entire development process of information extraction tools. The AI Clerk Platform provides a friendly and intuitive visual interface for manual labeling, lets users define the semantic labels they need, and implements, distributes, and manages manual labeling tasks, so that users can build customized information extraction models without programming and view the models' automatic predictions in three ways. The AI Clerk Platform thereby also assists the development of other natural language processing technologies and derived application services.
This paper presents a framework to answer questions that require various kinds of inference mechanisms (such as extraction, entailment judgement, and summarization). Most previous approaches adopt a rigid framework that handles only one inference mechanism. Only a few adopt several answer generation modules to provide different mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules, or are too complicated to be implemented with neural networks. To alleviate these problems, we propose a divide-and-conquer framework consisting of a set of answer generation modules, a dispatch module, and an aggregation module. The answer generation modules provide different inference mechanisms, the dispatch module selects a few appropriate answer generation modules to generate answer candidates, and the aggregation module selects the final answer. We test our framework on the 2020 Formosa Grand Challenge Contest dataset. Experiments show that the proposed framework outperforms the state-of-the-art RoBERTa-large model by about 11.4%.
Due to the popularity of intelligent dialogue assistant services, speech emotion recognition has become more and more important. In communication between humans and machines, emotion recognition and emotion analysis can enhance the interaction. This study uses a CNN+LSTM model to implement speech emotion recognition (SER) processing and prediction. The experimental results show that the CNN+LSTM model achieves better performance than the traditional NN model.
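As a rough illustration of the CNN+LSTM architecture named above (layer sizes and the number of emotion classes are assumptions, not the paper's exact configuration):

```python
# A minimal CNN+LSTM sketch for speech emotion recognition, assuming
# log-mel spectrogram inputs of shape (batch, 1, n_mels, frames).
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_mels=40, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # halves both mel and time axes
        )
        self.lstm = nn.LSTM(input_size=16 * (n_mels // 2),
                            hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (B, 1, n_mels, T)
        f = self.conv(x)                       # (B, 16, n_mels/2, T/2)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, T/2, 16 * n_mels/2)
        out, _ = self.lstm(f)
        return self.fc(out[:, -1])             # classify from last time step

logits = CNNLSTM()(torch.randn(8, 1, 40, 100))  # -> (8, 4)
```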
With the recent breakthrough of deep learning technologies, research on machine reading comprehension (MRC) has attracted much attention and found versatile applications in many use cases. MRC is an important natural language processing (NLP) task that assesses the ability of a machine to understand natural language expressions; it is typically operationalized by asking questions based on a given text paragraph and receiving machine-generated answers in accordance with the given context paragraph and questions. In this paper, we leverage two novel pretrained language models built on top of Bidirectional Encoder Representations from Transformers (BERT), namely BERT-wwm and MacBERT, to develop effective MRC methods. We also investigate whether additionally incorporating categorical information about a context paragraph, obtained by clustering the context paragraphs of the training dataset, can benefit MRC. Finally, an ensemble learning approach is proposed to harness the synergistic power of the two BERT-based models and further promote MRC performance.
As the average life expectancy of Chinese people rises, the health care problems of the elderly are becoming more diverse, and the demand for long-term care is also increasing. Therefore, how to help the elderly maintain a good quality of life and their dignity is something we need to think about. This research explores the characteristics of the natural language of normally aging people through a deep model. First, we collect data through focus groups, so that the elders can interact naturally with the other participants. Then, using a word vector model and a regression model, we establish an executive function prediction model based on dialogue data to help understand the degradation trajectory of executive function and provide early warning.
Automatic Speech Recognition (ASR) technology makes it possible for medical professionals to document patient records, diagnoses, postoperative care, patrol records, etc., which are now documented manually. However, earlier research on the Chinese medical speech corpus (ChiMeS) has two shortcomings: the lack of punctuation, which reduces the readability of the output transcripts, and a poor recognition error rate, which limits application in the field. Accordingly, the contributions of this paper are: (1) psChiMeS-14, a punctuated Chinese medical corpus newly annotated from ChiMeS-14, a collection of 516 anonymized medical record readouts totalling 867 minutes, recorded by 15 professional nursing staff from Taipei Hospital of the Ministry of Health and Welfare. psChiMeS-14 is manually punctuated with colons, commas, and periods, ready for general end-to-end ASR models. (2) A self-attention-based speech recognition solution built on conformer networks. Trained and tested on the psChiMeS-14 corpus, the solution delivers state-of-the-art recognition performance: a character error rate (CER) of 10.5% and a keyword error rate (KER) of 13.10%, compared with the 15.70% CER and 22.50% KER of an earlier reported Joint CTC/Attention architecture.
Concerning the development of Chinese medical speech recognition technology, this study re-addresses earlier encountered issues following the process of Machine Learning Engineering for Production (MLOps) from a data-centric perspective. The first is a new segmentation of speech utterances to ensure sentence completeness for all utterances in the collected Chinese Medical Speech Corpus (ChiMeS). The second is the optimization of a Joint CTC/Attention model through data augmentation, boosting recognition performance from a very limited speech corpus. Overall, to facilitate the development of Chinese medical speech recognition, this paper contributes: (1) the ChiMeS corpus, the first Chinese medical speech corpus of its kind, comprising 14.4 hours and a total of 7,225 sentences; (2) a Joint CTC/Attention ASR model trained on ChiMeS-14, yielding a Character Error Rate (CER) of 13.65% and a Keyword Error Rate (KER) of 20.82% on the ChiMeS-14 test set; and (3) an evaluation platform set up to compare the performance of other ASR models. All the released resources can be found on the ChiMeS portal (https://iclab.ee.ntust.edu.tw/home).
In this paper, we investigate how to use limited code-switching data to implement a code-switching speech recognition system. As our baseline, we use the Transformer end-to-end model to develop a code-switching speech recognition system trained with a Mandarin dataset and a small amount of Mandarin-English code-switching data. We then compare the performance of systems after adding multi-task learning and transfer learning, with Character Error Rate (CER) as the evaluation criterion. Finally, we combine each of the three systems with a language model; our best result drops to a CER of 23.9%, compared with the baseline of 28.7%.
In this paper, we use domain generalization to improve the performance of a cross-device speaker verification system. Based on a trainable speaker verification system, we use domain generalization algorithms to fine-tune the model parameters. First, we train ECAPA-TDNN on the VoxCeleb2 dataset as a baseline model. Then we fine-tune it with the CHT-TDSV dataset and the following domain generalization algorithms: DANN, CDNN, and Deep CORAL. Our proposed system is tested on 10 different scenarios in the NSYSU-TDSV dataset, covering both single-device and multiple-device conditions. In the multiple-device scenario, the best equal error rate decreases from 18.39 for the baseline to 8.84, successfully achieving cross-device speaker verification.
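Of the listed algorithms, Deep CORAL has a particularly compact core; the following is a minimal sketch of its loss under the assumption of batched speaker embeddings (the 192-dimensional size is illustrative), not the authors' code:

```python
# Deep CORAL loss: align second-order statistics (feature covariances)
# of source and target embedding batches during fine-tuning.
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """source, target: (batch, d) speaker-embedding batches."""
    d = source.size(1)

    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)

    diff = covariance(source) - covariance(target)
    return (diff * diff).sum() / (4.0 * d * d)  # squared Frobenius norm

# Added to the verification objective during fine-tuning, e.g.:
# total_loss = verification_loss + lambda_coral * coral_loss(src_emb, tgt_emb)
src, tgt = torch.randn(32, 192), torch.randn(32, 192)
print(coral_loss(src, tgt))
```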
Due to recent advances in natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to post-correction of speech recognition. However, existing pre-trained models consider only semantic correction, while the phonetic features of words are neglected. Semantic-only post-correction degrades performance, since homophonic errors are fairly common in Chinese ASR. In this paper, we propose a novel approach that jointly exploits the contextualized representation and the phonetic information between an error and its replacement candidates to reduce the error rate of Chinese ASR. Our experimental results on real-world speech recognition datasets show that our proposed method achieves an evidently lower CER than the baseline model, which uses a pre-trained BERT MLM as the corrector.
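A minimal sketch of the general idea, combining an MLM probability with a pinyin-based phonetic similarity to rank replacement candidates; the interpolation weight, candidate list, and use of pypinyin/difflib are assumptions rather than the paper's method:

```python
# Rank homophone candidates by a mix of semantic fit (BERT-MLM
# probability at the masked position) and phonetic (pinyin) similarity.
import difflib
import torch
from pypinyin import lazy_pinyin
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese").eval()

def rescore(sentence: str, pos: int, candidates: list, lam: float = 0.6):
    masked = sentence[:pos] + tokenizer.mask_token + sentence[pos + 1:]
    inputs = tokenizer(masked, return_tensors="pt")
    mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        probs = model(**inputs).logits[0, mask_idx].softmax(-1)
    scores = {}
    for cand in candidates:
        sem = probs[tokenizer.convert_tokens_to_ids(cand)].item()
        # Phonetic similarity: string similarity of the pinyin transcriptions.
        pho = difflib.SequenceMatcher(
            None, lazy_pinyin(sentence[pos])[0], lazy_pinyin(cand)[0]).ratio()
        scores[cand] = lam * sem + (1 - lam) * pho
    return max(scores, key=scores.get)

# e.g. choose among homophones for the 3rd character of an ASR hypothesis
print(rescore("我们再公园散步", 2, ["在", "再", "载"]))
```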
With the widespread commercialization of smart devices, research on environmental sound classification has gained more and more attention in recent years. In this paper, we set out to make effective use of a large-scale audio pretrained model and a semi-supervised training paradigm for environmental sound classification. To this end, we first put forward an environmental sound classification method whose component model is built on top of a large-scale audio pretrained model. Further, to simulate a low-resource setting where only limited supervised examples are available, we instantiate transfer learning with a recently proposed training algorithm (FixMatch) and a data augmentation method (SpecAugment) to achieve semi-supervised model training. Experiments conducted on the benchmark dataset UrbanSound8K reveal that our classification method leads to an accuracy improvement of 2.4% over a current baseline method.
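For reference, a minimal sketch of the FixMatch consistency step with a crude SpecAugment-style masking, assuming spectrogram inputs long enough for the masks; the threshold and augmentations are illustrative, not the paper's setup:

```python
# FixMatch on unlabeled clips: pseudo-label from a weakly augmented view,
# train on a strongly augmented (SpecAugment-style) view.
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, spec, weak_aug, strong_aug, threshold=0.95):
    """spec: (batch, n_mels, frames) log-mel spectrograms."""
    with torch.no_grad():
        probs = model(weak_aug(spec)).softmax(dim=-1)
        conf, pseudo = probs.max(dim=-1)          # confidence and pseudo-label
        mask = (conf >= threshold).float()        # keep only confident clips
    logits = model(strong_aug(spec))              # e.g. SpecAugment masking
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (loss * mask).mean()

def spec_augment(spec, time_width=20, freq_width=8):
    # Crude SpecAugment: zero one random time band and one frequency band.
    s = spec.clone()
    t0 = torch.randint(0, s.size(2) - time_width, (1,)).item()
    f0 = torch.randint(0, s.size(1) - freq_width, (1,)).item()
    s[:, :, t0:t0 + time_width] = 0.0
    s[:, f0:f0 + freq_width, :] = 0.0
    return s
```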
Current neural math solvers learn to incorporate commonsense or domain knowledge by utilizing pre-specified constants or formulas. However, as these constants and formulas are mainly human-specified, the generalizability of the solvers is limited. In this paper, we propose to explicitly retrieve the required knowledge from math problem datasets. In this way, we can precisely characterize the required knowledge and improve the explainability of solvers. Our two algorithms take the problem text and the solution equations as input, and then deduce the required commonsense and domain knowledge by integrating information from both parts. We construct two math datasets and show that our algorithms can effectively retrieve the required knowledge for problem-solving.
A sound event detection (SED) system outputs sound events and their time boundaries in audio signals. We propose an RCRNN-based SED system with residual connections and a convolutional block attention mechanism, based on the mean-teacher framework of semi-supervised learning. The neural network can be trained with a combination of weakly labeled and unlabeled data. In addition, considering that the speech event carries more information than other sound events, we use a specific time-frequency resolution to extract its acoustic features. Furthermore, we apply data augmentation and post-processing to improve performance. On the DCASE 2021 Task 4 validation set, the proposed system achieves a PSDS (Polyphonic Sound Event Detection Score) scenario-2 of 57.6% and an event-based F1-score of 41.6%, outperforming the baseline scores of 52.7% and 40.7%.
There has been increasing demand for effective computer-assisted pronunciation training (CAPT) systems, which can provide feedback on mispronunciations and help second-language (L2) learners improve their speaking proficiency through repeated practice. Due to the shortage of non-native speech for training the automatic speech recognition (ASR) module of a CAPT system, mispronunciation detection performance is often affected by imperfect ASR. Recognizing this, we put forward a two-stage mispronunciation detection method. In the first stage, the speech uttered by an L2 learner is processed by an end-to-end ASR module to produce N-best phone sequence hypotheses. In the second stage, these hypotheses are fed into a pronunciation model that seeks to faithfully predict the phone sequence most likely pronounced by the learner, so as to improve mispronunciation detection. Empirical experiments conducted on an English benchmark dataset seem to confirm the utility of our method.
This study recruited 51 elders aged 53-74 to discuss their daily activities in focus groups. The transcribed discourse was analyzed using the Chinese version of LIWC (Lin et al., 2020; Pennebaker et al., 2015) for cognitive complexity and dynamic language as well as content words related to elders’ daily activities. The interruption behavior during the conversation was also coded and analyzed. After controlling for education, gender and age, the results showed that cognitive flexibility performance was accompanied by the increasing adoption of dynamic language, insight words and family words. These findings serve as the basis for predicting elders’ cognitive flexibility through their daily language use.
In recent years, dialogue systems have been booming and are widely used in customer service, where they have achieved good results. Looking at conversation records between users and real customer service agents, we can see that the user's utterances mix questions about products and services with chat directed at the agent. According to the experience of professionals, mixing some chat into customer service conversations helps improve the user experience. However, questions are expected to be answered, while chat is expected to elicit interaction; to produce an appropriate response, the dialogue system must distinguish these two intentions effectively. A dialog act is a functional classification of utterances defined by linguists, and we expect this information to help distinguish questioning sentences from chatting sentences. In this paper, we combine a published COVID-19 QA dataset and a COVID-19-topic chat dataset to form our experimental data. Based on the BERT (Bidirectional Encoder Representations from Transformers) model, we build a question-vs-chat classifier. The experimental results show that the accuracy of the configuration with dialog act embedding is 16% higher than that with only the original sentence embedding. In addition, we find that dialog act types such as "Statement-non-opinion", "Signal-non-understanding", and "Appreciation" are more related to question sentences, while "Wh-Question", "Yes-No-Question", and "Rhetorical-Question" are more related to chat sentences.
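A minimal sketch of one way to add a dialog act embedding to a BERT classifier, assuming a Switchboard-style tagset of 42 acts (an assumption; the paper's exact fusion may differ):

```python
# Question-vs-chat classifier: BERT [CLS] sentence embedding concatenated
# with a learned dialog-act embedding before the final linear layer.
import torch
import torch.nn as nn
from transformers import BertModel

class QuestionChatClassifier(nn.Module):
    def __init__(self, n_dialog_acts=42, da_dim=32):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.da_embed = nn.Embedding(n_dialog_acts, da_dim)
        self.fc = nn.Linear(self.bert.config.hidden_size + da_dim, 2)

    def forward(self, input_ids, attention_mask, dialog_act_ids):
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).pooler_output
        da = self.da_embed(dialog_act_ids)            # (batch, da_dim)
        return self.fc(torch.cat([cls, da], dim=-1))  # question vs. chat
```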
Voice-driven communication aids are commonly used by patients with dysarthria. However, this type of assistive device demands a large amount of voice data from patients to be effective, which imposes an overwhelming recording burden on them. To address this concern, this research proposes a voice augmentation system, which also improves recognition efficiency. The results reveal that the proposed speech generator for dysarthria can produce a corpus more similar to the patient's speech. Moreover, the recognition rate on repeated sentences improves substantially: the word error rate can be reduced from 64.42% to 4.39% for patients in the free-talk condition. According to these results, our proposed system provides a more reliable and helpful technique for the development of communication aids.
To provide an analysis of recent research on automatic question generation from text, we surveyed 9 papers published between 2019 and early 2021, retrieved from Papers with Code (PwC). Our research follows the survey reported by Kurdi et al. (2020), which provides an analysis of 93 papers from 2014 to early 2019. We analyzed the 9 papers from several aspects: (1) purpose of question generation, (2) generation method, and (3) evaluation. We found that recent approaches tend to rely on semantic information, and that Transformer-based models are attracting increasing interest since they are more efficient. On the other hand, since there is no widely acknowledged automatic evaluation metric designed for question generation, researchers adopt metrics from other natural language processing tasks to compare different systems.
Thanks to the development of deep learning, natural language processing tasks have made great progress by leveraging Bidirectional Encoder Representations from Transformers (BERT). The goal of information retrieval is to find the results most relevant to a user's query in a large set of documents. Although BERT-based retrieval models have shown excellent results in many studies, they usually require large amounts of computation and/or additional storage space. In view of these flaws, we propose a BERT-based Siamese-structured retrieval model (BESS). BESS not only inherits the merits of pre-trained language models, but also automatically generates extra information to complement the original query. In addition, a reinforcement learning strategy is introduced to make the model more robust. We evaluate BESS on three publicly available corpora, and the experimental results demonstrate the efficiency of the proposed retrieval model.
Aspect Category Sentiment Analysis (ACSA) aims to identify the fine-grained sentiment polarities of the aspect categories discussed in user reviews. ACSA is challenging and costly to deploy in real-world applications, mainly for the following reasons: (1) labeling fine-grained ACSA data is labor-intensive; (2) the aspect categories are dynamically updated and adjusted as application scenarios develop, so the data must be relabeled frequently; and (3) as aspect categories increase, the model must be retrained frequently to adapt quickly to newly added aspect category data. To overcome these problems, we introduce a novel Meta Multi-Task Learning (MMTL) approach that frames ACSA as a meta-learning problem (i.e., treating the aspect-category sentiment polarity classification problems as different training tasks for meta-learning) in order to learn an ideal and shareable initialization for the multi-task learning model that can be adapted to new ACSA tasks efficiently and effectively. Experimental results show that the proposed approach significantly outperforms a strong pre-trained transformer-based baseline model, especially when there is little labeled fine-grained training data.
For manufacturers of home appliances, studying discussions of their products on social media can help them improve those products. Opinions in online reviews immediately reflect whether a product is accepted and which aspects of it are most discussed. In this article, we divide the analysis of home appliance reviews into three tasks: named entity recognition (NER), aspect category extraction (ACE), and aspect category sentiment classification (ACSC). To improve ACSC performance, we combine the Reptile algorithm from meta-learning with the concept of domain adversarial training, forming what we call the Adversarial Reptile algorithm. We show that macro-F1 improves from 68.6% (fine-tuned BERT model) to 70.3% (p-value 0.04).
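For context, a minimal sketch of the plain Reptile meta-update (without the adversarial component the paper adds); hyperparameters are illustrative:

```python
# Reptile: train a copy of the model on one task for a few steps, then
# move the meta-parameters toward the adapted weights.
import copy
import torch

def reptile_step(model, task_loader, loss_fn, inner_lr=1e-4, meta_lr=0.1,
                 inner_steps=5):
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    it = iter(task_loader)
    for _ in range(inner_steps):              # inner-loop adaptation
        x, y = next(it)
        opt.zero_grad()
        loss_fn(adapted(x), y).backward()
        opt.step()
    # Meta-update: theta <- theta + meta_lr * (theta_adapted - theta)
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p += meta_lr * (p_adapted - p)
```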
Information overload has been one of the challenges of dealing with information from the Internet. It is no longer a matter of information access; instead, the focus has shifted to the quality of the retrieved data. In the news domain in particular, multiple outlets report on the same news events but may differ in details. This work assumes that different news outlets are likely to differ in their writing styles and word choices, and proposes a method that extracts sentences based on their key information by focusing on the synonyms shared among sentences. Our method also reduces redundancy through hierarchical clustering and orders the selected sentences with the proposed orderBERT. The results show that the proposed unsupervised framework improves coverage and coherence while reducing the redundancy of the generated summary. Moreover, because of how the dataset was obtained, we also propose a data refinement method to alleviate the problem of undesirable texts resulting from automatic scraping.
When we are interested in a certain domain, we can collect and analyze related data from the Internet. Newly collected data are unlabeled, so we hope that existing labeled data can help with them. We perform named entity recognition (NER) and aspect-based sentiment analysis (ABSA) in a multi-task learning setting, combining a parameter generation network and the DANN architecture to build the model. In the NER task, the data are labeled with the Tie/Break scheme, and the task weights are adjusted according to each task's rate of loss change using Dynamic Weight Average (DWA). This study uses two different source-domain datasets. The experimental results show that Tie/Break labeling improves the model's results, that DWA yields better performance, and that the combination of the parameter generation network and a gradient reversal layer supports good learning across different domains.
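A minimal sketch of the DWA weighting rule, assuming per-task losses from the last two epochs; the temperature is illustrative:

```python
# Dynamic Weight Average (DWA): task weights follow the ratio of each
# task's recent loss change, softened by a temperature T.
import math

def dwa_weights(loss_history, T=2.0):
    """loss_history: {task: [loss at t-2, loss at t-1]} -> {task: weight}."""
    # r_k = L_k(t-1) / L_k(t-2): > 1 means the task is getting harder.
    ratios = {k: v[1] / v[0] for k, v in loss_history.items()}
    exp = {k: math.exp(r / T) for k, r in ratios.items()}
    z = sum(exp.values())
    n_tasks = len(loss_history)
    return {k: n_tasks * e / z for k, e in exp.items()}  # sums to n_tasks

print(dwa_weights({"ner": [0.90, 0.80], "absa": [0.50, 0.49]}))
```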
With the popularity of the Internet age, online social platforms have provided a bridge for communication between private companies, public organizations, and the public. The purpose of this research is to understand users' experience of products by analyzing product review data from different fields. We propose a BiLSTM-based neural network infused with rich emotional information. In addition to considering valence and arousal, the smallest units of emotional information, we also integrate the dependency relationships within the text into the deep learning model to analyze sentiment. The experimental results show that this approach achieves good performance in predicting word-level valence and arousal. In addition, integrating VA and dependency information into the BiLSTM model yields excellent performance on social text sentiment analysis, which verifies that this model is effective for emotion recognition in short social media texts.
Machine learning methods for financial document analysis have focused mainly on the textual part. However, the numerical parts of these documents are also rich in information. To analyze financial text further, we should examine the numeric information in depth. In light of this, the purpose of this research is to identify the link between a target cashtag and a target numeral in financial tweets, which is more challenging than analyzing news and official documents. We develop a multi-model fusion approach that integrates Bidirectional Encoder Representations from Transformers (BERT) and a Convolutional Neural Network (CNN). We also encode the dependency information behind the text into the model to derive semantic latent features. The experimental results show that our model achieves remarkable performance and outperforms comparative methods.
Conventional opinion polls are usually conducted via questionnaires or phone interviews, which are time-consuming and error-prone. With the advances in social networking platforms, it is easier for the general public to express their opinions on popular topics. Given the huge number of user opinions, it would be useful to automatically collect and aggregate the overall topical stance on a specific topic. In this paper, we propose to predict topical stances from social media by concept expansion, sentiment classification, and stance aggregation based on word embeddings. For concept expansion of a given topic, related posts are collected from social media and clustered by word embeddings; major keywords are then extracted by word segmentation and named entity recognition. For sentiment classification and aggregation, machine learning methods are used to train a sentiment lexicon with word embeddings, and the sentiment scores from user-centric and post-centric views are aggregated into the total stance on the topic. In the experiments, we evaluated our proposed approach on social media data from online forums. The experimental results for the 2016 Taiwan Presidential Election show that our method can effectively expand keywords and aggregate topical stances from the public for accurate prediction of election results. The best performance is 0.52% in terms of mean absolute error (MAE). Further investigation is needed to evaluate the performance of the proposed method at larger scales.
The masking-based speech enhancement method pursues a multiplicative mask applied to the spectrogram of the input noise-corrupted utterance, and a deep neural network (DNN) is often used to learn the mask. In particular, the features commonly used for automatic speech recognition can serve as the DNN input to learn a well-behaved mask that significantly reduces the noise distortion of processed utterances. This study proposes to preprocess the input speech features for an ideal ratio mask (IRM)-based DNN by lowpass filtering in order to alleviate the noise components. In particular, we employ the discrete wavelet transform (DWT) to decompose the temporal speech feature sequence and scale down the detail coefficients, which correspond to the high-pass portion of the sequence. Preliminary experiments conducted on a subset of the TIMIT corpus reveal that the proposed method makes the resulting IRM achieve higher speech quality and intelligibility on babble noise-corrupted signals than the original IRM, indicating that the lowpass-filtered temporal feature sequence can be used to learn a superior IRM network for speech enhancement.
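A minimal sketch of this preprocessing using PyWavelets (the wavelet, decomposition level, and scaling factor are assumptions, not the paper's settings):

```python
# DWT lowpass filtering of a temporal feature sequence: decompose,
# scale down the detail (high-pass) coefficients, and reconstruct.
import numpy as np
import pywt

def dwt_lowpass(feature_seq: np.ndarray, wavelet="db4", level=2, scale=0.5):
    """feature_seq: (frames,) one feature dimension over time."""
    coeffs = pywt.wavedec(feature_seq, wavelet, level=level)
    # coeffs = [cA_level, cD_level, ..., cD_1]; shrink all detail coefficients.
    coeffs = [coeffs[0]] + [scale * d for d in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(feature_seq)]

# Apply per feature dimension of, e.g., a (frames, n_features) MFCC matrix.
features = np.random.randn(300, 13)
smoothed = np.stack([dwt_lowpass(features[:, j]) for j in range(13)], axis=1)
```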
Nowadays, many advertisements hide as normal posts or experience sharing on social media, and there has been little research on advertorial detection for Mandarin Chinese texts. This paper therefore focuses on hidden advertorial detection in Taiwan Mandarin Chinese online posts. We inspected seven contextual features based on linguistic theories at the discourse level; these features can be further grouped into three schemas under the general advertorial writing structure. We then used these features to train a multi-task BERT model to detect advertorials. The results suggest that specific linguistic features help identify advertorials.
We introduce a method for generating error-correction rules for grammar pattern errors in a given annotated learner corpus. In our approach, annotated edits in the learner corpus are converted into edit rules for correcting common writing errors. The method involves automatic extraction of grammar patterns, and automatic alignment of the erroneous patterns and correct patterns. At run-time, grammar patterns are extracted from the grammatically correct sentences, and correction rules are retrieved by aligning the extracted grammar patterns with the erroneous patterns. Using the proposed method, we generate 1,499 high-quality correction rules related to 232 headwords. The method can be used to assist ESL students in avoiding grammatical errors, and aid teachers in correcting students’ essays. Additionally, the method can be used in the compilation of collocation error dictionaries and the construction of grammar error correction systems.
We use Hypergraph Attention Networks (HyperGAT) to recognize multiple labels of Chinese humor texts. We first represent a joke as a hypergraph, using sequential and semantic hyperedge structures to construct the hyperedges. Attention mechanisms are then adopted to aggregate the context information embedded in nodes and hyperedges. Finally, we use the trained HyperGAT to complete the multi-label classification task. Experimental results on the Chinese humor multi-label dataset show that the HyperGAT model outperforms previous sequence-based (CNN, BiLSTM, FastText) and graph-based (Graph-CNN, TextGCN, Text-Level GNN) deep learning models.
In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data serves as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them with eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. The final dataset contains a total of 1,814 questions with 2,340 labels, an average of 1.29 labels per question. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers, BERT and RoBERTa, were implemented to compare performance on the constructed dataset. Experimental results show that our proposed model with the knowledge infusion mechanism achieves better performance regardless of the evaluation metric considered: Macro F1, Micro F1, Weighted F1, or Subset Accuracy.
Ever-expanding evaluative texts on online forums have become an important source of sentiment analysis. This paper proposes an aspect-based annotated dataset consisting of telecom reviews on social media. We introduce a category, implicit evaluative texts, impevals for short, to investigate how the deep learning model works on these implicit reviews. We first compare two models, BertSimple and BertImpvl, and find that while both models are competent to learn simple evaluative texts, they are confused when classifying impevals. To investigate the factors underlying the correctness of the model’s predictions, we conduct a series of analyses, including qualitative error analysis and quantitative analysis of linguistic features with logistic regressions. The results show that local features that affect the overall sentential sentiment confuse the model: multiple target entities, transitional words, sarcasm, and rhetorical questions. Crucially, these linguistic features are independent of the model’s confidence measured by the classifier’s softmax probabilities. Interestingly, the sentence complexity indicated by syntax-tree depth is not correlated with the model’s correctness. In sum, this paper sheds light on the characteristics of the modern deep learning model and when it might need more supervision through linguistic evaluations.
We propose a mixed-attention-based Generative Adversarial Network (named maGAN) and apply it to citation intent classification in scientific publications. We select domain-specific training data, propose a mixed-attention mechanism, and employ a generative adversarial network architecture for pre-training the language model and fine-tuning it for the downstream multi-class classification task. Experiments were conducted on the SciCite dataset to compare model performance. Our proposed maGAN model achieved the best Macro-F1 of 0.8532.
Streaming service platforms such as YouTube provide a discussion function for audiences worldwide to share comments. YouTubers who upload videos to the platform want to track the performance of those videos. However, YouTube's existing analysis functions provide only a few performance indicators, such as average view duration, browsing history, and audience demographics, and lack sentiment analysis of the audience's comments. This paper therefore proposes multi-dimensional sentiment indicators, namely YouTuber preference, video preference, and excitement level, to capture comprehensive sentiment from audience comments about videos and YouTubers. To evaluate the performance of different classifiers, we experiment with deep learning-based, machine learning-based, and BERT-based classifiers for automatically detecting the three sentiment indicators in audience comments. Experimental results indicate that the BERT-based classifier outperforms the other classifiers in F1-score, with the excitement level indicator showing the largest improvement. The multiple sentiment detection tasks on a video streaming platform can thus be addressed with the proposed multi-dimensional sentiment indicators and a BERT classifier.
Aging populations have posed a challenge to many countries, including Taiwan, and with them comes the issue of long-term care. Against this backdrop, the aim of this study was to explore the hotly discussed subtopics in the field of long-term care and identify their features through NLP. This study applied TF-IDF, a logistic regression model, and a naive Bayes classifier to process the data. In sum, the results showed a best F1-score of 0.920 for identification and a best accuracy of 0.708 for classification. The results of this study could serve as a reference for future long-term-care-related applications.
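A minimal sketch of such a TF-IDF plus logistic regression / naive Bayes setup in scikit-learn; the texts, labels, and pre-segmented input are placeholders, not the study's data:

```python
# TF-IDF features with two classic classifiers, as named above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["长照 机构 人力 不足", "居家 照护 补助 申请", "长照 保险 财源 讨论"]
labels = [1, 0, 1]  # e.g. 1 = belongs to the hotly discussed subtopic

for clf in (LogisticRegression(max_iter=1000), MultinomialNB()):
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    print(type(clf).__name__, pipe.predict(["机构 照护 人力"]))
```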
We introduce a method for assisting English as a Second Language (ESL) learners by providing translations of Collins COBUILD grammar patterns (GPs) for a given word. In our approach, a bilingual parallel corpus is transformed into bilingual GP pairs, aimed at providing native-language support for learning word usage through GPs. The method involves automatically parsing sentences to extract GPs, automatically generating GP translation pairs from bilingual sentences, and automatically extracting common bilingual GPs. At run-time, the target word is used to look up GPs and translations, and the retrieved common GPs and their example sentences are shown to the user. We present a prototype phrase search engine, Linggle GPTrans, that implements this method to assist ESL learners. A preliminary evaluation on a set of more than 300 GP-translation pairs shows that the method achieves 91% accuracy.
The rapid flow of information and the abundance of text data on the Internet have created an urgent demand for monitoring resources and techniques serving various purposes. Extracting facets of information useful to particular domains from such large and dynamically growing corpora requires unsupervised yet transparent ways of analyzing the textual data. This paper proposes a hybrid collocation analysis as a method to retrieve and summarize Taiwan-related topics posted on Weibo and PTT. By grouping collocates of 臺灣 'Taiwan' into clusters of topics via either word embedding clustering or Latent Dirichlet Allocation, lists of collocates can be converted into probability distributions, so that distances and similarities can be defined and computed. With this method, we conduct a diachronic analysis of the similarity between Weibo and PTT, providing a way to pinpoint when and how the topic similarity between the two rises or falls. A fine-grained view of the grammatical behavior and political implications is attempted as well. This study thus sheds light on alternative explainable routes for future social media listening methods for understanding the cross-strait relationship.
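Once collocate lists are probability distributions over a shared topic inventory, a divergence such as the Jensen-Shannon distance gives the needed similarity; a minimal sketch with illustrative distributions:

```python
# Compare the two platforms' topic profiles for one time slice.
import numpy as np
from scipy.spatial.distance import jensenshannon

# Probability mass that Weibo / PTT collocates of 臺灣 assign to each topic
# cluster (both must use the same cluster inventory).
weibo = np.array([0.40, 0.25, 0.20, 0.15])
ptt   = np.array([0.10, 0.30, 0.35, 0.25])

print(jensenshannon(weibo, ptt))  # 0 = identical topic profiles
```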
As the system of confiscation becomes more and more complete, grasping the distribution of the types of confiscation actually announced by the courts makes it possible to understand changing trends. Besides assisting legislators in formulating laws, it can also give others an understanding of the actual operation of the confiscation system. Manual judgment consumes a great deal of manpower and time, so we aim to enable artificial intelligence technology to identify the distribution of confiscations automatically. The purpose of this research is to establish an automated confiscation identification model that can quickly and accurately identify the multiple label categories of confiscation and meet the needs of all sectors of society for confiscation information, so as to facilitate subsequent law amendments or judicial discretion. This research uses first-instance criminal cases as the main experimental data. Following current law, confiscation is divided into three categories: contraband, criminal tools, and criminal proceeds, and we perform multi-label identification. We use Term Frequency-Inverse Document Frequency (TF-IDF) and the Word2Vec algorithm for feature extraction, with a random forest classifier, and the CKIPlabBERT pretrained model for training and identification. The experimental results show that with the CKIPlabBERT pretrained model, the best identification results are obtained when using only the sentences in a judgment that mention confiscation terms. For the case-confiscation task, the Micro F1 score reaches 96.2716%, and for the defendant-confiscation task, 95.5478%.
In recent years, speech synthesis systems have been able to generate speech with high quality. However, multi-speaker text-to-speech (TTS) systems still require a large amount of speech data for each target speaker. In this study, we construct a multi-speaker TTS system that alleviates this problem by incorporating two sub-modules into an artificial neural network-based speech synthesis system. The first module adds a speaker embedding to the encoding module for generating speech, so that a large amount of speech data from the target speaker is not necessary. For the speaker embedding, two main methods, speaker verification embedding and voice conversion embedding, are compared to decide which is more suitable for our personalized TTS system. Second, we replace the conventional post-net module, which is adopted to enhance the output spectrum sequence, with a post-filter network to further improve the quality of the generated speech. Experimental results show that adding the speaker embedding to the encoding module is useful, and the resulting utterances are indeed perceived as the target speaker's. Also, the post-filter network not only improves speech quality but also enhances the speaker similarity of the generated utterances. The constructed TTS system can generate an utterance of the target speaker in fewer than 2 seconds. In the future, we would like to further investigate the controllability of the speaking rate and the perceived emotional state of the generated speech.
This paper presents a method for automatically identifying bilingual grammar patterns and extracting bilingual phrase instances from a given English-Chinese sentence pair. In our approach, the English-Chinese sentence pair is parsed to identify English grammar patterns and their Chinese counterparts. The method involves generating translations of each English grammar pattern and calculating word translation probabilities from a word-aligned parallel corpus. The results allow us to extract the most probable English-Chinese phrase pairs in the sentence pair. We present a prototype system that applies the method to extract grammar patterns and phrases from parallel sentences. An evaluation on randomly selected examples from a dictionary shows that our approach has reasonably good performance. We use human judges to assess the bilingual phrases generated by our approach. The results have the potential to assist language learning and machine translation research.
We present a method for determining the intended sense definitions of a given academic word in an academic keyword list. In our approach, the keyword list is converted into unigram counts of all possible Mandarin translations, intended or not. The method involves converting the words in the keyword list into all their translations using a bilingual dictionary, computing the unigram word counts of those translations, and computing character counts from the word counts. At run-time, each definition (with its associated translation) of the given word is scored with the word and character counts, and the definition with the highest score is returned. We present a prototype system for the Academic Keyword List that generates definitions and translations for pedagogical purposes. We also experimented with clustering the definition embeddings of all words and definitions, and identifying the intended sense by favoring embeddings in larger clusters. Preliminary evaluation shows promising performance. This endeavor is a step towards creating a full-fledged dictionary from an academic word list.
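A minimal sketch of the counting-based scoring, with illustrative data structures and an assumed way of mixing word and character counts:

```python
# Score each definition's Mandarin translation by corpus-level unigram
# word and character counts built from all keyword translations.
from collections import Counter

def build_counts(keyword_translations):
    """keyword_translations: {word: [possible Mandarin translations]}."""
    word_counts, char_counts = Counter(), Counter()
    for translations in keyword_translations.values():
        for t in translations:
            word_counts[t] += 1
            char_counts.update(t)
    return word_counts, char_counts

def best_definition(definitions, word_counts, char_counts):
    """definitions: [(definition text, Mandarin translation)]."""
    def score(translation):
        return word_counts[translation] + sum(
            char_counts[c] for c in translation) / max(len(translation), 1)
    return max(definitions, key=lambda d: score(d[1]))
```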
Sentiment analysis has become a popular research issue in recent years, especially for educational texts, where it is an important problem. According to the literature, generating similar sentences can improve the prediction performance of machine learning, so controlled sample expansion is a key component of prediction models. This paper proposes a sample expansion method that combines a part-of-speech filter with Word2Vec similar-word finding. The generated samples are of high quality, with similar sentiment representations. The DistilBERT pretrained model is used to learn and predict valence-arousal scores from the expanded samples. Experimental results show that using the expanded samples as training data outperforms the original training data without expansion, reducing mean squared error by 80% and increasing the Pearson correlation coefficient by 28%.
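A minimal sketch of such an expansion step, assuming a pre-trained gensim Word2Vec model (path is a placeholder) and jieba POS tags; the POS whitelist is an assumption:

```python
# Expand samples: for each content word, find Word2Vec neighbours and
# keep only those whose POS tag matches the original word's.
import jieba.posseg as pseg
from gensim.models import Word2Vec

model = Word2Vec.load("zh_w2v.model")  # assumed pre-trained on segmented text

def expand(word, pos, topn=10, keep_pos=("a", "v", "n")):
    if pos not in keep_pos or word not in model.wv:
        return []                       # POS filter: skip function words / OOV
    candidates = model.wv.most_similar(word, topn=topn)
    # Keep only similar words whose own POS tag matches the original's.
    return [w for w, _ in candidates
            if next(pseg.cut(w)).flag.startswith(pos)]

# Replace each content word in a sentence with same-POS neighbours.
for word, flag in pseg.cut("这门课程非常有趣"):
    print(word, flag, expand(word, flag[0]))
```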
This technical report addresses the ROCLING 2021 Shared Task: Dimensional Sentiment Analysis for Educational Texts. To predict the affective states of Chinese educational texts, we present a practical framework that employs pre-trained language models, such as BERT and MacBERT. Several valuable observations and analyses can be drawn from a series of experiments. From the results, we find that MacBERT-based methods deliver better results than BERT-based methods on the validation set. Therefore, we average the prediction results of several models obtained with different settings as the final output.
In this paper, we propose a BERT-based dimensional sentiment analyzer designed by incorporating word-level information. Our model achieved three of the best results across the four metrics of the ROCLING 2021 Shared Task: Dimensional Sentiment Analysis for Educational Texts. We conducted a series of experiments to compare the effectiveness of different pre-training methods, and the results show that our method significantly improves performance over classic methods. Based on the experiments, we also discuss the impact of model architectures and datasets.
This paper presents a description of our system for the ROCLING 2021 shared task on dimensional sentiment analysis for educational texts. We submitted two runs in the final test, both using a standard regression model. Run1 uses the Chinese version of BERT as the base model, while Run2 uses RoBERTa-wwm-ext, a Chinese RoBERTa-like BERT model and an early version of MacBERT. The powerful pre-trained BERT models are used for text embedding to help train the model.
For this shared task, this paper proposes a method that combines a BERT-based word vector model with an LSTM prediction model to predict the valence and arousal values of a text. The BERT-based word vectors are 768-dimensional, and each word vector in a sentence is fed sequentially to the LSTM model for prediction. The experimental results show that the performance of our proposed method is better than that of a Lasso regression model.
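A minimal sketch of this pipeline: 768-dimensional BERT token vectors fed sequentially to an LSTM whose final state predicts the valence and arousal values (hidden size illustrative):

```python
# BERT token vectors -> LSTM -> [valence, arousal] regression head.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese").eval()

class VALstm(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(768, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # [valence, arousal]

    def forward(self, token_vectors):      # (batch, seq_len, 768)
        _, (h, _) = self.lstm(token_vectors)
        return self.head(h[-1])

with torch.no_grad():
    inputs = tokenizer("这门课程非常有趣", return_tensors="pt")
    vectors = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
print(VALstm()(vectors))                          # e.g. tensor([[v, a]])
```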
We use MacBERT transformers and fine-tune them on the ROCLING-2021 shared task using the CVAT and CVAS data. We compare the performance of MacBERT with two other transformers, BERT and RoBERTa, in the valence and arousal dimensions, respectively. MAE and the correlation coefficient (r) were used as evaluation metrics. On the ROCLING-2021 test set, our MacBERT model achieves an MAE of 0.611 and an r of 0.904 in the valence dimension, and an MAE of 0.938 and an r of 0.549 in the arousal dimension.
This paper presents the ROCLING 2021 shared task on dimensional sentiment analysis for educational texts, which seeks to identify real-valued sentiment scores of self-evaluation comments written by Chinese students in both the valence and arousal dimensions. Valence represents the degree of pleasant and unpleasant (or positive and negative) feelings, and arousal represents the degree of excitement and calm. Of the 7 teams registered for this shared task on two-dimensional sentiment analysis, 6 submitted results. We expect this evaluation campaign to produce more advanced dimensional sentiment analysis techniques for the educational domain. All data sets with gold standards and the scoring script are publicly available to researchers.