Hung-Yu Kao

Also published as: Hung-yu Kao

2024

pdf abs
0x.Yuan at SemEval-2024 Task 2: Agents Debating can reach consensus and produce better outcomes in Medical NLI task
Yu-an Lu | Hung-yu Kao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we introduce a multi-agent debating framework, experimenting on SemEval 2024 Task 2. This innovative system employs a collaborative approach involving expert agents from various medical fields to analyze Clinical Trial Reports (CTRs). Our methodology emphasizes nuanced and comprehensive analysis by leveraging the diverse expertise of agents like Biostatisticians and Medical Linguists. Results indicate that our collaborative model surpasses the performance of individual agents in terms of Macro F1-score. Additionally, our analysis suggests that while initial debates often mirror majority decisions, the debating process refines these outcomes, demonstrating the system’s capability for in-depth analysis beyond simple majority rule. This research highlights the potential of AI collaboration in specialized domains, particularly in medical text interpretation.

pdf abs
0x.Yuan at SemEval-2024 Task 5: Enhancing Legal Argument Reasoning with Structured Prompts
Yu-an Lu | Hung-yu Kao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The intersection of legal reasoning and Natural Language Processing (NLP) technologies, particularly Large Language Models (LLMs), offers groundbreaking potential for augmenting human capabilities in the legal domain. This paper presents our approach and findings from participating in SemEval-2024 Task 5, focusing on the effect of argument reasoning in civil procedures using legal reasoning prompts. We investigated the impact of structured legal reasoning methodologies, including TREACC, IRAC, IRAAC, and MIRAC, on guiding LLMs to analyze and evaluate legal arguments systematically. Our experimental setup involved crafting specific prompts based on these methodologies to instruct the LLM to dissect and scrutinize legal cases, aiming to discern the cogency of argumentative solutions within a zero-shot learning framework. The performance of our approach, as measured by F1 score and accuracy, demonstrated the efficacy of integrating structured legal reasoning into LLMs for legal analysis. The findings underscore the promise of LLMs, when equipped with legal reasoning prompts, in enhancing their ability to process and reason through complex legal texts, thus contributing to the broader application of AI in legal studies and practice.

2023

pdf abs
Advancing Multi-Criteria Chinese Word Segmentation Through Criterion Classification and Denoising
Tzu Hsuan Chou | Chun-Yi Lin | Hung-Yu Kao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent research on multi-criteria Chinese word segmentation (MCCWS) mainly focuses on building complex private structures, adding more handcrafted features, or introducing complex optimization processes. In this work, we show that through a simple yet elegant input-hint-based MCCWS model, we can achieve state-of-the-art (SoTA) performances on several datasets simultaneously. We further propose a novel criterion-denoising objective that hurts slightly on F1 score but achieves SoTA recall on out-of-vocabulary words. Our result establishes a simple yet strong baseline for future MCCWS research. Source code is available at https://github.com/IKMLab/MCCWS.

pdf abs
Breaking Boundaries in Retrieval Systems: Unsupervised Domain Adaptation with Denoise-Finetuning
Che Chen | Ching-Wen Yang | Chun-Yi Lin | Hung-Yu Kao
Findings of the Association for Computational Linguistics: EMNLP 2023

Dense retrieval models have exhibited remarkable effectiveness, but they rely on abundant labeled data and face challenges when applied to different domains. Previous domain adaptation methods have employed generative models to generate pseudo queries, creating pseudo datasets to enhance the performance of dense retrieval models. However, these approaches typically use unadapted rerank models, leading to potentially imprecise labels. In this paper, we demonstrate the significance of adapting the rerank model to the target domain prior to utilizing it for label generation. This adaptation process enables us to obtain more accurate labels, thereby improving the overall performance of the dense retrieval model. Additionally, by combining the adapted retrieval model with the adapted rerank model, we achieve significantly better domain adaptation results across three retrieval datasets. We release our code for future research.

Recent Chinese word segmentation (CWS) models have shown competitive performance with pre-trained language models’ knowledge. However, these models tend to learn the segmentation knowledge through in-vocabulary words rather than understanding the meaning of the entire context. To address this issue, we introduce a context-aware approach that incorporates unsupervised sentence representation learning over different dropout masks into the multi-criteria training framework. We demonstrate that our approach reaches state-of-the-art (SoTA) performance on F1 scores for six of the nine CWS benchmark datasets and out-of-vocabulary (OOV) recalls for eight of nine. Further experiments discover that substantial improvements can be brought with various sentence representation objectives.

pdf abs
IKM_Lab at BioLaySumm Task 1: Longformer-based Prompt Tuning for Biomedical Lay Summary Generation
Yu-Hsuan Wu | Ying-Jia Lin | Hung-Yu Kao
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

This paper describes the entry by the Intelligent Knowledge Management (IKM) Laboratory in the BioLaySumm 2023 task1. We aim to transform lengthy biomedical articles into concise, reader-friendly summaries that can be easily comprehended by the general public. We utilized a long-text abstractive summarization longformer model and experimented with several prompt methods for this task. Our entry placed 10th overall, but we were particularly proud to achieve a 3rd place score in the readability evaluation metric.

pdf abs
Improved Unsupervised Chinese Word Segmentation Using Pre-trained Knowledge and Pseudo-labeling Transfer
Hsiu-Wen Li | Ying-Jia Lin | Yi-Ting Li | Chun Lin | Hung-Yu Kao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Unsupervised Chinese word segmentation (UCWS) has made progress by incorporating linguistic knowledge from pre-trained language models using parameter-free probing techniques. However, such approaches suffer from increased training time due to the need for multiple inferences using a pre-trained language model to perform word segmentation. This work introduces a novel way to enhance UCWS performance while maintaining training efficiency. Our proposed method integrates the segmentation signal from the unsupervised segmental language model to the pre-trained BERT classifier under a pseudo-labeling framework. Experimental results demonstrate that our approach achieves state-of-the-art performance on the eight UCWS tasks while considerably reducing the training time compared to previous approaches.

2022

pdf bib
International Journal of Computational Linguistics & Chinese Language Processing, Volume 27, Number 2, December 2022
Berlin Chen | Hung-Yu Kao
International Journal of Computational Linguistics & Chinese Language Processing, Volume 27, Number 2, December 2022

pdf abs
Unsupervised Single Document Abstractive Summarization using Semantic Units
Jhen-Yi Wu | Ying-Jia Lin | Hung-Yu Kao
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work, we study the importance of content frequency on abstractive summarization, where we define the content as “semantic units.” We propose a two-stage training framework to let the model automatically learn the frequency of each semantic unit in the source text. Our model is trained in an unsupervised manner since the frequency information can be inferred from source text only. During inference, our model identifies sentences with high-frequency semantic units and utilizes frequency information to generate summaries from the filtered sentences. Our model performance on the CNN/Daily Mail summarization task outperforms the other unsupervised methods under the same settings. Furthermore, we achieve competitive ROUGE scores with far fewer model parameters compared to several large-scale pre-trained models. Our model can be trained under low-resource language settings and thus can serve as a potential solution for real-world applications where pre-trained models are not applicable.

pdf abs
R-AT: Regularized Adversarial Training for Natural Language Understanding
Shiwen Ni | Jiawen Li | Hung-Yu Kao
Findings of the Association for Computational Linguistics: EMNLP 2022

Currently, adversarial training has become a popular and powerful regularization method in the natural language domain. In this paper, we Regularized Adversarial Training (R-AT) via dropout, which forces the output probability distributions of different sub-models generated by dropout to be consistent under the same adversarial samples. Specifically, we generate adversarial samples by perturbing the word embeddings. For each adversarial sample fed to the model, R-AT minimizes both the adversarial risk and the bidirectional KL-divergence between the adversarial output distributions of two sub-models sampled by dropout. Through extensive experiments on 13 public natural language understanding datasets, we found that R-AT has improvements for many models (e.g., rnn-based, cnn-based, and transformer-based models). For the GLUE benchmark, when R-AT is only applied to the fine-tuning stage, it is able to improve the overall test score of the BERT-base model from 78.3 to 79.6 and the RoBERTa-large model from 88.1 to 88.6. Theoretical analysis reveals that R-AT has potential gradient regularization during the training process. Furthermore, R-AT can reduce the inconsistency between training and testing of models with dropout.

2021

pdf abs
Unsupervised Extractive Summarization-Based Representations for Accurate and Explainable Collaborative Filtering
Reinald Adrian Pugoy | Hung-Yu Kao
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We pioneer the first extractive summarization-based collaborative filtering model called ESCOFILT. Our proposed model specifically produces extractive summaries for each item and user. Unlike other types of explanations, summary-level explanations closely resemble real-life explanations. The strength of ESCOFILT lies in the fact that it unifies representation and explanation. In other words, extractive summaries both represent and explain the items and users. Our model uniquely integrates BERT, K-Means embedding clustering, and multilayer perceptron to learn sentence embeddings, representation-explanations, and user-item interactions, respectively. We argue that our approach enhances both rating prediction accuracy and user/item explainability. Our experiments illustrate that ESCOFILT’s prediction accuracy is better than the other state-of-the-art recommender models. Furthermore, we propose a comprehensive set of criteria that assesses the real-life explainability of explanations. Our explainability study demonstrates the superiority of and preference for summary-level explanations over other explanation types.

pdf
Meet The Truth: Leverage Objective Facts and Subjective Views for Interpretable Rumor Detection
Jiawen Li | Shiwen Ni | Hung-Yu Kao
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021
Berlin Chen | Hung-Yu Kao
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021

2020

pdf bib abs
Measuring Alignment to Authoritarian State Media as Framing Bias
Timothy Niven | Hung-Yu Kao
Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

We introduce what is to the best of our knowledge a new task in natural language processing: measuring alignment to authoritarian state media. We operationalize alignment in terms of sociological definitions of media bias. We take as a case study the alignment of four Taiwanese media outlets to the Chinese Communist Party state media. We present the results of an initial investigation using the frequency of words in psychologically meaningful categories. Our findings suggest that the chosen word categories correlate with framing choices. We develop a calculation method that yields reasonable results for measuring alignment, agreeing well with the known labels. We confirm that our method does capture event selection bias, but whether it captures framing bias requires further investigation.

pdf abs
Rumor Detection on Twitter Using Multiloss Hierarchical BiLSTM with an Attenuation Factor
Yudianto Sujana | Jiawen Li | Hung-Yu Kao
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Social media platforms such as Twitter have become a breeding ground for unverified information or rumors. These rumors can threaten people’s health, endanger the economy, and affect the stability of a country. Many researchers have developed models to classify rumors using traditional machine learning or vanilla deep learning models. However, previous studies on rumor detection have achieved low precision and are time consuming. Inspired by the hierarchical model and multitask learning, a multiloss hierarchical BiLSTM model with an attenuation factor is proposed in this paper. The model is divided into two BiLSTM modules: post level and event level. By means of this hierarchical structure, the model can extract deep information from limited quantities of text. Each module has a loss function that helps to learn bilateral features and reduce the training time. An attenuation factor is added at the post level to increase the accuracy. The results on two rumor datasets demonstrate that our model achieves better performance than that of state-of-the-art machine learning and vanilla deep learning models.

pdf abs
BERT-Based Neural Collaborative Filtering and Fixed-Length Contiguous Tokens Explanation
Reinald Adrian Pugoy | Hung-Yu Kao
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

We propose a novel, accurate, and explainable recommender model (BENEFICT) that addresses two drawbacks that most review-based recommender systems face. First is their utilization of traditional word embeddings that could influence prediction performance due to their inability to model the word semantics’ dynamic characteristic. Second is their black-box nature that makes the explanations behind every prediction obscure. Our model uniquely integrates three key elements: BERT, multilayer perceptron, and maximum subarray problem to derive contextualized review features, model user-item interactions, and generate explanations, respectively. Our experiments show that BENEFICT consistently outperforms other state-of-the-art models by an average improvement gain of nearly 7%. Based on the human judges’ assessment, the BENEFICT-produced explanations can capture the essence of the customer’s preference and help future customers make purchasing decisions. To the best of our knowledge, our model is one of the first recommender models to utilize BERT for neural collaborative filtering.

pdf abs
Exploiting Microblog Conversation Structures to Detect Rumors
Jiawen Li | Yudianto Sujana | Hung-Yu Kao
Proceedings of the 28th International Conference on Computational Linguistics

As one of the most popular social media platforms, Twitter has become a primary source of information for many people. Unfortunately, both valid information and rumors are propagated on Twitter due to the lack of an automatic information verification system. Twitter users communicate by replying to other users’ messages, forming a conversation structure. Using this structure, users can decide whether the information in the source tweet is a rumor by reading the tweet’s replies, which voice other users’ stances on the tweet. The majority of rumor detection researchers process such tweets based on time, ignoring the conversation structure. To reap the benefits of the Twitter conversation structure, we developed a model to detect rumors by modeling conversation structure as a graph. Thus, our model’s improved representation of the conversation structure enhances its rumor detection accuracy. The experimental results on two rumor datasets show that our model outperforms several baseline models, including a state-of-the-art model

2019

pdf
基於有向圖與爭論導向摘要的網路辯論之爭論元素辨識(Identifying Argument Components in Online Debates through Directed Graph and Argument-oriented Summarization)
Chi-An Wei | Hung-Yu Kao
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

pdf abs
Fill the GAP: Exploiting BERT for Pronoun Resolution
Kai-Chou Yang | Timothy Niven | Tzu Hsuan Chou | Hung-Yu Kao
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

In this paper, we describe our entry in the gendered pronoun resolution competition which achieved fourth place without data augmentation. Our method is an ensemble system of BERTs which resolves co-reference in an interaction space. We report four insights from our work: BERT’s representations involve significant redundancy; modeling interaction effects similar to natural language inference models is useful for this task; there is an optimal BERT layer to extract representations for pronoun resolution; and the difference between the attention weights from the pronoun to the candidate entities was highly correlated with the correct label, with interesting implications for future work.

pdf abs
Detecting Argumentative Discourse Acts with Linguistic Alignment
Timothy Niven | Hung-Yu Kao
Proceedings of the 6th Workshop on Argument Mining

We report the results of preliminary investigations into the relationship between linguistic alignment and dialogical argumentation at the level of discourse acts. We annotated a proof of concept dataset with illocutions and transitions at the comment level based on Inference Anchoring Theory. We estimated linguistic alignment across discourse acts and found significant variation. Alignment features calculated at the dyad level are found to be useful for detecting a range of argumentative discourse acts.

pdf abs
Probing Neural Network Comprehension of Natural Language Arguments
Timothy Niven | Hung-Yu Kao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We are surprised to find that BERT’s peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them. This analysis informs the construction of an adversarial dataset on which all models achieve random accuracy. Our adversarial dataset provides a more robust assessment of argument comprehension and should be adopted as the standard in future work.

2018

pdf abs
NLITrans at SemEval-2018 Task 12: Transfer of Semantic Knowledge for Argument Comprehension
Timothy Niven | Hung-Yu Kao
Proceedings of the 12th International Workshop on Semantic Evaluation

The Argument Reasoning Comprehension Task is a difficult challenge requiring significant language understanding and complex reasoning over world knowledge. We focus on transfer of a sentence encoder to bootstrap more complicated architectures given the small size of the dataset. Our best model uses a pre-trained BiLSTM to encode input sentences, learns task-specific features for the argument and warrants, then performs independent argument-warrant matching. This model achieves mean test set accuracy of 61.31%. Encoder transfer yields a significant gain to our best model over random initialization. Sharing parameters for independent warrant evaluation provides regularization and effectively doubles the size of the dataset. We demonstrate that regularization comes from ignoring statistical correlations between warrant positions. We also report an experiment with our best model that only matches warrants to reasons, ignoring claims. Performance is still competitive, suggesting that our model is not necessarily learning the intended task.

2017

pdf abs
IKM at SemEval-2017 Task 8: Convolutional Neural Networks for stance detection and rumor verification
Yi-Chin Chen | Zhao-Yang Liu | Hung-Yu Kao
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our approach for SemEval-2017 Task 8. We aim at detecting the stance of tweets and determining the veracity of the given rumor. We utilize a convolutional neural network for short text categorization using multiple filter sizes. Our approach beats the baseline classifiers on different event data with good F1 scores. The best of our submitted runs achieves rank 1st among all scores on subtask B.

Co-authors

Venues