This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Dahyun Jung
Also published as: DaHyun Jung
Recent advancements in Large Language Models (LLMs) have significantly enhanced interactions between users and models. These advancements also underscore the need for rigorous safety evaluations, because LLMs can manifest social biases that lead to harmful societal impacts. Despite these concerns, existing benchmarks may overlook the intrinsic weaknesses of LLMs, which can generate biased responses even under simple adversarial instructions. To address this critical gap, we introduce a new benchmark, Fairness Benchmark in LLM under Extreme Scenarios (FLEX), designed to test whether LLMs can sustain fairness even when exposed to prompts constructed to induce bias. To thoroughly evaluate the robustness of LLMs, we integrate prompts that amplify potential biases into the fairness assessment. Comparative experiments between FLEX and existing benchmarks demonstrate that traditional evaluations may underestimate the inherent risks in models. This highlights the need for more stringent LLM evaluation benchmarks to ensure safety and fairness.
Knowledge editing aims to update Large Language Models (LLMs) with new information without costly retraining. However, consistently reflecting these updates in complex multi-hop Question Answering (QA), which demands reasoning over interconnected facts, is challenging. Many existing methods overlook the interplay with pre-existing knowledge, leading to inconsistent edit propagation. To overcome this, we introduce StepKE (Stepwise Knowledge Editing for Multi-hop QA), a novel framework for robustly integrating edited and existing knowledge for coherent multi-hop reasoning. StepKE uniquely decomposes multi-hop questions into sequential single-hop sub-questions, retrieving relevant facts (both edited and pre-existing) from an external knowledge graph for each step. It employs context-aware prompting with prior reasoning history and fine-tuning for precise edit propagation. This systematic integration enables effective stepwise reasoning. Experiments show StepKE generates significantly more accurate and consistent responses than baselines, showcasing strong knowledge editing and integration in multi-hop QA.
Korean legal knowledge is subject to frequent temporal updates driven by societal needs and government policies. Even minor modifications to legal provisions can have significant consequences, yet continuously retraining large language models (LLMs) to incorporate such updates is resource-intensive and impractical. To address this, we propose KoLEG, an on-the-fly Korean Legal knowledge editing framework enhanced with continuous retrieval. KoLEG employs an Editing-Aware Learning Strategy and a LawEdit Retriever, which together adaptively integrate subtle linguistic nuances and continuous legislative amendments. To support this task, we construct the Korean Legislative Amendment Dataset, explicitly designed for continuous legal knowledge updates with attention to both temporal dynamics and linguistic subtleties. KoLEG outperforms existing locate-then-edit and retrieval-based editing methods, demonstrating superior effectiveness in legal knowledge editing while preserving linguistic capabilities. Furthermore, KoLEG maintains robust performance in sequential editing, improves performance on precedent application tasks, and is qualitatively validated by legal experts.
Large language models (LLMs) often retain outdated or incorrect information from pre-training, which undermines their reliability. While model editing methods have been developed to address such errors without full re-training, they frequently suffer from knowledge conflicts, where outdated information interferes with new knowledge. In this work, we propose Conflict-free Model Editing (CoME), a novel framework that enhances the accuracy of knowledge updates in LLMs by selectively removing outdated knowledge. CoME leverages unlearning to mitigate knowledge interference, allowing new information to be integrated without compromising relevant linguistic features. Through experiments on GPT-J and LLaMA-3 using Counterfact and ZsRE datasets, we demonstrate that CoME improves both editing accuracy and model reliability when applied to existing editing methods. Our results highlight that the targeted removal of outdated knowledge is crucial for enhancing model editing effectiveness and maintaining the model’s generative performance.
The advent of large language models has brought remarkable improvements to machine translation. However, machine translation is still vulnerable to critical meaning deviations, which may cause catastrophic issues in social or ethical contexts. In particular, existing critical error detection primarily focuses on identifying sentence-level errors, leaving the precise localization of such errors within the sentence unaddressed. In this paper, we introduce a new task, word-level critical error detection (WCED), to detect critical errors at a fine-grained level in machine-translated sentences. The task aims to identify the parts of a machine translation that contain catastrophic meaning distortions. We hypothesize that the ability to determine errors at the sentence level will positively influence the detection of more granular errors. We propose a sentence-level error detection module to predict which words in a sentence contain critical errors. Experimental results demonstrate that our method outperforms existing methods and LLMs on En-De, Zh-En, En-Ru, and En-Ko. Our method helps determine the fine-grained location of errors. We hope that such studies will improve the capacity to address critical errors effectively.
Recent machine translation (MT) systems have overcome language barriers for a wide range of users, yet they still carry the risk of critical meaning deviation. Critical error detection (CED) is a task that identifies the inherent risk of catastrophic meaning distortions in machine translation output. Given the importance of reflecting cultural elements when detecting critical errors, we introduce the culture-aware “Politeness” type for detecting English-Korean critical translation errors. In addition, we support two tasks by providing multiclass labels: critical error detection and critical error type classification (CETC). Empirical evaluations reveal that our data augmentation approach, which uses a newly presented perturber, significantly outperforms existing baselines in both tasks. Further analysis highlights the significance of multiclass labeling by demonstrating its superior effectiveness compared to binary labels.
Counter-narrative generation, i.e., the generation of fact-based responses to hate speech with the aim of correcting discriminatory beliefs, has been demonstrated to be an effective method to combat hate speech. However, its applicability is limited by resource-intensive dataset construction, and existing work focuses only on the primary language. To alleviate this problem, we propose Korean Hate Speech Counter Punch (KHSCP), a cost-effective counter-narrative generation method for the Korean language. To this end, we release the first counter-narrative generation dataset in Korean and pose two research questions. Guided by these questions, we propose an effective augmentation method and investigate the feasibility of using a large language model to overcome data scarcity in low-resource environments by leveraging existing resources. We conduct several experiments to verify the effectiveness of the proposed method. Our results reveal that applying pre-existing resources can improve generation performance by a significant margin. Through in-depth analysis of these experiments, this work demonstrates the possibility of overcoming the challenges of generating counter-narratives in low-resource environments.
Critical error detection (CED) in machine translation is a task that aims to detect errors that significantly distort the intended meaning. However, existing studies of CED lack explainability because they do not address the reasons for catastrophic errors. To address this limitation, we propose Explainable CED, a dataset that introduces error explanation and correction attributes for critical errors. To reduce time costs and mitigate human annotation bias, we leverage a large language model in the data construction process. To improve the quality of the dataset and mitigate hallucination, we compare responses from the model and introduce an additional data filtering method based on feedback scoring. Experiments demonstrate that the dataset appropriately reflects consistent explanations and revisions for errors, validating its reliability.
Large Language Models (LLMs) have significantly impacted various fields requiring advanced linguistic understanding, yet concerns regarding their inherent biases and ethical implications have also increased. Notably, LLMs have been critiqued for perpetuating stereotypes against diverse groups based on race, sexual orientation, and other attributes. However, most research analyzing these biases has focused on communities where English is the primary language, neglecting the cultural and linguistic nuances of other societies. In this paper, we explore the inherent biases and toxicity of LLMs specifically within the social context of Korea. We devise a set of prompts that reflect major societal issues in Korea and assign varied personas to both ChatGPT and GPT-4 to assess the toxicity of the generated sentences. Our findings indicate that certain personas or prompt combinations consistently yield harmful content, highlighting the potential risks associated with specific persona-issue alignments within the Korean cultural framework. Furthermore, we find that under certain conditions GPT-4 can produce more than twice as much toxic content as ChatGPT.