Hiroaki Yamada
Intermediate Layer Distillation (ILD) is a variant of Knowledge Distillation (KD), a method for compressing neural networks. ILD requires mapping to align the intermediate layer sizes of the teacher and student models to compute the loss function in training, while this mapping is not used during inference. This inconsistency may reduce the effectiveness of learning in intermediate layers. In this study, we propose LoRAILD, which uses LoRA adapters to eliminate the inconsistency. However, our experimental results show that LoRAILD does not outperform existing methods. Furthermore, contrary to previous studies, we observe that conventional ILD does not outperform vanilla KD. Our analysis of the distilled models’ intermediate layers suggests that ILD does not improve language models’ performance.
Evaluating an LLM’s robustness against numerical perturbation is a good way to determine whether the LLM actually performs reasoning or merely replicates learned patterns. We propose a novel method to augment math word problems (MWPs), producing numerical variations at a large scale using templates. We also propose an automated error classification framework for scalable error analysis, distinguishing calculation errors from reasoning errors. Our experiments using these methods show that LLMs are weak against numerical variations, suggesting they are not fully capable of generating valid reasoning steps, often failing in arithmetic operations.
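As a rough illustration of template-based numerical augmentation, the sketch below fills a hand-written problem template with sampled numbers and recomputes the gold answer for each variant. The template text, the slot names, the number range, and the `make_variants` helper are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical template: slots in braces are replaced with sampled numbers.
TEMPLATE = (
    "Tom has {a} apples and buys {b} more. He gives away {c}. "
    "How many apples does he have left?"
)


def answer(a: int, b: int, c: int) -> int:
    """Gold answer for the hypothetical template above."""
    return a + b - c


def make_variants(n: int, seed: int = 0, low: int = 1, high: int = 10**6):
    """Generate n numerical variations of the template with their answers."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    variants = []
    for _ in range(n):
        a = rng.randint(low, high)
        b = rng.randint(low, high)
        c = rng.randint(low, a + b)  # keeps the answer non-negative
        variants.append((TEMPLATE.format(a=a, b=b, c=c), answer(a, b, c)))
    return variants


variants = make_variants(5)
```

Because the answer is computed from the same sampled values that fill the template, every generated problem comes with a gold label, which is what makes this kind of augmentation scale.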
Antonymy has long received particular attention in lexical semantics. Previous studies have shown that antonym pairs frequently co-occur in text, across genres and parts of speech, more often than would be expected by chance. However, whether this co-occurrence pattern is distinctive of antonymy remains unclear, due to a lack of comparison with other semantic relations. This work fills the gap by comparing antonymy with three other relations across parts of speech using robust co-occurrence metrics. We find that antonymy is distinctive in three respects: antonym pairs co-occur with high strength, in a preferred linear order, and within short spans. All results are available online.
Interpretation methods provide saliency scores indicating the importance of input words for neural summarization models. Prior work has analyzed models by comparing them to human behavior, often using eye-gaze as a proxy for human attention in reading tasks such as classification. This paper presents a framework to analyze model behavior in summarization by comparing it to human summarization behavior using eye-gaze data. We examine two research questions: RQ1) whether model saliency conforms to human gaze during summarization and RQ2) how model saliency and human gaze affect summarization performance. For RQ1, we measure conformity by calculating the correlation between model saliency and human fixation counts. For RQ2, we conduct ablation experiments removing words/sentences considered important by models or humans. Experiments on two datasets with human eye-gaze during summarization partially confirm that model saliency aligns with human gaze (RQ1). However, ablation experiments show that removing the words/sentences most attended to by human gaze does not significantly degrade performance compared with removing those selected by model saliency (RQ2).
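The conformity measure for RQ1 can be pictured with a small sketch. The abstract does not say which correlation coefficient is used, so the example below assumes Spearman rank correlation; the per-word saliency scores and fixation counts are made-up toy values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: one saliency score and one fixation count per input word.
saliency = [0.9, 0.1, 0.4, 0.7, 0.2]
fixations = [5, 0, 2, 3, 1]
rho = spearman(saliency, fixations)
```

A rank-based coefficient is a natural choice here because saliency scores and fixation counts live on different scales, but that choice is an assumption of this sketch rather than a detail stated in the abstract.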
This paper investigates pretrained language models (PLMs) specialised in the Japanese legal domain. We create PLMs using different pretraining strategies and investigate their performance across multiple domains. Our findings are (i) the PLM built with general domain data can be improved by further pretraining with domain-specific data, (ii) domain-specific PLMs can learn domain-specific and general word meanings simultaneously and can distinguish them, (iii) domain-specific PLMs work better on their target domain; still, the PLMs retain the information learnt in the original PLM even after being further pretrained with domain-specific data, (iv) the PLMs sequentially pretrained with corpora of different domains show high performance for the later learnt domains.
This paper describes a comprehensive annotation study on Japanese judgment documents in civil cases. We aim to build an annotated corpus designed for Legal Judgment Prediction (LJP), especially for torts. Our annotation scheme contains annotations of whether a tort is accepted by judges as well as its corresponding rationales for explainability purposes. Our annotation scheme extracts decisions and rationales at the character level. Moreover, the scheme can capture the explicit causal relation between judges’ decisions and their corresponding rationales, allowing multiple decisions in a document. To obtain high-quality annotation, we developed an annotation scheme with legal experts, and confirmed its reliability by agreement studies with Krippendorff’s alpha metric. The result of the annotation study suggests the proposed annotation scheme can produce a dataset of Japanese LJP at reasonable reliability.
In this paper, we approach summary evaluation from an applied linguistics (AL) point of view. We provide computational tools to AL researchers to simplify the process of Idea Unit (IU) segmentation. The IU is a segmentation unit that can identify chunks of information. These chunks can be compared across documents to measure the content overlap between a summary and its source text. We propose a full revision of the annotation guidelines to allow machine implementation. The new guideline also improves the inter-annotator agreement, which rises from 0.547 to 0.785 (Cohen’s Kappa). We release L2WS 2021, an IU gold standard corpus composed of 40 manually annotated student summaries. We propose IUExtract, the first automatic segmentation algorithm based on the IU. The algorithm was tested over the L2WS 2021 corpus. Our results are promising, achieving a precision of 0.789 and a recall of 0.844. We tested an existing approach to IU alignment via word embeddings with the state-of-the-art model SBERT. The recorded precision for the top 1 aligned pair of IUs was 0.375. We deemed this result insufficient for effective automatic alignment. We propose “SAT”, an online tool to facilitate the collection of alignment gold standards for future training.
This paper discusses the computer-assisted content evaluation of summaries. We propose a method to make a correspondence between the segments of the source text and its summary. As the unit of segmentation, we adopt the “Idea Unit (IU)”, which was proposed in Applied Linguistics. Introducing IUs enables us to make a correspondence even for sentences that contain multiple ideas. The IU correspondence is made based on the similarity between vector representations of IUs. An evaluation experiment with two source texts and 20 summaries showed that the proposed method is more robust against rephrased expressions than the conventional ROUGE-based baselines. Also, the proposed method outperformed the baselines in recall. We implemented the proposed method in a GUI tool, “Segment Matcher”, that helps teachers establish a link between corresponding IUs across the summary and source text.
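The similarity-based correspondence described above can be sketched as follows. This is a minimal illustration only: the abstract does not name the similarity function or the vector model, so cosine similarity, the `align` helper, the 0.5 threshold, and the toy three-dimensional vectors are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(summary_vecs, source_vecs, threshold=0.5):
    """For each summary IU, link the most similar source IU.

    Returns one entry per summary IU: (source index, similarity), or None
    when the best similarity falls below the (hypothetical) threshold.
    """
    links = []
    for s in summary_vecs:
        scores = [cosine(s, t) for t in source_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        links.append((best, scores[best]) if scores[best] >= threshold else None)
    return links


# Toy vectors standing in for IU representations.
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
summary = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
links = align(summary, source)
```

In this toy run the first summary IU links to the first source IU, while the second has no sufficiently similar source IU and is left unlinked, the kind of case a teacher would then resolve by hand in a tool such as the one the abstract describes.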
We propose a method for the annotation of Japanese civil judgment documents, with the purpose of creating flexible summaries of these. The first step, described in the current paper, concerns content selection, i.e., the question of which material should be extracted initially for the summary. In particular, we utilize the hierarchical argument structure of the judgment documents. Our main contributions are a) the design of an annotation scheme that stresses the connection between legal points (called issue topics) and argument structure, b) an adaptation of rhetorical status to suit the Japanese legal system and c) the definition of a linked argument structure based on legal sub-arguments. In this paper, we report agreement between two annotators on several aspects of the overall task.