2022
pdf
abs
Attention-Focused Adversarial Training for Robust Temporal Reasoning
Lis Kanashiro Pereira
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We propose an enhanced adversarial training algorithm for fine-tuning transformer-based language models (i.e., RoBERTa) and apply it to the temporal reasoning task. Current adversarial training approaches for NLP add the adversarial perturbation only to the embedding layer, ignoring the other layers of the model, which might limit the generalization power of adversarial training. Instead, our algorithm searches for the best combination of layers to add the adversarial perturbation. We add the adversarial perturbation to multiple hidden states or attention representations of the model layers. Adding the perturbation to the attention representations performed best in our experiments. Our model can improve performance on several temporal reasoning benchmarks, and establishes new state-of-the-art results.
pdf
abs
Toward Building a Language Model for Understanding Temporal Commonsense
Mayuko Kimura
|
Lis Kanashiro Pereira
|
Ichiro Kobayashi
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
The ability to capture temporal commonsense relationships for time-related events expressed in text is a very important task in natural language understanding. On the other hand, pre-trained language models such as BERT, which have recently achieved great success in a wide range of natural language processing tasks, are still considered to have poor performance in temporal reasoning. In this paper, we focus on the development of language models for temporal commonsense inference over several pre-trained language models. Our model relies on multi-step fine-tuning using multiple corpora, and masked language modeling to predict masked temporal indicators that are crucial for temporal commonsense reasoning. We also experimented with multi-task learning and build a language model that can improve performance on multiple time-related tasks. In our experiments, multi-step fine-tuning using the general commonsense reading task as auxiliary task produced the best results. This result showed a significant improvement in accuracy over standard fine-tuning in the temporal commonsense inference task.
2021
pdf
abs
Multi-Layer Random Perturbation Training for improving Model Generalization Efficiently
Lis Kanashiro Pereira
|
Yuki Taya
|
Ichiro Kobayashi
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
We propose a simple yet effective Multi-Layer RAndom Perturbation Training algorithm (RAPT) to enhance model robustness and generalization. The key idea is to apply randomly sampled noise to each input to generate label-preserving artificial input points. To encourage the model to generate more diverse examples, the noise is added to a combination of the model layers. Then, our model regularizes the posterior difference between clean and noisy inputs. We apply RAPT towards robust and efficient BERT training, and conduct comprehensive fine-tuning experiments on GLUE tasks. Our results show that RAPT outperforms the standard fine-tuning approach, and adversarial training method, yet with 22% less training time.
pdf
bib
abs
OCHADAI-KYOTO at SemEval-2021 Task 1: Enhancing Model Generalization and Robustness for Lexical Complexity Prediction
Yuki Taya
|
Lis Kanashiro Pereira
|
Fei Cheng
|
Ichiro Kobayashi
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
We propose an ensemble model for predicting the lexical complexity of words and multiword expressions (MWEs). The model receives as input a sentence with a target word or MWE and outputs its complexity score. Given that a key challenge with this task is the limited size of annotated data, our model relies on pretrained contextual representations from different state-of-the-art transformer-based language models (i.e., BERT and RoBERTa), and on a variety of training methods for further enhancing model generalization and robustness: multi-step fine-tuning and multi-task learning, and adversarial training. Additionally, we propose to enrich contextual representations by adding hand-crafted features during training. Our model achieved competitive results and ranked among the top-10 systems in both sub-tasks.
pdf
abs
Towards a Language Model for Temporal Commonsense Reasoning
Mayuko Kimura
|
Lis Kanashiro Pereira
|
Ichiro Kobayashi
Proceedings of the Student Research Workshop Associated with RANLP 2021
Temporal commonsense reasoning is a challenging task as it requires temporal knowledge usually not explicit in text. In this work, we propose an ensemble model for temporal commonsense reasoning. Our model relies on pre-trained contextual representations from transformer-based language models (i.e., BERT), and on a variety of training methods for enhancing model generalization: 1) multi-step fine-tuning using carefully selected auxiliary tasks and datasets, and 2) a specifically designed temporal masked language model task aimed to capture temporal commonsense knowledge. Our model greatly outperforms the standard fine-tuning approach and strong baselines on the MC-TACO dataset.
pdf
Dependency Enhanced Contextual Representations for Japanese Temporal Relation Classification
Chenjing Geng
|
Fei Cheng
|
Masayuki Asahara
|
Lis Kanashiro Pereira
|
Ichiro Kobayashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation
2020
pdf
bib
abs
Dialogue over Context and Structured Knowledge using a Neural Network Model with External Memories
Yuri Murayama
|
Lis Kanashiro Pereira
|
Ichiro Kobayashi
Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP
The Differentiable Neural Computer (DNC), a neural network model with an addressable external memory, can solve algorithmic and question answering tasks. There are various improved versions of DNC, such as rsDNC and DNC-DMS. However, how to integrate structured knowledge into these DNC models remains a challenging research question. We incorporate an architecture for knowledge into such DNC models, i.e. DNC, rsDNC and DNC-DMS, to improve the ability to generate correct responses using both contextual information and structured knowledge. Our improved rsDNC model improves the mean accuracy by approximately 20% to the original rsDNC on tasks requiring knowledge in the dialog bAbI tasks. In addition, our improved rsDNC and DNC-DMS models also yield better performance than their original models in the Movie Dialog dataset.