Sho Takase

2022

pdf abs
ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization
Mengsay Loem | Sho Takase | Masahiro Kaneko | Naoaki Okazaki
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

Neural models trained with large amount of parallel data have achieved impressive performance in abstractive summarization tasks. However, large-scale parallel corpora are expensive and challenging to construct. In this work, we introduce a low-cost and effective strategy, ExtraPhrase, to augment training data for abstractive summarization tasks. ExtraPhrase constructs pseudo training data in two steps: extractive summarization and paraphrasing. We extract major parts of an input text in the extractive summarization step and obtain its diverse expressions with the paraphrasing step. Through experiments, we show that ExtraPhrase improves the performance of abstractive summarization tasks by more than 0.50 points in ROUGE scores compared to the setting without data augmentation. ExtraPhrase also outperforms existing methods such as back-translation and self-training. We also show that ExtraPhrase is significantly effective when the amount of genuine training data is remarkably small, i.e., a low-resource setting. Moreover, ExtraPhrase is more cost-efficient than the existing approaches

pdf abs
Multi-Task Learning for Cross-Lingual Abstractive Summarization
Sho Takase | Naoaki Okazaki
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present a multi-task learning framework for cross-lingual abstractive summarization to augment training data. Recent studies constructed pseudo cross-lingual abstractive summarization data to train their neural encoder-decoders. Meanwhile, we introduce existing genuine data such as translation pairs and monolingual abstractive summarization data into training. Our proposed method, Transum, attaches a special token to the beginning of the input sentence to indicate the target task. The special token enables us to incorporate the genuine data into the training data easily. The experimental results show that Transum achieves better performance than the model trained with only pseudo cross-lingual summarization data. In addition, we achieve the top ROUGE score on Chinese-English and Arabic-English abstractive summarization. Moreover, Transum also has a positive effect on machine translation. Experimental results indicate that Transum improves the performance from the strong baseline, Transformer, in Chinese-English, Arabic-English, and English-Japanese translation datasets.

pdf abs
Interpretability for Language Learners Using Example-Based Grammatical Error Correction
Masahiro Kaneko | Sho Takase | Ayana Niwa | Naoaki Okazaki
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Grammatical Error Correction (GEC) should not focus only on high accuracy of corrections but also on interpretability for language learning.However, existing neural-based GEC models mainly aim at improving accuracy, and their interpretability has not been explored.A promising approach for improving interpretability is an example-based method, which uses similar retrieved examples to generate corrections. In addition, examples are beneficial in language learning, helping learners understand the basis of grammatically incorrect/correct texts and improve their confidence in writing.Therefore, we hypothesize that incorporating an example-based method into GEC can improve interpretability as well as support language learners.In this study, we introduce an Example-Based GEC (EB-GEC) that presents examples to language learners as a basis for a correction result.The examples consist of pairs of correct and incorrect sentences similar to a given input and its predicted correction.Experiments demonstrate that the examples presented by EB-GEC help language learners decide to accept or refuse suggestions from the GEC output.Furthermore, the experiments also show that retrieved examples improve the accuracy of corrections.

pdf abs
Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation
Sho Takase | Tatsuya Hiraoka | Naoaki Okazaki
Findings of the Association for Computational Linguistics: ACL 2022

Subword regularizations use multiple subword segmentations during training to improve the robustness of neural machine translation models.In previous subword regularizations, we use multiple segmentations in the training process but use only one segmentation in the inference.In this study, we propose an inference strategy to address this discrepancy.The proposed strategy approximates the marginalized likelihood by using multiple segmentations including the most plausible segmentation and several sampled segmentations.Because the proposed strategy aggregates predictions from several segmentations, we can regard it as a single model ensemble that does not require any additional cost for training.Experimental results show that the proposed strategy improves the performance of models trained with subword regularization in low-resource machine translation tasks.

pdf abs
Word-level Perturbation Considering Word Length and Compositional Subwords
Tatsuya Hiraoka | Sho Takase | Kei Uchiumi | Atsushi Keyaki | Naoaki Okazaki
Findings of the Association for Computational Linguistics: ACL 2022

We present two simple modifications for word-level perturbation: Word Replacement considering Length (WR-L) and Compositional Word Replacement (CWR).In conventional word replacement, a word in an input is replaced with a word sampled from the entire vocabulary, regardless of the length and context of the target word.WR-L considers the length of a target word by sampling words from the Poisson distribution.CWR considers the compositional candidates by restricting the source of sampling to related words that appear in subword regularization.Experimental results showed that the combination of WR-L and CWR improved the performance of text classification and machine translation.

This paper describes the NTT-Tohoku-TokyoTech-RIKEN (NT5) team’s submission system for the WMT’22 general translation task.This year, we focused on the English-to-Japanese and Japanese-to-English translation tracks.Our submission system consists of an ensemble of Transformer models with several extensions.We also applied data augmentation and selection techniques to obtain potentially effective training data for training individual Transformer models in the pre-training and fine-tuning scheme.Additionally, we report our trial of incorporating a reranking module and the reevaluated results of several techniques that have been recently developed and published.

2021

pdf
Joint Optimization of Tokenization and Downstream Model
Tatsuya Hiraoka | Sho Takase | Kei Uchiumi | Atsushi Keyaki | Naoaki Okazaki
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf abs
Rethinking Perturbations in Encoder-Decoders for Fast Training
Sho Takase | Shun Kiyono
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We often use perturbations to regularize neural models. For neural encoder-decoders, previous studies applied the scheduled sampling (Bengio et al., 2015) and adversarial perturbations (Sato et al., 2019) as perturbations but these methods require considerable computational time. Thus, this study addresses the question of whether these approaches are efficient enough for training time. We compare several perturbations in sequence-to-sequence problems with respect to computational time. Experimental results show that the simple techniques such as word dropout (Gal and Ghahramani, 2016) and random replacement of input tokens achieve comparable (or better) scores to the recently proposed perturbations, even though these simple methods are faster.

2020

pdf abs
Optimizing Word Segmentation for Downstream Task
Tatsuya Hiraoka | Sho Takase | Kei Uchiumi | Atsushi Keyaki | Naoaki Okazaki
Findings of the Association for Computational Linguistics: EMNLP 2020

In traditional NLP, we tokenize a given sentence as a preprocessing, and thus the tokenization is unrelated to a target downstream task. To address this issue, we propose a novel method to explore a tokenization which is appropriate for the downstream task. Our proposed method, optimizing tokenization (OpTok), is trained to assign a high probability to such appropriate tokenization based on the downstream task loss. OpTok can be used for any downstream task which uses a vector representation of a sentence such as text classification. Experimental results demonstrate that OpTok improves the performance of sentiment analysis and textual entailment. In addition, we introduce OpTok into BERT, the state-of-the-art contextualized embeddings and report a positive effect.

pdf abs
Improving Truthfulness of Headline Generation
Kazuki Matsumaru | Sho Takase | Naoaki Okazaki
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most studies on abstractive summarization report ROUGE scores between system and reference summaries. However, we have a concern about the truthfulness of generated summaries: whether all facts of a generated summary are mentioned in the source text. This paper explores improving the truthfulness in headline generation on two popular datasets. Analyzing headlines generated by the state-of-the-art encoder-decoder model, we show that the model sometimes generates untruthful headlines. We conjecture that one of the reasons lies in untruthful supervision data used for training the model. In order to quantify the truthfulness of article-headline pairs, we consider the textual entailment of whether an article entails its headline. After confirming quite a few untruthful instances in the datasets, this study hypothesizes that removing untruthful instances from the supervision data may remedy the problem of the untruthful behaviors of the model. Building a binary classifier that predicts an entailment relation between an article and its headline, we filter out untruthful instances from the supervision data. Experimental results demonstrate that the headline generation model trained on filtered supervision data shows no clear difference in ROUGE scores but remarkable improvements in automatic and manual evaluations of the generated headlines.

pdf abs
Evaluation Dataset for Zero Pronoun in Japanese to English Translation
Sho Shimazu | Sho Takase | Toshiaki Nakazawa | Naoaki Okazaki
Proceedings of the Twelfth Language Resources and Evaluation Conference

In natural language, we often omit some words that are easily understandable from the context. In particular, pronouns of subject, object, and possessive cases are often omitted in Japanese; these are known as zero pronouns. In translation from Japanese to other languages, we need to find a correct antecedent for each zero pronoun to generate a correct and coherent translation. However, it is difficult for conventional automatic evaluation metrics (e.g., BLEU) to focus on the success of zero pronoun resolution. Therefore, we present a hand-crafted dataset to evaluate whether translation models can resolve the zero pronoun problems in Japanese to English translations. We manually and statistically validate that our dataset can effectively evaluate the correctness of the antecedents selected in translations. Through the translation experiments using our dataset, we reveal shortcomings of an existing context-aware neural machine translation model.

2019

pdf abs
Positional Encoding to Control Output Sequence Length
Sho Takase | Naoaki Okazaki
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural encoder-decoder models have been successful in natural language generation tasks. However, real applications of abstractive summarization must consider an additional constraint that a generated summary should not exceed a desired length. In this paper, we propose a simple but effective extension of a sinusoidal positional encoding (Vaswani et al., 2017) so that a neural encoder-decoder model preserves the length constraint. Unlike previous studies that learn length embeddings, the proposed method can generate a text of any length even if the target length is unseen in training data. The experimental results show that the proposed method is able not only to control generation length but also improve ROUGE scores.

pdf abs
Neural Question Generation using Interrogative Phrases
Yuichi Sasazawa | Sho Takase | Naoaki Okazaki
Proceedings of the 12th International Conference on Natural Language Generation

Question Generation (QG) is the task of generating questions from a given passage. One of the key requirements of QG is to generate a question such that it results in a target answer. Previous works used a target answer to obtain a desired question. However, we also want to specify how to ask questions and improve the quality of generated questions. In this study, we explore the use of interrogative phrases as additional sources to control QG. By providing interrogative phrases, we expect that QG can generate a more reliable sequence of words subsequent to an interrogative phrase. We present a baseline sequence-to-sequence model with the attention, copy, and coverage mechanisms, and show that the simple baseline achieves state-of-the-art performance. The experiments demonstrate that interrogative phrases contribute to improving the performance of QG. In addition, we report the superiority of using interrogative phrases in human evaluation. Finally, we show that a question answering system can provide target answers more correctly when the questions are generated with interrogative phrases.

pdf abs
Generating Natural Anagrams: Towards Language Generation Under Hard Combinatorial Constraints
Masaaki Nishino | Sho Takase | Tsutomu Hirao | Masaaki Nagata
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

An anagram is a sentence or a phrase that is made by permutating the characters of an input sentence or a phrase. For example, “Trims cash” is an anagram of “Christmas”. Existing automatic anagram generation methods can find possible combinations of words form an anagram. However, they do not pay much attention to the naturalness of the generated anagrams. In this paper, we show that simple depth-first search can yield natural anagrams when it is combined with modern neural language models. Human evaluation results show that the proposed method can generate significantly more natural anagrams than baseline methods.

2018

pdf abs
Direct Output Connection for a High-Rank Language Model
Sho Takase | Jun Suzuki | Masaaki Nagata
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also middle layers. This method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). Our proposed method improves the current state-of-the-art language model and achieves the best score on the Penn Treebank and WikiText-2, which are the standard benchmark datasets. Moreover, we indicate our proposed method contributes to application tasks: machine translation and headline generation.

pdf abs
Unsupervised Token-wise Alignment to Improve Interpretation of Encoder-Decoder Models
Shun Kiyono | Sho Takase | Jun Suzuki | Naoaki Okazaki | Kentaro Inui | Masaaki Nagata
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Developing a method for understanding the inner workings of black-box neural methods is an important research endeavor. Conventionally, many studies have used an attention matrix to interpret how Encoder-Decoder-based models translate a given source sentence to the corresponding target sentence. However, recent studies have empirically revealed that an attention matrix is not optimal for token-wise translation analyses. We propose a method that explicitly models the token-wise alignment between the source and target sequences to provide a better analysis. Experiments show that our method can acquire token-wise alignments that are superior to those of an attention mechanism.

pdf abs
An Empirical Study of Building a Strong Baseline for Constituency Parsing
Jun Suzuki | Sho Takase | Hidetaka Kamigaito | Makoto Morishita | Masaaki Nagata
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper investigates the construction of a strong baseline based on general purpose sequence-to-sequence models for constituency parsing. We incorporate several techniques that were mainly developed in natural language generation tasks, e.g., machine translation and summarization, and demonstrate that the sequence-to-sequence model achieves the current top-notch parsers’ performance (almost) without requiring any explicit task-specific knowledge or architecture of constituent parsing.

2017

pdf abs
Input-to-Output Gate to Improve RNN Language Models
Sho Takase | Jun Suzuki | Masaaki Nagata
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper proposes a reinforcing method that refines the output layers of existing Recurrent Neural Network (RNN) language models. We refer to our proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple structure, and thus, can be easily combined with any RNN language models. Our experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG consistently boosts the performance of several different types of current topline RNN language models.

pdf
Handling Multiword Expressions in Causality Estimation
Shota Sasaki | Sho Takase | Naoya Inoue | Naoaki Okazaki | Kentaro Inui
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers