Kotaro Funakoshi

2024

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation
Aru Maekawa | Satoshi Kosugi | Kotaro Funakoshi | Manabu Okumura
Findings of the Association for Computational Linguistics: NAACL 2024

Dataset distillation aims to compress a training dataset by creating a small number of informative synthetic samples such that neural networks trained on them perform as well as those trained on the original training dataset. Current text dataset distillation methods create each synthetic sample as a sequence of word embeddings rather than as text so that gradient-based optimization can be applied; however, such embedding-level distilled datasets cannot be used to train other models whose word embedding weights differ from those of the model used for distillation. To address this issue, we propose a novel text dataset distillation approach, called Distilling dataset into Language Model (DiLM), which trains a language model to generate informative synthetic training samples as text data, instead of directly optimizing synthetic samples. We evaluated DiLM on various text classification datasets and showed that distilled synthetic datasets from DiLM outperform those from current coreset selection methods. DiLM achieved remarkable generalization performance both in training different types of models and in in-context learning of large language models. Our code will be available at https://github.com/arumaekawa/DiLM.

Multi-modal machine translation (MMT) can reduce ambiguity and semantic distortion compared with traditional machine translation (MT) by utilizing auxiliary information such as images. However, current MMT methods face two primary challenges. The first is that they underperform MT methods based on pre-trained models. The second is the inadequate exploitation and integration of the image modality within the model, primarily due to a lack of triplet training data. A mainstream approach is to introduce large amounts of parallel and monolingual data to train the text model and the visual model separately. However, incorporating extensive external data can result in data imbalance, which may introduce biases during training. Additionally, collecting and cleaning such large datasets is labor-intensive. To overcome these challenges, we introduce a novel, low-cost, large language model-based data augmentation method called LAMBDA, which can enrich the original samples and expand the dataset without requiring external images and text. We propose a fine-grained image captioning module with a noise filter to hierarchically and accurately extract unexploited information from images. Additionally, we design two specific prompts to guide the GPT-3.5 model in generating enriched texts and the corresponding translations. The enriched samples contain diverse text and strong connections between text and images, significantly improving MMT baselines by up to 3.83 BLEU and 3.61 METEOR points.

Using Respiration for Enhancing Human-Robot Dialogue
Takao Obi | Kotaro Funakoshi
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

This paper presents the development and capabilities of a spoken dialogue robot that uses respiration to enhance human-robot dialogue. By employing a respiratory estimation technique that uses video input, the dialogue robot captures user respiratory information during dialogue. This information is then used to prevent speech collisions between the user and the robot and to present synchronized pseudo-respiration with the user, thereby enhancing the smoothness and engagement of human-robot dialogue.

Extreme Fine-tuning: A Novel and Fast Fine-tuning Approach for Text Classification
Boonnithi Jiaramaneepinit | Thodsaporn Chay-intr | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Although fine-tuning a pre-trained model with a conventional approach has been shown to be effective in various downstream tasks, previous work has used only backpropagation to fine-tune the model, which demands massive computational resources and time. We propose Extreme Fine-Tuning (EFT), a novel approach for fine-tuning a pre-trained model effectively and efficiently. EFT uses backpropagation for a brief fine-tuning phase and an iterative extreme learning machine for training a classifier. We applied EFT to four text classification datasets, MELD, IEMOCAP, IMDb, and AG News, and compared its performance with state-of-the-art (SOTA) approaches. The results indicate that EFT noticeably outperformed the other approaches in training time while achieving comparable model performance. We will release our code at https://github.com/up-33/extreme-fine-tuning.
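
For readers unfamiliar with extreme learning machines (ELMs): a basic ELM keeps randomly initialized hidden-layer weights fixed and fits only the output layer in closed form by regularized least squares, which is how it avoids backpropagation when training the classifier. The sketch below is a minimal single-shot ELM in pure Python on toy feature vectors. It is not the paper's iterative ELM, and the hidden size, the ridge parameter `reg`, and the input features are illustrative assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def solve(a, b):
    """Solve a @ x = b for square a (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


class ELM:
    """Basic ELM: fixed random hidden layer, output layer fit by ridge regression."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = random.Random(seed)

    def _hidden(self, x):
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys, n_classes):
        dim = len(xs[0])
        # Hidden weights and biases are sampled once and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]  # N x K hidden activations
        t = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in ys]  # one-hot
        k = self.n_hidden
        # Closed-form ridge solution: (H^T H + reg * I) beta = H^T T
        hth = [[sum(row[i] * row[j] for row in h) + (self.reg if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        htt = [[sum(hr[i] * tr[c] for hr, tr in zip(h, t)) for c in range(n_classes)]
               for i in range(k)]
        self.beta = solve(hth, htt)  # K x C output weights
        return self

    def predict(self, x):
        hx = self._hidden(x)
        n_classes = len(self.beta[0])
        scores = [sum(hx[i] * self.beta[i][c] for i in range(self.n_hidden))
                  for c in range(n_classes)]
        return max(range(n_classes), key=lambda c: scores[c])
```

In an EFT-like setting, the inputs would be sentence features from the briefly fine-tuned encoder rather than toy 2-D points; only the output weights `beta` are solved for, so training the classifier costs one linear solve instead of many gradient steps.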

myMediCon: End-to-End Burmese Automatic Speech Recognition for Medical Conversations
Hay Man Htun | Ye Kyaw Thu | Hutchatai Chanlekha | Kotaro Funakoshi | Thepchai Supnithi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

End-to-End Automatic Speech Recognition (ASR) models have significantly advanced the field of speech processing by streamlining traditionally complex ASR system pipelines, promising enhanced accuracy and efficiency. Despite these advancements, there is a notable absence of freely available medical conversation speech corpora for Burmese, a low-resource language. Addressing this gap, we present a manually curated Burmese Medical Speech Conversations (myMediCon) corpus, encapsulating conversations among medical doctors, nurses, and patients. Utilizing the ESPnet speech processing toolkit, we explore End-to-End ASR models for the Burmese language, focusing on Transformer and Recurrent Neural Network (RNN) architectures. Our corpus comprises 12 speakers, three male and nine female, with a total speech duration of nearly 11 hours within the medical domain. To assess ASR performance, we applied word and syllable segmentation to the text corpus. ASR models were evaluated using Character Error Rate (CER), Word Error Rate (WER), and Translation Error Rate (TER). The experimental results indicate that RNN-based Burmese speech recognition with syllable-level segmentation achieved the best performance, yielding a CER of 9.7%. Moreover, the RNN approach significantly outperformed the Transformer model.
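
CER and WER are both edit distances between a reference and a hypothesis transcript, normalized by the reference length, computed over characters and words respectively. The minimal sketch below (not ESPnet's scoring code) shows the computation. It assumes Burmese text has already been segmented and joined with spaces, as the word and syllable segmentation mentioned above would produce; on syllable-segmented text, the same `wer` function yields a syllable-level error rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # delete r
                         cur[j - 1] + 1,            # insert h
                         prev[j - 1] + (r != h))    # substitute (or match at no cost)
        prev = cur
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(ref, hyp):
    # Every character counts, including spaces; toolkits differ on this convention.
    return error_rate(list(ref), list(hyp))


def wer(ref, hyp):
    # Assumes words (or syllables) have already been segmented and joined with spaces.
    return error_rate(ref.split(), hyp.split())
```

For example, a reference of four words and a hypothesis with one substitution and one deletion gives a WER of 2/4.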

2023

Dataset Distillation with Attention Labels for Fine-tuning BERT
Aru Maekawa | Naoki Kobayashi | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Dataset distillation aims to create a small dataset of informative synthetic samples to rapidly train neural networks that retain the performance of the original dataset. In this paper, we focus on constructing distilled few-shot datasets for natural language processing (NLP) tasks to fine-tune pre-trained transformers. Specifically, we propose to introduce attention labels, which can efficiently distill the knowledge from the original dataset and transfer it to the transformer models via attention probabilities. We evaluated our dataset distillation methods on four different NLP tasks and demonstrated that it is possible to create distilled few-shot datasets with the attention labels, yielding impressive performance for fine-tuning BERT. Specifically, on AGNews, a four-class news classification task, our distilled few-shot dataset achieved up to 93.2% accuracy, which is 98.5% of the performance obtained with the original dataset, even with only one sample per class and only one gradient step.

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition
Dongyuan Li | Yusong Wang | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Multimodal emotion recognition aims to recognize the emotion of each utterance from multiple modalities, and it has received increasing attention for its applications in human-machine interaction. Current graph-based methods fail to simultaneously capture global contextual features and local, diverse uni-modal features in a dialogue. Furthermore, as the number of graph layers increases, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that can provide deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter- and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieved state-of-the-art (SOTA) performance compared with all baselines. Code is released at https://anonymous.4open.science/r/MERC-7F88.
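
The abstract does not spell out Joyful's inter- and intra-view losses. As a generic reference point only, the sketch below computes a standard InfoNCE-style cross-view contrastive loss, in which each node embedding in one view should be most similar (by cosine similarity) to its own counterpart in the other view, relative to all other nodes. The temperature value and the toy embeddings are assumptions, and this is not the paper's loss.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: view_a[i] should match view_b[i] against every view_b[j]."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denom - logits[i])  # -log softmax of the positive pair
    return sum(losses) / len(losses)
```

Because the denominator sums over all candidates in the other view, the loss falls only when matching pairs are pulled together relative to non-matching ones, which is what makes representations of different samples more distinguishable.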

Generative Replay Inspired by Hippocampal Memory Indexing for Continual Language Learning
Aru Maekawa | Hidetaka Kamigaito | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Continual learning aims to accumulate knowledge to solve new tasks without catastrophic forgetting of previously learned tasks. Research on continual learning has led to the development of generative replay, which prevents catastrophic forgetting by generating pseudo-samples for previous tasks and learning them together with new tasks. Inspired by the biological brain, we propose hippocampal memory indexing to enhance generative replay by controlling sample generation using compressed features of previous training samples. It enables the generation of specific training samples from previous tasks, thus improving the balance and quality of generated replay samples. Experimental results indicate that our method effectively controls sample generation and consistently outperforms current generative replay methods.

A Follow-up Study on Evaluation Metrics Using Follow-up Utterances
Toshiki Kawamoto | Yuki Okano | Takato Yamazaki | Toshinori Sato | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Feedback comment generation using predicted grammatical terms
Kunitaka Jimichi | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges

The purpose of feedback comment generation is to provide useful feedback comments for a wide range of errors in learners’ essays from a language learning perspective. Since it is difficult to obtain appropriate comments at a practical level with rule-based or retrieval-based methods, we explore neural generative methods with pre-trained models. We further hypothesize that considering grammatical terms is effective in generating feedback comments. Specifically, this paper proposes T5-based models using predicted grammatical terms, submitted to FCG GenChal, and presents their results. By using correct grammatical terms, our model improved the BLEU score by 19.0 points compared with the baseline T5 without grammatical terms on the development dataset. Furthermore, by using predicted grammatical terms, our model improved the manual evaluation score by 2.33 points compared with the baseline T5 without grammatical terms on the test dataset.

Coherent Story Generation with Structured Knowledge
Congda Ma | Kotaro Funakoshi | Kiyoaki Shirai | Manabu Okumura
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The emergence of pre-trained language models has taken story generation, the task of automatically generating a comprehensible story from limited information, to a new stage. Although stories generated by these language models are fluent and grammatically correct, their lack of coherence affects their quality. We propose a knowledge-based multi-stage model that incorporates the schema, a kind of structured knowledge, to guide coherent story generation. Our framework includes a schema acquisition module, a plot generation module, and a surface realization module. In the schema acquisition module, highly relevant structured knowledge pieces are selected as a schema. In the plot generation module, the schema guides the construction of a coherent plot plan. In the surface realization module, a story is generated conditioned on the generated plot. Evaluations show that our methods can generate more comprehensible stories than strong baselines, especially with higher global coherence and less repetition.

Generating Dialog Responses with Specified Grammatical Items for Second Language Learning
Yuki Okano | Kotaro Funakoshi | Ryo Nagata | Manabu Okumura
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

This paper proposes a new second language learning task of generating a response including specified grammatical items. We consider two approaches: 1) fine-tuning a pre-trained language model (DialoGPT) by reinforcement learning and 2) providing a few-shot prompt to a large language model (GPT-3). For reinforcement learning, we examine combinations of three reward functions that consider grammatical items, diversity, and fluency. Our experiments confirm that both approaches can generate responses including the specified grammatical items and that it is crucial to consider fluency rather than diversity as the reward function.

2022

Generating Repetitions with Appropriate Repeated Words
Toshiki Kawamoto | Hidetaka Kamigaito | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A repetition is a response that repeats words in the previous speaker’s utterance in a dialogue. As linguistic studies have shown, repetitions are essential in communication for building trust with others. In this work, we focus on repetition generation. To the best of our knowledge, this is the first neural approach to address repetition generation. We propose Weighted Label Smoothing, a smoothing method for explicitly learning which words to repeat during fine-tuning, and a repetition scoring method that can output more appropriate repetitions during decoding. We applied these methods to the pre-trained language model T5 for generating repetitions and conducted automatic and human evaluations. The experimental results indicate that our methods outperformed baselines in both evaluations.
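
Standard label smoothing moves a fraction epsilon of the target probability mass from the gold token and spreads it uniformly over the vocabulary. A weighted variant spreads that mass in proportion to per-token weights instead. The abstract does not give the paper's weighting, so the `repeat_weights` scheme below (boosting tokens that occur in the previous utterance) is a hypothetical illustration of the general idea, not the paper's formulation; `epsilon` and `boost` are assumed values.

```python
def weighted_label_smoothing(target, weights, epsilon=0.1):
    """Soft target: 1 - epsilon on the gold token plus epsilon spread in proportion to weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dist = [epsilon * w / total for w in weights]
    dist[target] += 1.0 - epsilon
    return dist


def repeat_weights(vocab_size, context_ids, boost=5.0):
    """Hypothetical weighting: tokens from the previous utterance get `boost`, all others 1."""
    context = set(context_ids)
    return [boost if tok in context else 1.0 for tok in range(vocab_size)]


def cross_entropy(target_dist, log_probs):
    """Cross-entropy between a soft target distribution and model log-probabilities."""
    return -sum(p * lp for p, lp in zip(target_dist, log_probs))
```

With all weights equal, `weighted_label_smoothing` reduces to ordinary uniform label smoothing; with a boost, the fine-tuning loss rewards the model for assigning some probability to candidate repeated words even when they are not the gold token at that position.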

Joint Learning-based Heterogeneous Graph Attention Network for Timeline Summarization
Jingyi You | Dongyuan Li | Hidetaka Kamigaito | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Previous studies on the timeline summarization (TLS) task ignored the information interaction between sentences and dates, and adopted pre-defined unlearnable representations for them. They also considered date selection and event detection as two independent tasks, which makes it impossible to integrate their advantages and obtain a globally optimal summary. In this paper, we present a joint learning-based heterogeneous graph attention network for TLS (HeterTls), in which date selection and event detection are combined into a unified framework to improve the extraction accuracy and remove redundant sentences simultaneously. Our heterogeneous graph involves multiple types of nodes, the representations of which are iteratively learned across the heterogeneous graph attention layer. We evaluated our model on four datasets, and found that it significantly outperformed the current state-of-the-art baselines with regard to ROUGE scores and date selection metrics.

A-TIP: Attribute-aware Text Infilling via Pre-trained Language Model
Dongyuan Li | Jingyi You | Kotaro Funakoshi | Manabu Okumura
Proceedings of the 29th International Conference on Computational Linguistics

Text infilling aims to restore incomplete texts by filling in blanks, which has attracted more attention recently because of its wide application in ancient text restoration and text rewriting. However, attribute-aware text infilling is yet to be explored, and existing methods seldom focus on the infilling length of each blank or the number/location of blanks. In this paper, we propose an Attribute-aware Text Infilling method via a Pre-trained language model (A-TIP), which contains a text infilling component and a plug-and-play discriminator. Specifically, we first design a unified text infilling component with modified attention mechanisms and intra- and inter-blank positional encoding to better perceive the number of blanks and the infilling length for each blank. Then, we propose a plug-and-play discriminator to guide generation towards the direction of improving attribute relevance without decreasing text fluency. Finally, automatic and human evaluations on three open-source datasets indicate that A-TIP achieves state-of-the-art performance compared with all baselines.

2021

Towards Table-to-Text Generation with Numerical Reasoning
Lya Hulliyyatus Suadaa | Hidetaka Kamigaito | Kotaro Funakoshi | Manabu Okumura | Hiroya Takamura
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recent neural text generation models have shown significant improvement in generating descriptive text from structured data such as tables. One of the remaining important challenges is generating more analytical descriptions that can be inferred from facts in a data source. Template-based generators and pointer-generators are among the potential approaches to table-to-text generation. In this paper, we propose a framework consisting of a pre-trained model and a copy mechanism. The pre-trained models are fine-tuned to produce fluent text that is enriched with numerical reasoning. However, such text still lacks fidelity to the table contents. The copy mechanism is incorporated in the fine-tuning step by using general placeholders to avoid producing hallucinated phrases that are not supported by a table while preserving high fluency. In summary, our contributions are (1) a new dataset for numerical table-to-text generation using pairs of a table and a paragraph of a table description with richer inference from scientific papers, and (2) a table-to-text generation framework enriched with numerical reasoning.

Generating Weather Comments from Meteorological Simulations
Soichiro Murakami | Sora Tanaka | Masatsugu Hangyo | Hidetaka Kamigaito | Kotaro Funakoshi | Hiroya Takamura | Manabu Okumura
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The task of generating weather-forecast comments from meteorological simulations has the following requirements: (i) the changes in numerical values for various physical quantities need to be considered, (ii) the weather comments should be dependent on delivery time and area information, and (iii) the comments should provide useful information for users. To meet these requirements, we propose a data-to-text model that incorporates three types of encoders for numerical forecast maps, observation data, and meta-data. We also introduce weather labels representing weather information, such as sunny and rain, for our model to explicitly describe useful information. We conducted automatic and human evaluations. The results indicate that our model outperformed the baselines in terms of informativeness. We make our code and data publicly available.

2018

A POS Tagging Model Adapted to Learner English
Ryo Nagata | Tomoya Mizumoto | Yuta Kikuchi | Yoshifumi Kawasaki | Kotaro Funakoshi
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

There has been very limited work on the adaptation of Part-Of-Speech (POS) tagging to learner English despite the fact that POS tagging is widely used in related tasks. In this paper, we explore how we can adapt POS tagging to learner English efficiently and effectively. Based on the discussion of possible causes of POS tagging errors in learner English, we show that deep neural models are particularly suitable for this. Considering the previous findings and the discussion, we introduce the design of our model based on bidirectional Long Short-Term Memory. In addition, we describe how to adapt it to a wide variety of native languages (potentially, hundreds of them). In the evaluation section, we empirically show that it is effective for POS tagging in learner English, achieving an accuracy of 0.964, which significantly outperforms the state-of-the-art POS-tagger. We further investigate the tagging results in detail, revealing which part of the model design does or does not improve the performance.

2016

Nonparametric Bayesian Models for Spoken Language Understanding
Kei Wakabayashi | Johane Takeuchi | Kotaro Funakoshi | Mikio Nakano
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

The dialogue breakdown detection challenge: Task description, datasets, and evaluation metrics
Ryuichiro Higashinaka | Kotaro Funakoshi | Yuka Kobayashi | Michimasa Inaba
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Dialogue breakdown detection is a promising technique in dialogue systems. To promote the research and development of such a technique, we organized a dialogue breakdown detection challenge where the task is to detect a system’s inappropriate utterances that lead to dialogue breakdowns in chat. This paper describes the design, datasets, and evaluation metrics for the challenge as well as the methods and results of the submitted runs of the participants.