Tetsuya Sakai


Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization
Yuji Naraki | Tetsuya Sakai | Yoshihiko Hayashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Automatic dialogue summarization is the task of succinctly summarizing a dialogue transcript while correctly linking the speakers and their speech, which distinguishes it from conventional document summarization. To address this issue and reduce the “who said what”-related errors in a summary, we propose incorporating speaker identity information into the input embedding of the dialogue transcript encoder. Unlike the speaker embedding proposed by Gu et al. (2020), our proposal takes into account the informativeness of position embedding. By experimentally comparing several embedding methods, we confirmed that ROUGE scores and human evaluation scores of the generated summaries were substantially increased by embedding speaker information in the less informative part of the fixed position embedding with sinusoidal functions.
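The idea of placing speaker information in the less informative part of a fixed sinusoidal position embedding can be illustrated with a minimal sketch. This is not the authors' implementation: treating the trailing (longest-wavelength) dimensions as the "less informative" ones, encoding the speaker as a one-hot vector, and the names `embed_with_speaker` and `n_speaker_dims` are all assumptions made for illustration.

```python
import math

def sinusoidal_position(pos, d_model):
    """Standard fixed sinusoidal position embedding for one position."""
    vec = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec[:d_model]

def embed_with_speaker(pos, speaker_id, d_model=16, n_speaker_dims=4):
    """Overwrite the trailing dimensions with a one-hot speaker code.

    Assumption: the trailing dimensions have the longest wavelengths, so
    they change least across positions and are treated here as the
    "less informative" part. The abstract does not specify this detail.
    """
    if not 0 <= speaker_id < n_speaker_dims:
        raise ValueError("speaker_id must be in [0, n_speaker_dims)")
    vec = sinusoidal_position(pos, d_model)
    one_hot = [1.0 if j == speaker_id else 0.0 for j in range(n_speaker_dims)]
    vec[d_model - n_speaker_dims:] = one_hot
    return vec

if __name__ == "__main__":
    # Two utterance positions spoken by different (hypothetical) speakers.
    print(embed_with_speaker(pos=3, speaker_id=0))
    print(embed_with_speaker(pos=4, speaker_id=1))
```

In this sketch the leading dimensions still carry position, while the trailing block differs only by speaker, so the encoder input distinguishes speakers without a separate learned embedding table.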

LayerConnect: Hypernetwork-Assisted Inter-Layer Connector to Enhance Parameter Efficiency
Haoxiang Shi | Rongsheng Zhang | Jiaan Wang | Cen Wang | Yinhe Zheng | Tetsuya Sakai
Proceedings of the 29th International Conference on Computational Linguistics

Pre-trained Language Models (PLMs) are the cornerstone of modern Natural Language Processing (NLP). However, as PLMs become heavier, fine-tuning all of their parameters becomes inefficient. Existing parameter-efficient methods generally focus on reducing the trainable parameters in PLMs but neglect the inference speed, which limits the ability to deploy PLMs. In this paper, we propose LayerConnect (hypernetwork-assisted inter-layer connectors) to enhance inference efficiency. Specifically, a light-weight connector with a linear structure is inserted between two Transformer layers, and the parameters inside each connector are tuned by a hypernetwork comprising an interpolator and a down-sampler. We perform extensive experiments on the widely used GLUE benchmark. The experimental results verify the inference efficiency of our model. Compared to Adapter, our model parameters are reduced to approximately 11.75%, while the performance degradation is kept to less than 5% (2.5 points on average).

Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective
Ping Yang | Junjie Wang | Ruyi Gan | Xinyu Zhu | Lin Zhang | Ziwei Wu | Xinyu Gao | Jiaxing Zhang | Tetsuya Sakai
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We propose a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a list of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis. Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training. Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN. It not only adds generalization ability to models but also significantly reduces the number of parameters. Our method shares the merits of efficient training and deployment. Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification. Our model achieves this success with only 235M parameters, which is substantially smaller than state-of-the-art models with billions of parameters. The code and pre-trained models are available at https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/unimc .


Evaluating Evaluation Measures for Ordinal Classification and Ordinal Quantification
Tetsuya Sakai
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Ordinal Classification (OC) is an important classification task where the classes are ordinal. For example, an OC task for sentiment analysis could have the following classes: highly positive, positive, neutral, negative, highly negative. Clearly, evaluation measures for an OC task should penalise misclassifications by considering the ordinal nature of the classes. Ordinal Quantification (OQ) is a related task where the gold data is a distribution over ordinal classes, and the system is required to estimate this distribution. Evaluation measures for an OQ task should also take the ordinal nature of the classes into account. However, for both OC and OQ, there are only a small number of known evaluation measures that meet this basic requirement. In the present study, we utilise data from the SemEval and NTCIR communities to clarify the properties of nine evaluation measures in the context of OC tasks, and six measures in the context of OQ tasks.
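The requirement that a measure respect class order can be made concrete with a small sketch. Mean absolute error over class ranks (for OC) and a cumulative-distribution match distance (for OQ) are used here only as familiar order-aware examples; this sketch does not claim to reproduce the specific measures or data examined in the paper, and the function names are illustrative.

```python
CLASSES = ["highly negative", "negative", "neutral", "positive", "highly positive"]
RANK = {name: i for i, name in enumerate(CLASSES)}

def accuracy(gold, pred):
    """Order-blind: every misclassification costs the same."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def mean_absolute_error(gold, pred):
    """Order-aware: the cost grows with the rank distance between classes."""
    return sum(abs(RANK[g] - RANK[p]) for g, p in zip(gold, pred)) / len(gold)

def match_distance(gold_dist, sys_dist):
    """Order-aware distance between two distributions over ordinal classes.

    Sums absolute differences of the cumulative distributions and
    normalises by (number of classes - 1) so the result lies in [0, 1].
    """
    total, cum_gold, cum_sys = 0.0, 0.0, 0.0
    for g, s in zip(gold_dist, sys_dist):
        cum_gold += g
        cum_sys += s
        total += abs(cum_gold - cum_sys)
    return total / (len(gold_dist) - 1)

if __name__ == "__main__":
    gold = ["positive", "neutral"]
    near_miss = ["highly positive", "neutral"]
    far_miss = ["highly negative", "neutral"]
    # Accuracy cannot tell the two systems apart; MAE can.
    print(accuracy(gold, near_miss), accuracy(gold, far_miss))
    print(mean_absolute_error(gold, near_miss), mean_absolute_error(gold, far_miss))
    print(match_distance([1, 0, 0, 0, 0], [0, 0, 0, 0, 1]))
```

The two toy systems make one error each, so an order-blind measure scores them identically, whereas an order-aware measure penalises the system that confuses "positive" with "highly negative" more heavily.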

MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering
Junjie Wang | Yatai Ji | Jiaqi Sun | Yujiu Yang | Tetsuya Sakai
Findings of the Association for Computational Linguistics: EMNLP 2021

In Visual Question Answering (VQA), existing bilinear methods focus on the interaction between images and questions. As a result, the answers are either spliced into the questions or utilized as labels only for classification. On the other hand, trilinear models such as the CTI model efficiently utilize the inter-modality information between answers, questions, and images, while ignoring intra-modality information. Inspired by this observation, we propose a new trilinear interaction framework called MIRTT (Learning Multimodal Interaction Representations from Trilinear Transformers), incorporating the attention mechanisms for capturing inter-modality and intra-modality relationships. Moreover, we design a two-stage workflow where a bilinear model reduces the free-form, open-ended VQA problem into a multiple-choice VQA problem. Furthermore, to obtain accurate and generic multimodal representations, we pre-train MIRTT with masked language prediction. Our method achieves state-of-the-art performance on the Visual7W Telling task and VQA-1.0 Multiple Choice task and outperforms bilinear baselines on the VQA-2.0, TDIUC and GQA datasets.


A Siamese CNN Architecture for Learning Chinese Sentence Similarity
Haoxiang Shi | Cen Wang | Tetsuya Sakai
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop

This paper presents a deep neural architecture which applies the siamese convolutional neural network sharing model parameters for learning a semantic similarity metric between two sentences. In addition, two different similarity metrics (i.e., the Cosine Similarity and Manhattan similarity) are compared based on this architecture. Our experiments in binary similarity classification for Chinese sentence pairs show that the proposed siamese convolutional architecture with Manhattan similarity outperforms the baselines (i.e., the siamese Long Short-Term Memory architecture and the siamese Bidirectional Long Short-Term Memory architecture) by 8.7 points in accuracy.
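A minimal sketch of the siamese setup and the two similarity metrics follows. It is an illustration under stated assumptions, not the paper's architecture: the toy encoder `conv_max_pool`, its kernels, and the scalar token features are hypothetical, and the Manhattan similarity is assumed to take the commonly used form exp(-L1 distance), which the abstract does not spell out.

```python
import math

def conv_max_pool(seq, kernels):
    """Toy encoder: 1-D convolution (no padding) plus max-over-time pooling.

    Each kernel must be no longer than the input sequence.
    """
    feats = []
    for kernel in kernels:
        width = len(kernel)
        windows = [
            sum(w * x for w, x in zip(kernel, seq[i:i + width]))
            for i in range(len(seq) - width + 1)
        ]
        feats.append(max(windows))
    return feats

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def manhattan_similarity(a, b):
    """Assumed form: exp(-L1 distance), which maps distances into (0, 1]."""
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))

def siamese_score(seq_a, seq_b, kernels, metric):
    """Siamese: the same kernels (shared parameters) encode both inputs."""
    return metric(conv_max_pool(seq_a, kernels), conv_max_pool(seq_b, kernels))

if __name__ == "__main__":
    kernels = [[1.0, 1.0], [1.0, -1.0]]  # hypothetical, untrained weights
    a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 2.5]
    print(siamese_score(a, b, kernels, cosine_similarity))
    print(siamese_score(a, b, kernels, manhattan_similarity))
```

For binary similarity classification, a score like this would typically be thresholded or fed to a loss during training; that step is omitted here.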


Composing a Picture Book by Automatic Story Understanding and Visualization
Xiaoyu Qi | Ruihua Song | Chunting Wang | Jin Zhou | Tetsuya Sakai
Proceedings of the Second Workshop on Storytelling

Pictures can enrich storytelling experiences. We propose a framework that can automatically compose a picture book by understanding story text and visualizing it with painting elements, i.e., characters and backgrounds. For story understanding, we extract key information from a story at both the sentence level and the paragraph level, including characters, scenes and actions. These concepts are organized and visualized in a way that depicts the development of a story. We collect a set of Chinese stories for children and apply our approach to compose pictures for stories. Extensive experiments on story event extraction for visualization demonstrate the effectiveness of our method.


Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
Hajime Morita | Tetsuya Sakai | Manabu Okumura
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies


Multilinguality at NTCIR, and moving on ...
Tetsuya Sakai
Proceedings of the 4th Workshop on Cross Lingual Information Access


BRIDJE over a Language Barrier: Cross-Language Information Access by Integrating Translation and Retrieval
Tetsuya Sakai | Makoto Koyama | Masaru Suzuki | Akira Kumano | Toshihiko Manabe
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages