Hirokazu Kiyomaru


Contextualized and Generalized Sentence Representations by Contrastive Self-Supervised Learning: A Case Study on Discourse Relation Analysis
Hirokazu Kiyomaru | Sadao Kurohashi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose a method to learn contextualized and generalized sentence representations using contrastive self-supervised learning. In the proposed method, a model is given a text consisting of multiple sentences. One sentence is randomly selected as a target sentence. The model is trained to maximize the similarity between the representation of the target sentence with its context and that of the masked target sentence with the same context. Simultaneously, the model minimizes the similarity between the latter representation and the representation of a random sentence with the same context. We apply our method to discourse relation analysis in English and Japanese and show that it outperforms strong baseline methods based on BERT, XLNet, and RoBERTa.


A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find their interested information efficiently by putting articles into different categories.


Diversity-aware Event Prediction based on a Conditional Variational Autoencoder with Reconstruction
Hirokazu Kiyomaru | Kazumasa Omura | Yugo Murawaki | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Typical event sequences are an important class of commonsense knowledge. Formalizing the task as the generation of a next event conditioned on a current event, previous work in event prediction employs sequence-to-sequence (seq2seq) models. However, what can happen after a given event is usually diverse, a fact that can hardly be captured by deterministic models. In this paper, we propose to incorporate a conditional variational autoencoder (CVAE) into seq2seq for its ability to represent diverse next events as a probabilistic distribution. We further extend the CVAE-based seq2seq with a reconstruction mechanism to prevent the model from concentrating on highly typical events. To facilitate fair and systematic evaluation of the diversity-aware models, we also extend existing evaluation datasets by tying each current event to multiple next events. Experiments show that the CVAE-based models drastically outperform deterministic models in terms of precision and that the reconstruction mechanism improves the recall of CVAE-based models without sacrificing precision.


Cross-lingual Knowledge Projection Using Machine Translation and Target-side Knowledge Base Completion
Naoki Otani | Hirokazu Kiyomaru | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the 27th International Conference on Computational Linguistics

Considerable effort has been devoted to building commonsense knowledge bases. However, they are not available in many languages because the construction of KBs is expensive. To bridge the gap between languages, this paper addresses the problem of projecting the knowledge in English, a resource-rich language, into other languages, where the main challenge lies in projection ambiguity. This ambiguity is partially solved by machine translation and target-side knowledge base completion, but neither of them is adequately reliable by itself. We show their combination can project English commonsense knowledge into Japanese and Chinese with high precision. Our method also achieves a top-10 accuracy of 90% on the crowdsourced English–Japanese benchmark. Furthermore, we use our method to obtain 18,747 facts of accurate Japanese commonsense within a very short period.