Ribeka Tanaka


2022

Constructing a Culinary Interview Dialogue Corpus with Video Conferencing Tool
Taro Okahisa | Ribeka Tanaka | Takashi Kodama | Yin Jou Huang | Sadao Kurohashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Interviews are an efficient way to elicit knowledge from experts in different domains. In this paper, we introduce CIDC, an interview dialogue corpus in the culinary domain in which interviewers play an active role in eliciting culinary knowledge from the cooking expert. The corpus consists of 308 interview dialogues (each about 13 minutes in length), which add up to a total of 69,000 utterances. We use a video conferencing tool for data collection, which allows us to obtain the facial expressions of the interlocutors as well as the screen-sharing contents. To understand the impact of the interlocutors’ skill level, we divide the experts into “semi-professionals” and “enthusiasts” and the interviewers into “skilled interviewers” and “unskilled interviewers.” For quantitative analysis, we report the statistics and the results of the post-interview questionnaire. We also conduct qualitative analysis on the collected interview dialogues and summarize the salient patterns of how interviewers elicit knowledge from the experts. The corpus is intended to facilitate future research on the knowledge elicitation mechanism in interview dialogues.

JaMIE: A Pipeline Japanese Medical Information Extraction System with Novel Relation Annotation
Fei Cheng | Shuntaro Yada | Ribeka Tanaka | Eiji Aramaki | Sadao Kurohashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In the field of Japanese medical information extraction, few analysis tools are available and relation extraction is still an under-explored topic. In this paper, we first propose a novel relation annotation schema for investigating the medical and temporal relations between medical entities in Japanese medical reports. We experiment with practical annotation scenarios by separately annotating two different types of reports. We design a pipeline system with three components for recognizing medical entities, classifying entity modalities, and extracting relations. The empirical results show accurate analysis performance and suggest satisfactory annotation quality, the superiority of the latest contextual embedding models, and the feasibility of the annotation strategy for high-accuracy demands.

Construction of Hierarchical Structured Knowledge-based Recommendation Dialogue Dataset and Dialogue System
Takashi Kodama | Ribeka Tanaka | Sadao Kurohashi
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

We work on a recommendation dialogue system to help a user understand the appealing points of some target (e.g., a movie). In such dialogues, the recommendation system needs to utilize structured external knowledge to make informative and detailed recommendations. However, there is no dialogue dataset with structured external knowledge designed to make detailed recommendations for the target. Therefore, we construct a dialogue dataset, Japanese Movie Recommendation Dialogue (JMRD), in which the recommender recommends one movie in a long dialogue (23 turns on average). The external knowledge used in this dataset is hierarchically structured, including title, casts, reviews, and plots. Every recommender’s utterance is associated with the external knowledge related to the utterance. We then create a movie recommendation dialogue system that considers the structure of the external knowledge and the history of the knowledge used. Experimental results show that the proposed model is superior in knowledge selection to the baseline models.

Learning Knowledge with Neural DTS
Daisuke Bekki | Ribeka Tanaka | Yuta Takahashi
Proceedings of the 3rd Natural Logic Meets Machine Learning Workshop (NALOMA III)

2020

A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 situation differs greatly among countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news from foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find information of interest efficiently by putting articles into different categories.

Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases
Shuntaro Yada | Ayami Joh | Ribeka Tanaka | Fei Cheng | Eiji Aramaki | Sadao Kurohashi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Applying natural language processing (NLP) to medical and clinical texts can bring important social benefits by mining valuable information from unstructured text. A popular application for that purpose is named entity recognition (NER), but the annotation policies of existing clinical corpora have not been standardized across clinical texts of different types. This paper presents an annotation guideline aimed at covering medical documents of various types such as radiography interpretation reports and medical records. Furthermore, the annotation was designed to avoid burdensome requirements related to medical knowledge, thereby enabling corpus development without medical specialists. To achieve these design features, we specifically focus on critical lung diseases to stabilize linguistic patterns in corpora. After annotating around 1100 electronic medical records following the annotation scheme, we demonstrated its feasibility using an NER task. Results suggest that our guideline is applicable to large-scale clinical NLP projects.

2016

Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Koji Mineshima | Ribeka Tanaka | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing