Kazutaka Shimada


2023

pdf
Language Preference for Expression of Sentiment for Nepali-English Bilingual Speakers on Social Media
Niraj Pahari | Kazutaka Shimada
Proceedings of the 6th Workshop on Computational Approaches to Linguistic Code-Switching

Nepali-English code-switching (CS) has been a growing phenomenon in Nepalese society, especially on social media. Code-switched text can be leveraged to understand the socio-linguistic behaviour of multilingual speakers. Existing studies have attempted to identify the language preferences of multilingual speakers for expressing different emotions using text in different language pairs. In this work, we study the language preference of multilingual Nepali-English CS speakers when expressing sentiment on social media. We create a novel dataset for sentiment analysis using public Nepali-English code-switched comments on YouTube. A statistical study on the dataset shows that the proportion of Nepali is higher in negative comments than in positive comments, suggesting a preference for the native language when expressing negative sentiment. Machine learning and transformer-based models are used as baseline models for sentiment classification on the dataset. The dataset is released publicly.
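
As a rough illustration of the kind of statistical comparison described in the abstract, the sketch below computes the per-comment proportion of Nepali tokens and compares its mean across sentiment classes. The field names and the token-level language tags ("ne"/"en") are assumptions for illustration, not the released dataset's actual schema.

```python
# Minimal sketch (not the authors' code): compare the proportion of Nepali
# tokens in positive vs. negative code-switched comments.
from statistics import mean

def nepali_ratio(tokens, lang_tags):
    """Fraction of tokens tagged as Nepali ("ne") in one comment."""
    ne = sum(1 for tag in lang_tags if tag == "ne")
    return ne / len(tokens) if tokens else 0.0

def compare_sentiments(comments):
    """comments: list of dicts with 'tokens', 'lang_tags', 'sentiment' keys (assumed format)."""
    pos = [nepali_ratio(c["tokens"], c["lang_tags"])
           for c in comments if c["sentiment"] == "positive"]
    neg = [nepali_ratio(c["tokens"], c["lang_tags"])
           for c in comments if c["sentiment"] == "negative"]
    return mean(pos), mean(neg)

# Toy usage example:
toy = [
    {"tokens": ["ramro", "video"], "lang_tags": ["ne", "en"], "sentiment": "positive"},
    {"tokens": ["ekdam", "naramro", "content"], "lang_tags": ["ne", "ne", "en"], "sentiment": "negative"},
]
print(compare_sentiments(toy))  # (mean Nepali ratio in positive, mean in negative)
```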

2022

pdf
Annotation and Multi-modal Methods for Quality Assessment of Multi-party Discussion
Tsukasa Shiota | Kazutaka Shimada
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

2021

pdf
Relation Extraction Using Multiple Pre-Training Models in Biomedical Domain
Satoshi Hiai | Kazutaka Shimada | Taiki Watanabe | Akiva Miura | Tomoya Iwakura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The number of biomedical documents is increasing rapidly, and with it the demand for extracting knowledge from large-scale biomedical texts. BERT-based models are known for their high performance on various tasks, but they are often computationally expensive, and a high-end GPU environment is not available in many situations. To attain both high accuracy and fast extraction speed, we propose combinations of simpler pre-trained models. Our method outperforms the latest state-of-the-art model and BERT-based models on the GAD corpus. In addition, our method is approximately three times faster than the BERT-based models on the ChemProt corpus and reduces the memory footprint to one sixth of theirs.
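
The sketch below illustrates one way a combination of lighter pre-trained models could look: mean-pooled vectors from several embedding tables are concatenated and fed to a linear classifier. The embedding sources and the concatenation strategy are assumptions for illustration; the paper's actual model combination may differ.

```python
# Minimal sketch, not the paper's implementation: combine several lightweight
# pre-trained representations for relation classification.
import numpy as np
from sklearn.linear_model import LogisticRegression

def combined_features(sentence_tokens, embedding_models):
    """Concatenate mean-pooled embeddings from several pre-trained embedding tables."""
    parts = []
    for emb in embedding_models:             # each emb: dict token -> vector
        dim = len(next(iter(emb.values())))
        vecs = [emb[t] for t in sentence_tokens if t in emb]
        parts.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.concatenate(parts)

# Toy example with two tiny "pre-trained" embedding tables:
emb_a = {"aspirin": np.ones(4), "inhibits": np.full(4, 0.5), "cox": np.zeros(4)}
emb_b = {"aspirin": np.zeros(2), "inhibits": np.ones(2), "cox": np.ones(2)}
X = np.stack([combined_features(["aspirin", "inhibits", "cox"], [emb_a, emb_b]),
              combined_features(["cox"], [emb_a, emb_b])])
y = [1, 0]  # 1 = relation present, 0 = no relation (toy labels)
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```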

pdf
Discussion Structure Prediction Based on a Two-step Method
Takumi Himeno | Kazutaka Shimada
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Conversations are often held in laboratories and companies. A summary is vital for people who did not attend a discussion to grasp its content, and if the summary is illustrated as an argument structure, it helps them grasp the discussion's essentials immediately. Our purpose in this paper is to predict the link structure between nodes consisting of utterances in a conversation: classifying each node pair as "linked" or "not-linked." One approach to predicting the structure is to use machine learning models; however, such models tend to over-generate links between nodes. To solve this problem, we introduce a two-step method for the structure prediction task. We use a machine learning-based approach as the first step, a link prediction task, and then apply a score-based approach as the second step, a link selection task. Our two-step method dramatically improves accuracy compared with one-step methods based on SVM and BERT.
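
A minimal sketch of a two-step pipeline in this spirit is shown below: classifier scores are thresholded to propose candidate links (step 1), and a selection rule prunes over-generated links (step 2). The particular selection rule here, keeping only the best-scoring incoming link per utterance, is an assumption for illustration rather than the paper's scoring method.

```python
# Minimal sketch of a two-step link prediction/selection pipeline.
def two_step_links(pair_scores, threshold=0.5):
    """
    pair_scores: dict mapping (src, dst) utterance-index pairs (src < dst)
                 to a classifier's link probability.
    Returns the selected set of links.
    """
    # Step 1: link prediction -- keep pairs the classifier marks as "linked".
    candidates = {pair: s for pair, s in pair_scores.items() if s >= threshold}

    # Step 2: link selection -- to reduce over-generation, keep only the
    # best-scoring incoming link for each destination utterance.
    best = {}
    for (src, dst), s in candidates.items():
        if dst not in best or s > best[dst][1]:
            best[dst] = ((src, dst), s)
    return {pair for pair, _ in best.values()}

# Example: utterance 3 has two candidate parents; only the stronger one is kept.
scores = {(0, 1): 0.9, (0, 3): 0.6, (2, 3): 0.8, (1, 2): 0.4}
print(two_step_links(scores))  # {(0, 1), (2, 3)}
```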

pdf
Tell Me What You Read: Automatic Expertise-Based Annotator Assignment for Text Annotation in Expert Domains
Hiyori Yoshikawa | Tomoya Iwakura | Kimi Kaneko | Hiroaki Yoshida | Yasutaka Kumano | Kazutaka Shimada | Rafal Rzepka | Patrycja Swieczkowska
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper investigates the effectiveness of automatic annotator assignment for text annotation in expert domains. In the task of creating high-quality annotated corpora, expert domains often cover multiple sub-domains (e.g. organic and inorganic chemistry in the chemistry domain), either explicitly or implicitly. Therefore, it is crucial to assign annotators to documents relevant to their fine-grained domain expertise. However, most existing crowdsourcing methods estimate the reliability of each annotator or annotated instance only after the annotation process. To address this issue, we propose a method to estimate the domain expertise of each annotator before the annotation process, using information easily obtained from the annotators beforehand. We propose two measures to estimate annotator expertise: an explicit measure using predefined categories of sub-domains, and an implicit measure using distributed representations of the documents. Experimental results on chemical name annotation tasks show that annotation accuracy improves when the explicit and implicit measures for annotator assignment are combined.
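
The sketch below illustrates the general idea of combining an explicit measure (shared sub-domain categories) with an implicit measure (cosine similarity of distributed representations) into a single assignment score. The weighting scheme and the way the annotator profile vector is built are assumptions, not the paper's exact formulation.

```python
# Minimal sketch: score annotator-document fit with an explicit and an implicit measure.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def assignment_score(doc, annotator, alpha=0.5):
    """
    doc:       {'categories': set of sub-domain labels, 'vec': document embedding}
    annotator: {'categories': set of declared expertise labels, 'vec': profile embedding
                built e.g. from documents the annotator reports having read}
    """
    explicit = len(doc["categories"] & annotator["categories"]) / max(len(doc["categories"]), 1)
    implicit = cosine(doc["vec"], annotator["vec"])
    return alpha * explicit + (1 - alpha) * implicit

doc = {"categories": {"organic"}, "vec": np.array([1.0, 0.0])}
annotators = {
    "A": {"categories": {"organic"},   "vec": np.array([0.9, 0.1])},
    "B": {"categories": {"inorganic"}, "vec": np.array([0.1, 0.9])},
}
# Assign the document to the annotator with the highest combined score.
print(max(annotators, key=lambda a: assignment_score(doc, annotators[a])))  # "A"
```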

2018

pdf
Trivia Score and Ranking Estimation Using Support Vector Regression and RankNet
Kazuya Niina | Kazutaka Shimada
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

pdf
Annotation and Analysis of Extractive Summaries for the Kyutech Corpus
Takashi Yamamura | Kazutaka Shimada
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf
The Kyutech corpus and topic segmentation using a combined method
Takashi Yamamura | Kazutaka Shimada | Shintaro Kawahara
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

Summarization of multi-party conversation is one of the important tasks in natural language processing. In this paper, we describe a Japanese corpus and a topic segmentation task. To the best of our knowledge, the corpus is the first Japanese corpus annotated for summarization tasks and freely available to anyone. We call it “the Kyutech corpus.” The corpus targets a decision-making task with four participants and contains utterances with time information, topic segmentation, and reference summaries. As a case study on the corpus, we describe a method combining LCSeg and TopicTiling for the topic segmentation task. We discuss the effectiveness and the problems of the combined method through experiments on the Kyutech corpus.
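
As a rough illustration of combining two segmenters, the sketch below averages normalized per-gap boundary scores (e.g., LCSeg-style and TopicTiling-style depth scores) and thresholds the result; the actual combination used in the paper may differ.

```python
# Minimal sketch of one way to combine boundary scores from two topic segmenters.
import numpy as np

def combine_boundaries(scores_a, scores_b, threshold=0.5):
    """
    scores_a, scores_b: per-gap boundary scores from two segmenters,
                        one value per gap between adjacent utterances.
    Returns indices of gaps predicted to be topic boundaries.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    # Normalize each segmenter's scores to [0, 1] before averaging.
    norm = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-12)
    combined = (norm(a) + norm(b)) / 2
    return [i for i, s in enumerate(combined) if s >= threshold]

# Toy example with scores for 5 gaps between 6 utterances:
print(combine_boundaries([0.1, 0.8, 0.2, 0.9, 0.3],
                         [0.2, 0.7, 0.1, 0.95, 0.4]))  # [1, 3]
```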

2015

pdf
Trouble information extraction based on a bootstrap approach from Twitter
Kohei Kurihara | Kazutaka Shimada
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf
Multi-aspects Rating Prediction Using Aspect Words and Sentences
Takuto Nakamuta | Kazutaka Shimada
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2010

pdf
Combination of 3 Types of Speech Recognizers for Anaphora Resolution
Kazutaka Shimada | Noriko Tanamachi | Tsutomu Endo
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf
Multi-aspects Review Summarization Based on Identification of Important Opinions and their Similarity
Ryosuke Tadano | Kazutaka Shimada | Tsutomu Endo
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2008

pdf
Sentiment Sentence Extraction Using a Hierarchical Directed Acyclic Graph Structure and a Bootstrap Approach
Kazutaka Shimada | Daigo Hashimoto | Tsutomu Endo
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

pdf
An Effective Speech Understanding Method with a Multiple Speech Recognizer based on Output Selection using Edit Distance
Kazutaka Shimada | Satomi Horiguchi | Tsutomu Endo
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

2007

pdf
Movie Review Classification Based on a Multiple Classifier
Kimitaka Tsutsumi | Kazutaka Shimada | Tsutomu Endo
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation