Noriko Kando


2025

In psychological practice, standardized questionnaires serve as essential tools for assessing mental health through structured, clinically-validated questions (i.e., items). While social media platforms offer rich data for mental health screening, computational approaches often bypass these established clinical assessment tools in favor of black-box classification. We propose a novel questionnaire-guided screening framework that bridges psychological practice and computational methods through adaptive Retrieval-Augmented Generation (aRAG). Our approach links unstructured social media content and standardized clinical assessments by retrieving relevant posts for each questionnaire item and using Large Language Models (LLMs) to complete validated psychological instruments. Our findings demonstrate two key advantages of questionnaire-guided screening: First, when completing the Beck Depression Inventory-II (BDI-II), our approach matches or outperforms state-of-the-art performance on Reddit-based benchmarks without requiring training data. Second, we show that guiding LLMs through standardized questionnaires yields superior results compared to directly prompting them for depression screening. Additionally, we show as a proof-of-concept how our questionnaire-based methodology successfully extends to self-harm screening.

2024

Argument mining has typically been researched for specific corpora belonging to concrete languages and domains independently in each research work. Human argumentation, however, has domain- and language-dependent linguistic features that determine the content and structure of arguments. Also, when deploying argument mining systems in the wild, we might not be able to control some of these features. Therefore, an important aspect that has not been thoroughly investigated in the argument mining literature is the robustness of such systems to variations in language and domain. In this paper, we present a complete analysis across three different languages and three different domains that allow us to have a better understanding on how to leverage the scarce available corpora to design argument mining systems that are more robust to natural language variations.
Survey research using open-ended responses is an important method thatcontributes to the discovery of unknown issues and new needs. However,survey research generally requires time and cost-consuming manual dataprocessing, indicating that it is difficult to analyze large dataset.To address this issue, we propose an LLM-based method to automate partsof the grounded theory approach (GTA), a representative approach of thequalitative data analysis. We generated and annotated pseudo open-endedresponses, and used them as the training data for the coding proceduresof GTA. Through evaluations, we showed that the models trained withpseudo open-ended responses are quite effective compared with thosetrained with manually annotated open-ended responses. We alsodemonstrate that the LLM-based approach is highly efficient andcost-saving compared to human-based approach.

2020

In this study, we construct a corpus of Japanese local assembly minutes. All speeches in an assembly were transcribed into a local assembly minutes based on the local autonomy law. Therefore, the local assembly minutes form an extremely large amount of text data. Our ultimate objectives were to summarize and present the arguments in the assemblies, and to use the minutes as primary information for arguments in local politics. To achieve this, we structured all statements in assembly minutes. We focused on the structure of the discussion, i.e., the extraction of question and answer pairs. We organized the shared task “QA Lab-PoliInfo” in NTCIR 14. We conducted a “segmentation task” to identify the scope of one question and answer in the minutes as a sub task of the shared task. For the segmentation task, 24 runs from five teams were submitted. Based on the obtained results, the best recall was 1.000, best precision was 0.940, and best F-measure was 0.895.

2019

Detecting opinion expression is a potential and essential task in opinion mining that can be extended to advanced tasks. In this paper, we considered opinion expression detection as a sequence labeling task and exploited different deep contextualized embedders into the state-of-the-art architecture, composed of bidirectional long short-term memory (BiLSTM) and conditional random field (CRF). Our experimental results show that using different word embeddings can cause contrasting results, and the model can achieve remarkable scores with deep contextualized embeddings. Especially, using BERT embedder can significantly exceed using ELMo embedder.

2018

Search engine is an important tool of modern academic study, but the results are lack of measurement of beginner friendliness. In order to improve the efficiency of using search engine for academic study, it is necessary to invent a technique of measuring the beginner friendliness of a Web page explaining academic concepts and to build an automatic measurement system. This paper studies how to integrate heterogeneous features such as a neural image feature generated from the image of the Web page by a variant of CNN (convolutional neural network) as well as text features extracted from the body text of the HTML file of the Web page. Integration is performed through the framework of the SVM classifier learning. Evaluation results show that heterogeneous features perform better than each individual type of features.

2016

2013

2012

2010

2009

2008

In this paper we present a Japanese-English Bilingual lexicon of technical terms. The lexicon was derived from the first and second NTCIR evaluation collections for research into cross-language information retrieval for Asian languages. While it can be utilized for translation between Japanese and English, the lexicon is also suitable for language research and language engineering. Since it is collection-derived, it contains instances of word variants and miss-spellings which make it eminently suitable for further research. For a subset of the lexicon we make available the collection statistics. In addition we make available a Katakana subset suitable for transliteration research.

2006

This paper describes the test collections produced for the Patent Retrieval Task in the Fifth NTCIR Workshop. We performed the invalidity search task, in which each participant group searches a patent collection for the patents that can invalidate the demand in an existing claim. For this purpose, we performed both document and passage retrieval tasks. We also performed the automatic patent classification task using the F-term classification system. The test collections will be available to the public for research purposes.

2004

2003

2002