Minoru Sasaki

2022

Text Classification Using a Graph Based on Relationships Between Documents
Hiromu Nakajima | Minoru Sasaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Reputation Analysis Using Key Phrases and Sentiment Scores Extracted from Reviews
Yipu Huang | Minoru Sasaki | Kanako Komiya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Effectiveness Analysis of Word Sense Disambiguation Using Example of Word Senses from WordNet
Hiroshi Sekiya | Minoru Sasaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Effective Use of Japanese Dictionary Definition Sentences in Learning Hierarchical Embedding of Dictionaries
Yuki Ishii | Minoru Sasaki
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Budget Argument Mining Dataset Using Japanese Minutes from the National Diet and Local Assemblies
Yasutomo Kimura | Hokuto Ototake | Minoru Sasaki
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Budget argument mining attempts to identify argumentative components related to a budget item and then classify these components, given budget information and minutes. We describe the construction of the dataset for budget argument mining, a subtask of QA Lab-PoliInfo-3 in NTCIR-16. Budget argument mining analyses the argument structure of the minutes, focusing on monetary expressions (amounts of money). In this task, given sufficient budget information (budget item, budget amount, etc.), relevant argumentative components in the minutes are identified and argument labels (claim, premise, and other) are assigned to these components. In this paper, we describe the design of the data format, the annotation procedure, and the release information of the budget argument mining dataset, which links budget information to minutes.

2020

Semi-supervised Word Sense Disambiguation Using Example Similarity Graph
Rie Yatabe | Minoru Sasaki
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

Word Sense Disambiguation (WSD) is a well-known problem in natural language processing. In recent years, there has been increasing interest in applying neural networks and machine learning techniques to solve WSD problems. However, these previous supervised approaches often suffer from the lack of manually sense-tagged examples. In this paper, to solve this problem, we propose a semi-supervised WSD method that uses a learning method based on graph embeddings in order to make effective use of labeled and unlabeled examples. The results of the experiments show that the proposed method performs better than the previous semi-supervised WSD method. Moreover, the graph structure between examples is effective for WSD, and it is effective to utilize a graph structure obtained by fine-tuning BERT in the proposed method.

For natural language processing on machines, resolving such peculiar usages would be particularly useful in constructing a dictionary and dataset for word sense disambiguation. Hence, it is necessary to develop a method to detect such peculiar examples of a target word from a corpus. Note that, hereinafter, we define a peculiar example as an instance in which the target word or phrase has a new meaning. In this paper, we propose a new peculiar example detection method using distance metric learning from labeled example pairs. In this method, first, distance metric learning is performed by large margin nearest neighbor classification for the training data, and new training data points are generated using the distance metric in the original space. Then, peculiar examples are extracted using the local outlier factor, which is a density-based outlier detection method, from the updated training and test data. The efficiency of the proposed method was evaluated on an artificial dataset and the SemEval-2010 Japanese WSD task dataset. The results showed that the proposed method has the highest number of properly detected instances and the highest F-measure value. This shows that the label information of training data is effective for density-based peculiar example detection. Moreover, an experiment on outlier detection using a classification method such as SVM showed that it is difficult to apply the classification method to outlier detection.

2010

Detection of Peculiar Examples using LOF and One Class SVM
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a method to detect peculiar examples of a target word in a corpus. We regard the following as peculiar examples: (1) the meaning of the target word in the example is new, and (2) a compound word containing the target word in the example is new or very technical. A peculiar example is regarded as an outlier in the given example set, so many methods proposed in the data mining domain can be applied to our task. We propose a method that combines the density-based Local Outlier Factor (LOF) and One Class SVM, which are representative outlier detection methods in the data mining domain. In the experiment, we use the Whitepaper text in BCCWJ as the corpus and 10 nouns as target words. Our method improved on the precision and recall of LOF and One Class SVM. We also show that our method can detect new meanings, using the noun `midori (green)'. The main cause of missed and wrong detections is that the similarity measure between two examples is inadequate; improving it is future work.

2008

Ping-pong Document Clustering using NMF and Linkage-Based Refinement
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes a ping-pong document clustering method that alternates NMF and linkage-based refinement in order to improve the clustering result of NMF. NMF can be expected to be effective for document clustering within the ping-pong strategy. However, NMF in the ping-pong strategy often worsens performance because it often fails to improve the clustering result given as its initial values. Our method handles this problem with the stop condition of the ping-pong process. In the experiment, we compared our method with k-means and NMF on 16 document data sets. Our method improved the clustering result of NMF significantly.

Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Spectral clustering is a powerful clustering method for document data sets. However, spectral clustering needs to solve an eigenvalue problem for a matrix derived from the similarity matrix of the data set, so it is not practical for a large data set. To overcome this problem, we propose a method to reduce the size of the similarity matrix. First, using k-means, we obtain a clustering result for the given data set. From each cluster, we pick some data points that are near the center of the cluster and treat them as a single data point. We call this set of data points a committee. Data points outside the committees each remain a single data point. We construct the similarity matrix for these data. The size of this similarity matrix is reduced enough that we can perform spectral clustering with it.
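
The reduction step in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the fixed committee size, the use of the member mean as the committee's vector, and cosine similarity are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return assign(X, centers), centers

def reduce_by_committees(X, k, committee_size, seed=0):
    """Merge the points nearest each k-means center into one 'committee' row.

    Assumption: a committee is represented by the mean of its members.
    Returns (reduced data, list of member index arrays, indices left unmerged).
    """
    labels, centers = kmeans(X, k, seed=seed)
    merged = np.zeros(len(X), dtype=bool)
    committee_vectors, committee_members = [], []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        dist = np.linalg.norm(X[idx] - centers[j], axis=1)
        nearest = idx[np.argsort(dist)[:committee_size]]
        merged[nearest] = True
        committee_vectors.append(X[nearest].mean(axis=0))
        committee_members.append(nearest)
    rest = np.flatnonzero(~merged)
    reduced = np.vstack(committee_vectors + [X[rest]])
    return reduced, committee_members, rest

def cosine_similarity_matrix(Z):
    """Pairwise cosine similarity (assumed here as the similarity measure)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.clip(norms, 1e-12, None)
    return Zn @ Zn.T

# Toy data: three Gaussian blobs of 20 points each (synthetic, for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 5)) for c in (0.0, 3.0, 6.0)])
reduced, committees, rest = reduce_by_committees(X, k=3, committee_size=10)
S = cosine_similarity_matrix(reduced)
print(X.shape[0], "points ->", S.shape[0], "rows in the reduced similarity matrix")
```

The reduced matrix `S` has one row per committee plus one row per unmerged point, so any spectral clustering routine that accepts a precomputed similarity matrix could be run on it. Cluster labels for committee members would then be copied from their committee's row.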

Division of Example Sentences Based on the Meaning of a Target Word Using Semi-Supervised Clustering
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe a system that divides example sentences (data set) into clusters, based on the meaning of the target word, using a semi-supervised clustering technique. In this task, the estimation of the cluster number (the number of meanings) is critical. Our system primarily concentrates on this aspect. First, a user assigns the system an initial cluster number for the target word. The system then performs general clustering on the data set to obtain small clusters. Next, using constraints given by the user, the system integrates these clusters to obtain the final clustering result. Our system performs this entire procedure with high precision while requiring only a few constraints. In the experiment, we tested the system on 12 Japanese nouns used in the SENSEVAL2 Japanese dictionary task. The experiment demonstrated the effectiveness of our system. In the future, we will improve the sentence similarity measurements.

2007

Refinement of Document Clustering by Using NMF
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

Ensemble document clustering using weighted hypergraph generated by NMF
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2004

Semi-supervised Learning by Fuzzy Clustering and Ensemble Learning
Hiroyuki Shinnou | Minoru Sasaki
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Information Retrieval System Using Latent Contextual Relevance
Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

When relevance feedback, one of the most popular information retrieval models, is used in an information retrieval system, related words are extracted based on the first retrieval result. These words are then added to the original query, and retrieval is performed again using the updated query. Generally, with such a query expansion technique, retrieval performance falls in comparison with the performance using the original query. The cause is that there are few synonyms in the thesaurus, and even when some synonyms are added to the query, the same documents are retrieved as a result. In this paper, to solve this problem with related words, we propose latent context relevance, which takes into account the relevance between the query and each index word in the document set.
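
The baseline feedback loop described at the start of this abstract (retrieve, extract related words from the first result, expand the query, retrieve again) can be sketched in a few lines of standard-library Python. This is only a toy illustration of generic query expansion, not the latent context relevance method the paper proposes. The scoring function, the function names, and the tiny document set are hypothetical.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Toy relevance score: total occurrences of the query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)

def retrieve(query_terms, docs, top_k):
    """Return indices of the top_k documents, best first (ties keep document order)."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: score(query_terms, docs[i]),
                    reverse=True)
    return ranked[:top_k]

def expand_query(query_terms, docs, top_docs, n_new_terms):
    """Add the most frequent non-query terms from the first retrieval result."""
    counts = Counter()
    for i in top_docs:
        counts.update(t for t in docs[i] if t not in query_terms)
    new_terms = [t for t, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

# Hypothetical tokenized documents, for illustration only.
docs = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "repair"],
    ["car", "automobile", "dealer"],
    ["banana", "bread", "recipe"],
]
query = ["car"]

first = retrieve(query, docs, top_k=2)                       # first retrieval pass
expanded = expand_query(query, docs, first, n_new_terms=2)   # add related words
second = retrieve(expanded, docs, top_k=3)                   # retrieve again
print(first, expanded, second)
```

In this toy run the expanded query also reaches the document that never mentions "car", which is the intended benefit of feedback. The abstract's point is that this benefit is not guaranteed when the added words bring back the same documents.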