Koji Mineshima


Compositional Evaluation on Japanese Textual Entailment and Similarity
Hitomi Yanaka | Koji Mineshima
Transactions of the Association for Computational Linguistics, Volume 10

Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.

Annotating Japanese Numeral Expressions for a Logical and Pragmatic Inference Dataset
Kana Koyano | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

Numeral expressions in Japanese are characterized by the flexibility of quantifier positions and the variety of numeral suffixes. However, little work has been done to build annotated corpora focusing on these features and datasets for testing the understanding of Japanese numeral expressions. In this study, we build a corpus that annotates each numeral expression in an existing phrase structure-based Japanese treebank with its usage and numeral suffix types. We also construct an inference test set for numerical expressions based on this annotated corpus. In this test set, we particularly pay attention to inferences where the correct label differs between logical entailment and implicature and those contexts such as negations and conditionals where the entailment labels can be reversed. The baseline experiment with Japanese BERT models shows that our inference test set poses challenges for inference involving various types of numeral expressions.


Talking with the Theorem Prover to Interactively Solve Natural Language Inference
Atsushi Sumita | Yusuke Miyao | Koji Mineshima
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form (subject, predicate, object) that can be easily translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference systems between videos and semantically complicated sentences including negation and quantification.
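
As a rough illustration of how a (subject, predicate, object) triplet might be mapped to a logical form, here is a minimal Python sketch. The `ActionTriplet` type, the function names, the `subj`/`obj` role predicates, and the output notation are all assumptions made for illustration. They are not the dataset's actual format or the authors' conversion procedure.

```python
from typing import NamedTuple


class ActionTriplet(NamedTuple):
    """Hypothetical container for one (subject, predicate, object) label."""
    subject: str
    predicate: str
    obj: str


def to_event_formula(t: ActionTriplet) -> str:
    """Render a triplet as an event-style formula string (assumed notation)."""
    return (f"exists e. ({t.predicate}(e) & subj(e, {t.subject}) "
            f"& obj(e, {t.obj}))")


def negate(formula: str) -> str:
    """Wrap a formula string in a negation, e.g. for 'nobody holds a cup'."""
    return f"-({formula})"


triplet = ActionTriplet("person", "hold", "cup")
print(to_event_formula(triplet))
print(negate(to_event_formula(triplet)))
```

Keeping the triplet as structured data and rendering the formula only at the end makes it straightforward to wrap the result in negation or quantification, the sentence types the abstract mentions.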

Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference
Hitomi Yanaka | Koji Mineshima
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Despite the success of multilingual pre-trained language models, it remains unclear to what extent these models have human-like generalization capacity across languages. The aim of this study is to investigate the out-of-distribution generalization of pre-trained language models through Natural Language Inference (NLI) in Japanese, the typological properties of which are different from those of English. We introduce a synthetically generated Japanese NLI dataset, called the Japanese Adversarial NLI (JaNLI) dataset, which is inspired by the English HANS dataset and is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Through a series of experiments to evaluate the generalization performance of both Japanese and multilingual BERT models, we demonstrate that there is much room to improve current models trained on Japanese NLI tasks. Furthermore, a comparison of human performance and model performance on the different types of garden-path sentences in the JaNLI dataset shows that structural phenomena that ease interpretation of garden-path sentences for human readers do not help models in the same way, highlighting a difference between human readers and the models.

Exploring Transitivity in Neural NLI Models through Veridicality
Hitomi Yanaka | Koji Mineshima | Kentaro Inui
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Despite the recent success of deep neural networks in natural language processing, the extent to which they can demonstrate human-like generalization capacities for natural language understanding remains unclear. We explore this issue in the domain of natural language inference (NLI), focusing on the transitivity of inference relations, a fundamental property for systematically drawing inferences. A model capturing transitivity can compose basic inference patterns and draw new inferences. We introduce an analysis method using synthetic and naturalistic NLI datasets involving clause-embedding verbs to evaluate whether models can perform transitivity inferences composed of veridical inferences and arbitrary inference types. We find that current NLI models do not perform consistently well on transitivity inference tasks, suggesting that they lack the generalization capacity for drawing composite inferences from provided training examples. The data and code for our analysis are publicly available at https://github.com/verypluming/transitivity.
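
The transitivity pattern described above can be sketched as a small label-composition rule. This is a simplified illustration under stated assumptions, not the authors' analysis code (that is in the repository linked above). The `compose` function and the three label constants are hypothetical names, and the rule assumes the composite label is read off from the two basic labels alone.

```python
ENTAILMENT = "entailment"
CONTRADICTION = "contradiction"
NEUTRAL = "neutral"


def compose(first: str, second: str) -> str:
    """Label for A -> C, given the labels for A -> B and B -> C.

    Simplifying assumption: if A does not entail B (for example, a
    non-veridical verb such as 'hope'), nothing about C follows from A,
    so the composite is neutral. If A entails B, the composite inherits
    the label of B -> C.
    """
    if first != ENTAILMENT:
        return NEUTRAL
    return second


# 'Jo knows that a dog runs' -> 'A dog runs' -> 'An animal runs'
print(compose(ENTAILMENT, ENTAILMENT))
# 'Jo hopes that a dog runs' does not entail 'A dog runs'
print(compose(NEUTRAL, ENTAILMENT))
```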

SyGNS: A Systematic Generalization Testbed Based on Natural Language Semantics
Hitomi Yanaka | Koji Mineshima | Kentaro Inui
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Combining Event Semantics and Degree Semantics for Natural Language Inference
Izumi Haruta | Koji Mineshima | Daisuke Bekki
Proceedings of the 28th International Conference on Computational Linguistics

In formal semantics, there are two well-developed semantic frameworks: event semantics, which treats verbs and adverbial modifiers using the notion of event, and degree semantics, which analyzes adjectives and comparatives using the notion of degree. However, it is not obvious whether these frameworks can be combined to handle cases in which the phenomena in question are interacting with each other. Here, we study this issue by focusing on natural language inference (NLI). We implement a logic-based NLI system that combines event semantics and degree semantics and their interaction with lexical knowledge. We evaluate the system on various NLI datasets containing linguistically challenging problems. The results show that the system achieves high accuracies on these datasets in comparison with previous logic-based systems and deep-learning-based systems. This suggests that the two semantic frameworks can be combined consistently to handle various combinations of linguistic phenomena without compromising the advantage of either framework.

Development of a General-Purpose Categorial Grammar Treebank
Yusuke Kubota | Koji Mineshima | Noritsugu Hayashi | Shinya Okano
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces ABC Treebank, a general-purpose categorial grammar (CG) treebank for Japanese. It is ‘general-purpose’ in the sense that it is not tailored to a specific variant of CG, but rather aims to offer a theory-neutral linguistic resource (as much as possible) which can be converted to different versions of CG (specifically, CCG and Type-Logical Grammar) relatively easily. In terms of linguistic analysis, it improves over the existing Japanese CG treebank (Japanese CCGBank) on the treatment of certain linguistic phenomena (passives, causatives, and control/raising predicates) for which the lexical specification of the syntactic information reflecting local dependencies turns out to be crucial. In this paper, we describe the underlying ‘theory’ dubbed ABC Grammar that is taken as a basis for our treebank, outline the general construction of the corpus, and report on some preliminary results applying the treebank in a semantic parsing system for generating logical representations of sentences.

Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language?
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Despite the success of language models using neural networks, it remains unclear to what extent neural models have the generalization ability to perform inferences. In this paper, we introduce a method for evaluating whether neural models can learn systematicity of monotonicity inference in natural language, namely, the regularity for performing arbitrary inferences with generalization on composition. We consider four aspects of monotonicity inferences and test whether the models can systematically interpret lexical and logical phenomena on different training/test splits. A series of experiments show that three neural models systematically draw inferences on unseen combinations of lexical and logical phenomena when the syntactic structures of the sentences are similar between the training and test sets. However, the performance of the models significantly decreases when the structures are slightly changed in the test set while retaining all vocabularies and constituents already appearing in the training set. This indicates that the generalization ability of neural models is limited to cases where the syntactic structures are nearly the same as those in the training set.

Logical Inferences with Comparatives and Generalized Quantifiers
Izumi Haruta | Koji Mineshima | Daisuke Bekki
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Comparative constructions pose a challenge in Natural Language Inference (NLI), which is the task of determining whether a text entails a hypothesis. Comparatives are structurally complex in that they interact with other linguistic phenomena such as quantifiers, numerals, and lexical antonyms. In formal semantics, there is a rich body of work on comparatives and gradable expressions using the notion of degree. However, a logical inference system for comparatives has not been sufficiently developed for use in the NLI task. In this paper, we present a compositional semantics that maps various comparative constructions in English to semantic representations via Combinatory Categorial Grammar (CCG) parsers and combine it with an inference system based on automated theorem proving. We evaluate our system on three NLI datasets that contain complex logical inferences with comparatives, generalized quantifiers, and numerals. We show that the system outperforms previous logic-based systems as well as recent deep learning-based models.


Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation
Masashi Yoshikawa | Hiroshi Noji | Koji Mineshima | Daisuke Bekki
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on the idea of automatic generation of CCG corpora exploiting cheaper resources of dependency trees. Our solution is conceptually simple and does not rely on a specific parser architecture, making it applicable to the current best-performing parsers. We conduct extensive parsing experiments with detailed discussion; on top of existing benchmark datasets on (1) biomedical texts and (2) question sentences, we create experimental datasets of (3) speech conversation and (4) math problems. When the proposed method is applied, an off-the-shelf CCG parser shows significant performance gains, improving from 90.7% to 96.6% on speech conversation, and from 88.5% to 96.8% on math problems.

Multimodal Logical Inference System for Visual-Textual Entailment
Riko Suzuki | Hitomi Yanaka | Masashi Yoshikawa | Koji Mineshima | Daisuke Bekki
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

A large amount of research on multimodal inference across text and vision has recently been conducted to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.

Underspecification and interpretive parallelism in Dependent Type Semantics
Yusuke Kubota | Koji Mineshima | Robert Levine | Daisuke Bekki
Proceedings of the IWCS 2019 Workshop on Computing Semantics with Types, Frames and Related Structures

Questions in Dependent Type Semantics
Kazuki Watanabe | Koji Mineshima | Daisuke Bekki
Proceedings of the Sixth Workshop on Natural Language and Computer Science

Dependent Type Semantics (DTS; Bekki and Mineshima, 2017) is a proof-theoretic compositional dynamic semantics based on Dependent Type Theory. The semantic representations for declarative sentences in DTS are types, based on the propositions-as-types paradigm. While type-theoretic semantics for natural language based on dependent type theory has been developed by many authors, how to assign semantic representations to interrogative sentences has been a non-trivial problem. In this study, we show how to provide the semantics of interrogative sentences in DTS. The basic idea is to assign the same type to both declarative sentences and interrogative sentences, partly building on the recent proposal in Inquisitive Semantics. We use Combinatory Categorial Grammar (CCG) as a syntactic component of DTS and implement our compositional semantics for interrogative sentences using ccg2lambda, a semantic parsing platform based on CCG. Based on the idea that the relationship between questions and answers can be formulated as the task of Recognizing Textual Entailment (RTE), we implement our inference system using proof assistant Coq and show that our system can deal with a wide range of question-answer relationships discussed in the formal semantics literature, including those with polar questions, alternative questions, and wh-questions.

Can Neural Networks Understand Monotonicity Reasoning?
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui | Satoshi Sekine | Lasha Abzianidze | Johan Bos
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Monotonicity reasoning is one of the important reasoning skills for any intelligent natural language inference (NLI) model in that it requires the ability to capture the interaction between lexical and syntactic structures. Since no test set has been developed for monotonicity reasoning with wide coverage, it is still unclear whether neural models can perform monotonicity reasoning in a proper way. To investigate this issue, we introduce the Monotonicity Entailment Dataset (MED). Performance by state-of-the-art NLI models on the new test set is substantially worse, under 55%, especially on downward reasoning. In addition, analysis using a monotonicity-driven data augmentation method showed that these models might be limited in their generalization ability in upward and downward reasoning.

HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning
Hitomi Yanaka | Koji Mineshima | Daisuke Bekki | Kentaro Inui | Satoshi Sekine | Lasha Abzianidze | Johan Bos
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obstacle is the size of datasets or the model architectures themselves. To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena. We add it to training data for the state-of-the-art neural models and evaluate them on test sets for monotonicity phenomena. The results showed that our data augmentation improved the overall accuracy. We also find that the improvement is better on monotonicity inferences with lexical replacements than on downward inferences with disjunction and modification. This suggests that some types of inferences can be improved by our data augmentation while others are immune to it.


Neural sentence generation from formal semantics
Kana Manome | Masashi Yoshikawa | Hitomi Yanaka | Pascual Martínez-Gómez | Koji Mineshima | Daisuke Bekki
Proceedings of the 11th International Conference on Natural Language Generation

Sequence-to-sequence models have shown strong performance in a wide range of NLP tasks, yet their applications to sentence generation from logical representations are underdeveloped. In this paper, we present a sequence-to-sequence model for generating sentences from logical meaning representations based on event semantics. We use a semantic parsing system based on Combinatory Categorial Grammar (CCG) to obtain data annotated with logical formulas. We augment our sequence-to-sequence model with masking for predicates to constrain output sentences. We also propose a novel evaluation method for generation using Recognizing Textual Entailment (RTE). Combining parsing and generation, we test whether or not the output sentence entails the original text and vice versa. Experiments showed that our model outperformed a baseline with respect to both BLEU scores and accuracies in RTE.

Acquisition of Phrase Correspondences Using Natural Deduction Proofs
Hitomi Yanaka | Koji Mineshima | Pascual Martínez-Gómez | Daisuke Bekki
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

How to identify, extract, and use phrasal knowledge is a crucial problem for the task of Recognizing Textual Entailment (RTE). To solve this problem, we propose a method for detecting paraphrases via natural deduction proofs of semantic relations between sentence pairs. Our solution relies on a graph reformulation of partial variable unifications and an algorithm that induces subgraph alignments between meaning representations. Experiments show that our method can automatically detect various paraphrases that are absent from existing paraphrase databases. In addition, the detection of paraphrases using proof information improves the accuracy of RTE tasks.

Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning
Masashi Yoshikawa | Koji Mineshima | Hiroshi Noji | Daisuke Bekki
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In formal logic-based approaches to Recognizing Textual Entailment (RTE), a Combinatory Categorial Grammar (CCG) parser is used to parse input premises and hypotheses to obtain their logical formulas. Here, it is important that the parser processes the sentences consistently; failing to recognize the similar syntactic structure results in inconsistent predicate argument structures among them, in which case the succeeding theorem proving is doomed to failure. In this work, we present a simple method to extend an existing CCG parser to parse a set of sentences consistently, which is achieved with an inter-sentence modeling with Markov Random Fields (MRF). When combined with existing logic-based systems, our method always shows improvement in the RTE experiments on English and Japanese languages.


The Challenge of Composition in Distributional and Formal Semantics
Ran Tian | Koji Mineshima | Pascual Martínez-Gómez
Proceedings of the IJCNLP 2017, Tutorial Abstracts

This is a tutorial proposal. The abstract is as follows: The principle of compositionality states that the meaning of a complete sentence must be explained in terms of the meanings of its subsentential parts; in other words, each syntactic operation should have a corresponding semantic operation. In recent years, it has been increasingly evident that distributional and formal semantics are complementary in addressing composition; while the distributional/vector-based approach can naturally measure semantic similarity (Mitchell and Lapata, 2010), the formal/symbolic approach has a long tradition within logic-based semantic frameworks (Montague, 1974) and can readily be connected to theorem provers or databases to perform complicated tasks. In this tutorial, we will cover recent efforts in extending word vectors to account for composition and reasoning, the various challenging phenomena observed in composition and addressed by formal semantics, and a hybrid approach that combines merits of the two. An outline and introductions to the instructors are found in the submission. Ran Tian has taught a tutorial at the Annual Meeting of the Association for Natural Language Processing in Japan, 2015. The estimated audience size was about one hundred. Only a limited part of the contents in this tutorial is drawn from the previous one. Koji Mineshima has taught a one-week course at the 28th European Summer School in Logic, Language and Information (ESSLLI2016), together with Prof. Daisuke Bekki. Only a small part of that course's content overlaps with this tutorial. Tutorials on “CCG Semantic Parsing” have been given at ACL2013, EMNLP2014, and AAAI2015. A forthcoming tutorial on “Deep Learning for Semantic Composition” will be given at ACL2017. The contents of these tutorials are related to, but do not overlap with, our proposal.

On-demand Injection of Lexical Knowledge for Recognising Textual Entailment
Pascual Martínez-Gómez | Koji Mineshima | Yusuke Miyao | Daisuke Bekki
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We approach the recognition of textual entailment using logical semantic representations and a theorem prover. In this setup, lexical divergences that preserve semantic entailment between the source and target texts need to be explicitly stated. However, recognising subsentential semantic relations is not trivial. We address this problem by monitoring the proof of the theorem and detecting unprovable sub-goals that share predicate arguments with logical premises. If a linguistic relation exists, then an appropriate axiom is constructed on-demand and the theorem proving continues. Experiments show that this approach is effective and precise, producing a system that outperforms other logic-based systems and is competitive with state-of-the-art statistical methods.

Determining Semantic Textual Similarity using Natural Deduction Proofs
Hitomi Yanaka | Koji Mineshima | Pascual Martínez-Gómez | Daisuke Bekki
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Determining semantic textual similarity is a core research subject in natural language processing. Since vector-based models for sentence representation often use shallow information, capturing accurate semantics is difficult. By contrast, logical semantic representations capture deeper levels of sentence semantics, but their symbolic nature does not offer graded notions of textual similarity. We propose a method for determining semantic textual similarity by combining shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs. For the natural deduction proofs, we use ccg2lambda, a higher-order automatic inference system, which converts Combinatory Categorial Grammar (CCG) derivation trees into semantic representations and conducts natural deduction proofs. Experiments show that our system was able to outperform other logic-based systems and that features derived from the proofs are effective for learning textual similarity.

Visual Denotations for Recognizing Textual Entailment
Dan Han | Pascual Martínez-Gómez | Koji Mineshima
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In the logic approach to Recognizing Textual Entailment, identifying phrase-to-phrase semantic relations is still an unsolved problem. Resources such as the Paraphrase Database offer limited coverage despite their large size whereas unsupervised distributional models of meaning often fail to recognize phrasal entailments. We propose to map phrases to their visual denotations and compare their meaning in terms of their images. We show that our approach is effective in the task of Recognizing Textual Entailment when combined with specific linguistic and logic features.


Annotation and Analysis of Discourse Relations, Temporal Relations and Multi-Layered Situational Relations in Japanese Texts
Kimi Kaneko | Saku Sugawara | Koji Mineshima | Daisuke Bekki
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper proposes a methodology for building a specialized Japanese data set for recognizing temporal relations and discourse relations. In addition to temporal and discourse relations, multi-layered situational relations that distinguish generic and specific states belonging to different layers in a discourse are annotated. Our methodology has been applied to 170 text fragments taken from Wikinews articles in Japanese. The validity of our methodology is evaluated and analyzed in terms of degree of annotator agreement and frequency of errors.

ccg2lambda: A Compositional Semantics System
Pascual Martínez-Gómez | Koji Mineshima | Yusuke Miyao | Daisuke Bekki
Proceedings of ACL-2016 System Demonstrations

Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Koji Mineshima | Ribeka Tanaka | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


Higher-order logical inference with compositional semantics
Koji Mineshima | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing