Makoto Miwa

2025

pdf bib abs
Enhancing NER by Harnessing Multiple Datasets with Conditional Variational Autoencoders
Taku Oi | Makoto Miwa
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a novel method to integrate a Conditional Variational Autoencoder (CVAE) into a span-based Named Entity Recognition (NER) model to model the shared and unshared information among labels in multiple datasets and ease the training on the datasets. Experimental results using multiple biomedical datasets show the effectiveness of the proposed method, achieving improved performance on the BioRED dataset.

pdf bib
Proceedings of the 24th Workshop on Biomedical Language Processing
Dina Demner-Fushman | Sophia Ananiadou | Makoto Miwa | Junichi Tsujii
Proceedings of the 24th Workshop on Biomedical Language Processing

pdf bib abs
Effect of Multilingual and Domain-adapted Continual Pre-training on Few-shot Promptability
Ken Yano | Makoto Miwa
Proceedings of the 24th Workshop on Biomedical Language Processing

Continual Pre-training (CPT) can help pre-trained large language models (LLMs) effectively adapt to new or under-trained domains or low-resource languages without re-training from scratch.Nevertheless, during CPT, the model’s few-shot transfer ability is known to be affected for emergent tasks.We verified this by comparing the performance between the few-shot and fine-tuning settings on the same tasks.We used Llama3-ELAINE-medLLM, which was continually pre-trained on Llama3-8B, targeted for the biomedical domain, and adapted for multilingual languages (English, Japanese, and Chinese).We compared the performance of Llama3-ELAINE-medLLM and Llama3-8B in three emergent tasks: two related domain tasks, entity recognition (NER) and machine translation (MT), and one out-of-domain task, summarization (SUM). Our experimental results show that degradation in few-shot transfer ability does not necessarily indicate the model’s underlying potential on the same task after fine-tuning.

We propose ELAINE (EngLish-jApanese-chINesE)-medLLM, a trilingual (English, Japanese, Chinese) large language model adapted for the bio-medical domain based on Llama-3-8B. The training dataset was carefully curated in terms of volume and diversity to adapt to the biomedical domain and endow trilingual capability while preserving the knowledge and abilities of the base model. The training follows 2-stage paths: continued pre-training and supervised fine-tuning (SFT). Our results demonstrate that ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model’s capability.

pdf bib abs
Improving Relation Extraction by Sequence-to-sequence-based Dependency Parsing Pre-training
Masaki Asada | Makoto Miwa
Proceedings of the 31st International Conference on Computational Linguistics

Relation extraction is a crucial natural language processing task that extracts relational triplets from raw text. Syntactic dependencies information has shown its effectiveness for relation extraction tasks. However, in most existing studies, dependency information is used only for traditional encoder-only-based relation extraction, not for generative sequence-to-sequence (seq2seq)-based relation extraction. In this study, we propose a syntax-aware seq2seq pre-trained model for seq2seq-based relation extraction. The model incorporates dependency information into a seq2seq pre-trained language model by continual pre-training with a seq2seq-based dependency parsing task. Experimental results on two widely used relation extraction benchmark datasets show that dependency parsing pre-training can improve the relation extraction performance.

pdf bib abs
Addressing the Training-Inference Discrepancy in Discrete Diffusion for Text Generation
Masaki Asada | Makoto Miwa
Proceedings of the 31st International Conference on Computational Linguistics

This study addresses the discrepancy between training and inference in discrete diffusion models for text generation. We propose two novel strategies: (1) a training schema that considers two-step diffusion processes, allowing the model to use its own predicted output as input for subsequent steps during training and (2) a scheduling technique that gradually increases the probability of using self-generated text as training progresses. Experiments conducted on four widely used text generation benchmark datasets demonstrate that both proposed strategies improve the performance of discrete diffusion models in text generation.

2024

pdf bib
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Dina Demner-Fushman | Sophia Ananiadou | Makoto Miwa | Kirk Roberts | Junichi Tsujii
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

pdf bib abs
Entity-aware Multi-task Training Helps Rare Word Machine Translation
Matiss Rikters | Makoto Miwa
Proceedings of the 17th International Natural Language Generation Conference

Named entities (NE) are integral for preserving context and conveying accurate information in the machine translation (MT) task. Challenges often lie in handling NE diversity, ambiguity, rarity, and ensuring alignment and consistency. In this paper, we explore the effect of NE-aware model fine-tuning to improve handling of NEs in MT. We generate data for NE recognition (NER) and NE-aware MT using common NER tools from Spacy, and align entities in parallel data. Experiments with fine-tuning variations of pre-trained T5 models on NE-related generation tasks between English and German show promising results with increasing amounts of NEs in the output and BLEU score improvements compared to the non-tuned baselines.

pdf bib
Enhancing Syllabic Component Classification in Japanese Sign Language by Pre-training on Non-Japanese Sign Language Data
Jundai Inoue | Makoto Miwa | Yutaka Sasaki | Daisuke Hara
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources

pdf bib abs
AIST AIRC Systems for the WMT 2024 Shared Tasks
Matiss Rikters | Makoto Miwa
Proceedings of the Ninth Conference on Machine Translation

At WMT 2024 AIST AIRC participated in the General Machine Translation shared task and the Biomedical Translation task. We trained constrained track models for translation between English, German, and Japanese. Before training the final models, we first filtered the parallel data, then performed iterative back-translation as well as parallel data distillation. We experimented with training baseline Transformer models, Mega models, and fine-tuning open-source T5 and Gemma model checkpoints using the filtered parallel data. Our primary submissions contain translations from ensembles of two Mega model checkpoints and our contrastive submissions are generated by our fine-tuned T5 model checkpoints.

2023

pdf bib abs
DISTANT: Distantly Supervised Entity Span Detection and Classification
Ken Yano | Makoto Miwa | Sophia Ananiadou
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We propose a distantly supervised pipeline NER which executes entity span detection and entity classification in sequence named DISTANT (DIstantly Supervised enTity spAN deTection and classification).The former entity span detector extracts possible entity mention spans by the distant supervision. Then the later entity classifier assigns each entity span to one of the positive entity types or none by employing a positive and unlabeled (PU) learning framework. Two models were built based on the pre-trained SciBERT model and fine-tuned with the silver corpus generated by the distant supervision. Experimental results on BC5CDR and NCBI-Disease datasets show that our method outperforms the end-to-end NER baselines without PU learning by a large margin. In particular, it increases the recall score effectively.

pdf bib abs
Distantly Supervised Document-Level Biomedical Relation Extraction with Neighborhood Knowledge Graphs
Takuma Matsubara | Makoto Miwa | Yutaka Sasaki
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We propose a novel distantly supervised document-level biomedical relation extraction model that uses partial knowledge graphs that include the graph neighborhood of the entities appearing in each input document. Most conventional distantly supervised relation extraction methods use only the entity relations automatically annotated by using knowledge base entries. They do not fully utilize the rich information in the knowledge base, such as entities other than the target entities and the network of heterogeneous entities defined in the knowledge base. To address this issue, our model integrates the representations of the entities acquired from the neighborhood knowledge graphs with the representations of the input document. We conducted experiments on the ChemDisGene dataset using Comparative Toxicogenomics Database (CTD) for document-level relation extraction with respect to interactions between drugs, diseases, and genes. Experimental results confirmed the performance improvement by integrating entities and their neighborhood biochemical information from the knowledge base.

pdf bib abs
BioNART: A Biomedical Non-AutoRegressive Transformer for Natural Language Generation
Masaki Asada | Makoto Miwa
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We propose a novel Biomedical domain-specific Non-AutoRegressive Transformer model for natural language generation: BioNART. Our BioNART is based on an encoder-decoder model, and both encoder and decoder are compatible with widely used BERT architecture, which allows benefiting from publicly available pre-trained biomedical language model checkpoints. We performed additional pre-training and fine-tuned BioNART on biomedical summarization and doctor-patient dialogue tasks. Experimental results show that our BioNART achieves about 94% of the ROUGE score to the pre-trained autoregressive model while realizing an 18 times faster inference speed on the iCliniq dataset.

pdf bib abs
Biomedical Relation Extraction with Entity Type Markers and Relation-specific Question Answering
Koshi Yamada | Makoto Miwa | Yutaka Sasaki
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Recently, several methods have tackled the relation extraction task with QA and have shown successful results. However, the effectiveness of existing methods in specific domains, such as the biomedical domain, is yet to be verified. When there are multiple entity pairs that share an entity in a sentence, a QA-based relation extraction model that outputs only one single answer to a given question may not extract desired relations. In addition, these methods employ QA models that are not tuned for relation extraction. To address these issues, we first extend and apply a span QA-based relation extraction method to the drug-protein relation extraction by creating question templates and incorporating entity type markers. We further propose a binary QA-based method that directly uses the entity information available in the relation extraction task. The experimental results on the DrugProt dataset show that our QA-based methods, especially the proposed binary QA method, are effective for drug-protein relation extraction.

pdf bib abs
Biomedical Document Classification with Literature Graph Representations of Bibliographies and Entities
Ryuki Ida | Makoto Miwa | Yutaka Sasaki
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

This paper proposes a new document classification method that incorporates the representations of a literature graph created from bibliographic and entity information. Recently, document classification performance has been significantly improved with large pre-trained language models; however, there still remain documents that are difficult to classify. External information, such as bibliographic information, citation links, descriptions of entities, and medical taxonomies, has been considered one of the keys to dealing with such documents in document classification. Although several document classification methods using external information have been proposed, they only consider limited relationships, e.g., word co-occurrence and citation relationships. However, there are multiple types of external information. To overcome the limitation of the conventional use of external information, we propose a document classification model that simultaneously considers bibliographic and entity information to deeply model the relationships among documents using the representations of the literature graph. The experimental results show that our proposed method outperforms existing methods on two document classification datasets in the biomedical domain with the help of the literature graph.

pdf bib abs
Span-based Named Entity Recognition by Generating and Compressing Information
Nhung T. H. Nguyen | Makoto Miwa | Sophia Ananiadou
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The information bottleneck (IB) principle has been proven effective in various NLP applications. The existing work, however, only used either generative or information compression models to improve the performance of the target task. In this paper, we propose to combine the two types of IB models into one system to enhance Named Entity Recognition (NER).For one type of IB model, we incorporate two unsupervised generative components, span reconstruction and synonym generation, into a span-based NER system. The span reconstruction ensures that the contextualised span representation keeps the span information, while the synonym generation makes synonyms have similar representations even in different contexts. For the other type of IB model, we add a supervised IB layer that performs information compression into the system to preserve useful features for NER in the resulting span representations. Experiments on five different corpora indicate that jointly training both generative and information compression models can enhance the performance of the baseline span-based NER system. Our source code is publicly available at https://github.com/nguyennth/joint-ib-models.

pdf bib abs
AIST AIRC Submissions to the WMT23 Shared Task
Matiss Rikters | Makoto Miwa
Proceedings of the Eighth Conference on Machine Translation

This paper describes the development process of NMT systems that were submitted to the WMT 2023 General Translation task by the team of AIST AIRC. We trained constrained track models for translation between English, German, and Japanese. Before training the final models, we first filtered the parallel and monolingual data, then performed iterative back-translation as well as parallel data distillation to be used for non-autoregressive model training. We experimented with training Transformer models, Mega models, and custom non-autoregressive sequence-to-sequence models with encoder and decoder weights initialised by a multilingual BERT base. Our primary submissions contain translations from ensembles of two Mega model checkpoints and our contrastive submissions are generated by our non-autoregressive models.

2022

pdf bib abs
Learning Disentangled Representations of Negation and Uncertainty
Jake Vasilakes | Chrysoula Zerva | Makoto Miwa | Sophia Ananiadou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Negation and uncertainty modeling are long-standing tasks in natural language processing. Linguistic theory postulates that expressions of negation and uncertainty are semantically independent from each other and the content they modify. However, previous works on representation learning do not explicitly model this independence. We therefore attempt to disentangle the representations of negation, uncertainty, and content using a Variational Autoencoder. We find that simply supervising the latent representations results in good disentanglement, but auxiliary objectives based on adversarial learning and mutual information minimization can provide additional disentanglement gains.

pdf bib abs
Improving Supervised Drug-Protein Relation Extraction with Distantly Supervised Models
Naoki Iinuma | Makoto Miwa | Yutaka Sasaki
Proceedings of the 21st Workshop on Biomedical Language Processing

This paper proposes novel drug-protein relation extraction models that indirectly utilize distant supervision data. Concretely, instead of adding distant supervision data to the manually annotated training data, our models incorporate distantly supervised models that are relation extraction models trained with distant supervision data. Distantly supervised learning has been proposed to generate a large amount of pseudo-training data at low cost. However, there is still a problem of low prediction performance due to the inclusion of mislabeled data. Therefore, several methods have been proposed to suppress the effects of noisy cases by utilizing some manually annotated training data. However, their performance is lower than that of supervised learning on manually annotated data because mislabeled data that cannot be fully suppressed becomes noise when training the model. To overcome this issue, our methods indirectly utilize distant supervision data with manually annotated training data. The experimental results on the DrugProt corpus in the BioCreative VII Track 1 showed that our proposed model can consistently improve the supervised models in different settings.

pdf bib abs
Named Entity Recognition for Cancer Immunology Research Using Distant Supervision
Hai-Long Trieu | Makoto Miwa | Sophia Ananiadou
Proceedings of the 21st Workshop on Biomedical Language Processing

Cancer immunology research involves several important cell and protein factors. Extracting the information of such cells and proteins and the interactions between them from text are crucial in text mining for cancer immunology research. However, there are few available datasets for these entities, and the amount of annotated documents is not sufficient compared with other major named entity types. In this work, we introduce our automatically annotated dataset of key named entities, i.e., T-cells, cytokines, and transcription factors, which engages the recent cancer immunotherapy. The entities are annotated based on the UniProtKB knowledge base using dictionary matching. We build a neural named entity recognition (NER) model to be trained on this dataset and evaluate it on a manually-annotated data. Experimental results show that we can achieve a promising NER performance even though our data is automatically annotated. Our dataset also enhances the NER performance when combined with existing data, especially gaining improvement in yet investigated named entities such as cytokines and transcription factors.

2021

pdf bib
A Neural Edge-Editing Approach for Document-Level Relation Graph Extraction
Kohei Makino | Makoto Miwa | Yutaka Sasaki
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs
Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors
Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose a multi-task, probabilistic approach to facilitate distantly supervised relation extraction by bringing closer the representations of sentences that contain the same Knowledge Base pairs. To achieve this, we bias the latent space of sentences via a Variational Autoencoder (VAE) that is trained jointly with a relation classifier. The latent code guides the pair representations and influences sentence reconstruction. Experimental results on two datasets created via distant supervision indicate that multi-task learning results in performance benefits. Additional exploration of employing Knowledge Base priors into theVAE reveals that the sentence space can be shifted towards that of the Knowledge Base, offering interpretability and further improving results.

2020

pdf bib abs
BENNERD: A Neural Named Entity Linking System for COVID-19
Mohammad Golam Sohrab | Khoa Duong | Makoto Miwa | Goran Topić | Ikeda Masami | Takamura Hiroya
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present a biomedical entity linking (EL) system BENNERD that detects named enti- ties in text and links them to the unified medical language system (UMLS) knowledge base (KB) entries to facilitate the corona virus disease 2019 (COVID-19) research. BEN- NERD mainly covers biomedical domain, es- pecially new entity types (e.g., coronavirus, vi- ral proteins, immune responses) by address- ing CORD-NER dataset. It includes several NLP tools to process biomedical texts includ- ing tokenization, flat and nested entity recog- nition, and candidate generation and rank- ing for EL that have been pre-trained using the CORD-NER corpus. To the best of our knowledge, this is the first attempt that ad- dresses NER and EL on COVID-19-related entities, such as COVID-19 virus, potential vaccines, and spreading mechanism, that may benefit research on COVID-19. We release an online system to enable real-time entity annotation with linking for end users. We also release the manually annotated test set and CORD-NERD dataset for leveraging EL task. The BENNERD system is available at https://aistairc.github.io/BENNERD/.

pdf bib abs
Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature
Fusataka Kuniyoshi | Kohei Makino | Jun Ozawa | Makoto Miwa
Proceedings of the Twelfth Language Resources and Evaluation Conference

The synthesis process is essential for achieving computational experiment design in the field of inorganic materials chemistry. In this work, we present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system for extracting the synthesis processes buried in the scientific literature. We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers. The automated machine-reading system is developed by a deep learning-based sequence tagger and simple heuristic rule-based relation extractor. Our experimental results demonstrate that the sequence tagger with the optimal setting can detect the entities with a macro-averaged F1 score of 0.826, while the rule-based relation extractor can achieve high performance with a macro-averaged F1 score of 0.887.

pdf bib abs
Ontology-Style Relation Annotation: A Case Study
Savong Bou | Naoki Suzuki | Makoto Miwa | Yutaka Sasaki
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper proposes an Ontology-Style Relation (OSR) annotation approach. In conventional Relation Extraction (RE) datasets, relations are annotated as links between entity mentions. In contrast, in our OSR annotation, a relation is annotated as a relation mention (i.e., not a link but a node) and domain and range links are annotated from the relation mention to its argument entity mentions. We expect the following benefits: (1) the relation annotations can be easily converted to Resource Description Framework (RDF) triples to populate an Ontology, (2) some part of conventional RE tasks can be tackled as Named Entity Recognition (NER) tasks. The relation classes are limited to several RDF properties such as domain, range, and subClassOf, and (3) OSR annotations can be clear documentations of Ontology contents. As a case study, we converted an in-house corpus of Japanese traffic rules in conventional annotations into the OSR annotations and built a novel OSR-RoR (Rules of the Road) corpus. The inter-annotator agreements of the conversion were 85-87%. We evaluated the performance of neural NER and RE tools on the conventional and OSR annotations. The experimental results showed that the OSR annotations make the RE task easier while introducing slight complexity into the NER task.

pdf bib abs
mgsohrab at WNUT 2020 Shared Task-1: Neural Exhaustive Approach for Entity and Relation Recognition Over Wet Lab Protocols
Mohammad Golam Sohrab | Anh-Khoa Duong Nguyen | Makoto Miwa | Hiroya Takamura
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

We present a neural exhaustive approach that addresses named entity recognition (NER) and relation recognition (RE), for the entity and re- lation recognition over the wet-lab protocols shared task. We introduce BERT-based neural exhaustive approach that enumerates all pos- sible spans as potential entity mentions and classifies them into entity types or no entity with deep neural networks to address NER. To solve relation extraction task, based on the NER predictions or given gold mentions we create all possible trigger-argument pairs and classify them into relation types or no relation. In NER task, we achieved 76.60% in terms of F-score as third rank system among the partic- ipated systems. In relation extraction task, we achieved 80.46% in terms of F-score as the top system in the relation extraction or recognition task. Besides we compare our model based on the wet lab protocols corpus (WLPC) with the WLPC baseline and dynamic graph-based in- formation extraction (DyGIE) systems.

2019

pdf bib abs
A Search-based Neural Model for Biomedical Nested and Overlapping Event Detection
Kurt Junshean Espinosa | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We tackle the nested and overlapping event detection task and propose a novel search-based neural network (SBNN) structured prediction model that treats the task as a search problem on a relation graph of trigger-argument structures. Unlike existing structured prediction tasks such as dependency parsing, the task targets to detect DAG structures, which constitute events, from the relation graph. We define actions to construct events and use all the beams in a beam search to detect all event structures that may be overlapping and nested. The search process constructs events in a bottom-up manner while modelling the global properties for nested and overlapping structures simultaneously using neural networks. We show that the model achieves performance comparable to the state-of-the-art model Turku Event Extraction System (TEES) on the BioNLP Cancer Genetics (CG) Shared Task 2013 without the use of any syntactic and hand-engineered features. Further analyses on the development set show that our model is more computationally efficient while yielding higher F1-score performance.

pdf bib abs
Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs
Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Document-level relation extraction is a complex human process that requires logical inference to extract relationships between named entities in text. Existing approaches use graph-based neural models with words as nodes and edges as relations between them, to encode relations across sentences. These models are node-based, i.e., they form pair representations based solely on the two target node representations. However, entity relations can be better expressed through unique edge representations formed as paths between nodes. We thus propose an edge-oriented graph neural model for document-level relation extraction. The model utilises different types of nodes and edges to create a document-level graph. An inference mechanism on the graph edges enables to learn intra- and inter-sentence relations using multi-instance learning internally. Experiments on two document-level biomedical datasets for chemical-disease and gene-disease associations show the usefulness of the proposed edge-oriented approach.

pdf bib abs
A Neural Pipeline Approach for the PharmaCoNER Shared Task using Contextual Exhaustive Models
Mohammad Golam Sohrab | Minh Thang Pham | Makoto Miwa | Hiroya Takamura
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

We present a neural pipeline approach that performs named entity recognition (NER) and concept indexing (CI), which links them to concept unique identifiers (CUIs) in a knowledge base, for the PharmaCoNER shared task on pharmaceutical drugs and chemical entities. We proposed a neural NER model that captures the surrounding semantic information of a given sequence by capturing the forward- and backward-context of bidirectional LSTM (Bi-LSTM) output of a target span using contextual span representation-based exhaustive approach. The NER model enumerates all possible spans as potential entity mentions and classify them into entity types or no entity with deep neural networks. For representing span, we compare several different neural network architectures and their ensembling for the NER model. We then perform dictionary matching for CI and, if there is no matching, we further compute similarity scores between a mention and CUIs using entity embeddings to assign the CUI with the highest score to the mention. We evaluate our approach on the two sub-tasks in the shared task. Among the five submitted runs, the best run for each sub-task achieved the F-score of 86.76% on Sub-task 1 (NER) and the F-score of 79.97% (strict) on Sub-task 2 (CI).

pdf bib abs
Coreference Resolution in Full Text Articles with BERT and Syntax-based Mention Filtering
Hai-Long Trieu | Anh-Khoa Duong Nguyen | Nhung Nguyen | Makoto Miwa | Hiroya Takamura | Sophia Ananiadou
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

This paper describes our system developed for the coreference resolution task of the CRAFT Shared Tasks 2019. The CRAFT corpus is more challenging than other existing corpora because it contains full text articles. We have employed an existing span-based state-of-theart neural coreference resolution system as a baseline system. We enhance the system with two different techniques to capture longdistance coreferent pairs. Firstly, we filter noisy mentions based on parse trees with increasing the number of antecedent candidates. Secondly, instead of relying on the LSTMs, we integrate the highly expressive language model–BERT into our model. Experimental results show that our proposed systems significantly outperform the baseline. The best performing system obtained F-scores of 44%, 48%, 39%, 49%, 40%, and 57% on the test set with B3, BLANC, CEAFE, CEAFM, LEA, and MUC metrics, respectively. Additionally, the proposed model is able to detect coreferent pairs in long distances, even with a distance of more than 200 sentences.

pdf bib abs
Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network
Sunil Kumar Sahu | Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Inter-sentence relation extraction deals with a number of complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies. Existing methods do not fully exploit such dependencies. We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural network model on a document-level graph. The graph is constructed using various inter- and intra-sentence dependencies to capture local and non-local dependency information. In order to predict the relation of an entity pair, we utilise multi-instance learning with bi-affine pairwise scoring. Experimental results show that our model achieves comparable performance to the state-of-the-art neural models on two biochemistry datasets. Our analysis shows that all the types in the graph are effective for inter-sentence relation extraction.

2018

pdf bib abs
Deep Exhaustive Model for Nested Named Entity Recognition
Mohammad Golam Sohrab | Makoto Miwa
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a simple deep neural model for nested named entity recognition (NER). Most NER models focused on flat entities and ignored nested entities, which failed to fully capture underlying semantic information in texts. The key idea of our model is to enumerate all possible regions or spans as potential entity mentions and classify them with deep neural networks. To reduce the computational costs and capture the information of the contexts around the regions, the model represents the regions using the outputs of shared underlying bidirectional long short-term memory. We evaluate our exhaustive model on the GENIA and JNLPBA corpora in biomedical domain, and the results show that our model outperforms state-of-the-art models on nested and flat NER, achieving 77.1% and 78.4% respectively in terms of F-score, without any external knowledge resources.

pdf bib abs
A Neural Layered Model for Nested Named Entity Recognition
Meizhi Ju | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Entity mentions embedded in longer entity mentions are referred to as nested entities. Most named entity recognition (NER) systems deal only with the flat entities and ignore the inner nested ones, which fails to capture finer-grained semantic information in underlying texts. To address this issue, we propose a novel neural model to identify nested entities by dynamically stacking flat NER layers. Each flat NER layer is based on the state-of-the-art flat NER model that captures sequential context representation with bidirectional Long Short-Term Memory (LSTM) layer and feeds it to the cascaded CRF layer. Our model merges the output of the LSTM layer in the current flat NER layer to build new representation for detected entities and subsequently feeds them into the next flat NER layer. This allows our model to extract outer entities by taking full advantage of information encoded in their corresponding inner entities, in an inside-to-outside way. Our model dynamically stacks the flat NER layers until no outer entities are extracted. Extensive evaluation shows that our dynamic model outperforms state-of-the-art feature-based systems on nested NER, achieving 74.7% and 72.2% on GENIA and ACE2005 datasets, respectively, in terms of F-score.

pdf bib abs
A Walk-based Model on Entity Graphs for Relation Extraction
Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between two entities, we construct up to l-length walks between each pair. The resulting walks are merged and iteratively used to update the edge representations into longer walks representations. We show that the model achieves performance comparable to the state-of-the-art systems on the ACE 2005 dataset without using any external tools.

pdf bib abs
Enhancing Drug-Drug Interaction Extraction from Texts by Molecular Structure Information
Masaki Asada | Makoto Miwa | Yutaka Sasaki
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a novel neural method to extract drug-drug interactions (DDIs) from texts using external drug molecular structure information. We encode textual drug pairs with convolutional neural networks and their molecular pairs with graph convolutional networks (GCNs), and then we concatenate the outputs of these two networks. In the experiments, we show that GCNs can predict DDIs from the molecular structures of drugs in high accuracy and the molecular information can enhance text-based DDI extraction by 2.39 percent points in the F-score on the DDIExtraction 2013 shared task data set.

pdf bib abs
Investigating Domain-Specific Information for Neural Coreference Resolution on Biomedical Texts
Hai-Long Trieu | Nhung T. H. Nguyen | Makoto Miwa | Sophia Ananiadou
Proceedings of the BioNLP 2018 workshop

Existing biomedical coreference resolution systems depend on features and/or rules based on syntactic parsers. In this paper, we investigate the utility of the state-of-the-art general domain neural coreference resolution system on biomedical texts. The system is an end-to-end system without depending on any syntactic parsers. We also investigate the domain specific features to enhance the system for biomedical texts. Experimental results on the BioNLP Protein Coreference dataset and the CRAFT corpus show that, with no parser information, the adapted system compared favorably with the systems that depend on parser information on these datasets, achieving 51.23% on the BioNLP dataset and 36.33% on the CRAFT corpus in F1 score. In-domain embeddings and domain-specific features helped improve the performance on the BioNLP dataset, but they did not on the CRAFT corpus.

2017

pdf bib abs
Bib2vec: Embedding-based Search System for Bibliographic Information
Takuma Yoneda | Koki Mori | Makoto Miwa | Yutaka Sasaki
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We propose a novel embedding model that represents relationships among several elements in bibliographic information with high representation ability and flexibility. Based on this model, we present a novel search system that shows the relationships among the elements in the ACL Anthology Reference Corpus. The evaluation results show that our model can achieve a high prediction ability and produce reasonable search results.

pdf bib abs
Analyzing Well-Formedness of Syllables in Japanese Sign Language
Satoshi Yawata | Makoto Miwa | Yutaka Sasaki | Daisuke Hara
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper tackles a problem of analyzing the well-formedness of syllables in Japanese Sign Language (JSL). We formulate the problem as a classification problem that classifies syllables into well-formed or ill-formed. We build a data set that contains hand-coded syllables and their well-formedness. We define a fine-grained feature set based on the hand-coded syllables and train a logistic regression classifier on labeled syllables, expecting to find the discriminative features from the trained classifier. We also perform pseudo active learning to investigate the applicability of active learning in analyzing syllables. In the experiments, the best classifier with our combinatorial features achieved the accuracy of 87.0%. The pseudo active learning is also shown to be effective showing that it could reduce about 84% of training instances to achieve the accuracy of 82.0% when compared to the model without active learning.

pdf bib abs
Utilizing Visual Forms of Japanese Characters for Neural Review Classification
Yota Toyama | Makoto Miwa | Yutaka Sasaki
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We propose a novel method that exploits visual information of ideograms and logograms in analyzing Japanese review documents. Our method first converts font images of Japanese characters into character embeddings using convolutional neural networks. It then constructs document embeddings from the character embeddings based on Hierarchical Attention Networks, which represent the documents based on attention mechanisms from a character level to a sentence level. The document embeddings are finally used to predict the labels of documents. Our method provides a way to exploit visual features of characters in languages with ideograms and logograms. In the experiments, our method achieved an accuracy comparable to a character embedding-based model while our method has much fewer parameters since it does not need to keep embeddings of thousands of characters.

pdf bib abs
TTI-COIN at SemEval-2017 Task 10: Investigating Embeddings for End-to-End Relation Extraction from Scientific Papers
Tomoki Tsujimura | Makoto Miwa | Yutaka Sasaki
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our TTI-COIN system that participated in SemEval-2017 Task 10. We investigated appropriate embeddings to adapt a neural end-to-end entity and relation extraction system LSTM-ER to this task. We participated in the full task setting of the entity segmentation, entity classification and relation classification (scenario 1) and the setting of relation classification only (scenario 3). The system was directly applied to the scenario 1 without modifying the codes thanks to its generality and flexibility. Our evaluation results show that the choice of appropriate pre-trained embeddings affected the performance significantly. With the best embeddings, our system was ranked third in the scenario 1 with the micro F1 score of 0.38. We also confirm that our system can produce the micro F1 score of 0.48 for the scenario 3 on the test data, and this score is close to the score of the 3rd ranked system in the task.

pdf bib abs
Extracting Drug-Drug Interactions with Attention CNNs
Masaki Asada | Makoto Miwa | Yutaka Sasaki
Proceedings of the 16th BioNLP Workshop

We propose a novel attention mechanism for a Convolutional Neural Network (CNN)-based Drug-Drug Interaction (DDI) extraction model. CNNs have been shown to have a great potential on DDI extraction tasks; however, attention mechanisms, which emphasize important words in the sentence of a target-entity pair, have not been investigated with the CNNs despite the fact that attention mechanisms are shown to be effective for a general domain relation classification task. We evaluated our model on the Task 9.2 of the DDIExtraction-2013 shared task. As a result, our attention mechanism improved the performance of our base CNN-based DDI model, and the model achieved an F-score of 69.12%, which is competitive with the state-of-the-art models.

2016

pdf bib abs
Distributional Hypernym Generation by Jointly Learning Clusters and Projections
Josuke Yamane | Tomoya Takatani | Hitoshi Yamada | Makoto Miwa | Yutaka Sasaki
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a novel word embedding-based hypernym generation model that jointly learns clusters of hyponym-hypernym relations, i.e., hypernymy, and projections from hyponym to hypernym embeddings. Most of the recent hypernym detection models focus on a hypernymy classification problem that determines whether a pair of words is in hypernymy or not. These models do not directly deal with a hypernym generation problem in that a model generates hypernyms for a given word. Differently from previous studies, our model jointly learns the clusters and projections with adjusting the number of clusters so that the number of clusters can be determined depending on the learned projections and vice versa. Our model also boosts the performance by incorporating inner product-based similarity measures and negative examples, i.e., sampled non-hypernyms, into our objectives in learning. We evaluated our joint learning models on the task of Japanese and English hypernym generation and showed a significant improvement over an existing pipeline model. Our model also compared favorably to existing distributed hypernym detection models on the English hypernym classification task.

pdf bib abs
Ensemble Classification of Grants using LDA-based Features
Yannis Korkontzelos | Beverley Thomas | Makoto Miwa | Sophia Ananiadou
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Classifying research grants into useful categories is a vital task for a funding body to give structure to the portfolio for analysis, informing strategic planning and decision-making. Automating this classification process would save time and effort, providing the accuracy of the classifications is maintained. We employ five classification models to classify a set of BBSRC-funded research grants in 21 research topics based on unigrams, technical terms and Latent Dirichlet Allocation models. To boost precision, we investigate methods for combining their predictions into five aggregate classifiers. Evaluation confirmed that ensemble classification models lead to higher precision. It was observed that there is not a single best-performing aggregate method for all research topics. Instead, the best-performing method for a research topic depends on the number of positive training instances available for this topic. Subject matter experts considered the predictions of aggregate models to correct erroneous or incomplete manual assignments.

pdf bib
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Makoto Miwa | Mohit Bansal
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Makoto Miwa

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2011

2010

2009

Co-authors

Venues