ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision
Xuan Wang, Vivian Hu, Xiangchen Song, Shweta Garg, Jinfeng Xiao, Jiawei Han
Abstract
Scientific literature analysis needs fine-grained named entity recognition (NER) to provide a wide range of information for scientific discovery. For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts. On the other hand, domain-specific ontologies and knowledge bases (KBs) can be easily accessed, constructed, or integrated, which makes distant supervision realistic for fine-grained chemistry NER. In distant supervision, training labels are generated by matching mentions in a document with the concepts in the knowledge bases (KBs). However, this kind of KB-matching suffers from two major challenges: incomplete annotation and noisy annotation. We propose ChemNER, an ontology-guided, distantly-supervised method for fine-grained chemistry NER to tackle these challenges. It leverages the chemistry type ontology structure to generate distant labels with novel methods of flexible KB-matching and ontology-guided multi-type disambiguation. It significantly improves the distant label generation for the subsequent sequence labeling model training. We also provide an expert-labeled, chemistry NER dataset with 62 fine-grained chemistry types (e.g., chemical compounds and chemical reactions). Experimental results show that ChemNER is highly effective, outperforming substantially the state-of-the-art NER methods (with .25 absolute F1 score improvement).- Anthology ID:
- 2021.emnlp-main.424
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 5227–5240
- Language:
- URL:
- https://aclanthology.org/2021.emnlp-main.424
- DOI:
- 10.18653/v1/2021.emnlp-main.424
- Cite (ACL):
- Xuan Wang, Vivian Hu, Xiangchen Song, Shweta Garg, Jinfeng Xiao, and Jiawei Han. 2021. ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5227–5240, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision (Wang et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.emnlp-main.424.pdf
- Code
- xuanwang91/chemner