PHMartialLawNER: A Tagalog Named Entity Recognition Corpus for the Philippine Martial Law Era
Abdiel Clarence Tabuzo, Vladimir Gray Velazco, Cassandra Cabral, Moneah Shaila Lacsam, Charmaine Salvador Ponay
Abstract
Historical corpora for Tagalog remain limited, particularly texts produced during the Martial Law period under the dictatorship of Ferdinand Marcos Sr. (1972–1986). Much of this material remains undigitized, restricting computational analysis of a significant period in Philippine political history. To support research on historical Tagalog texts, we introduce PHMartialLawNER, a gold-standard named entity recognition corpus constructed from newspapers and underground publications of the Martial Law era. The corpus includes approximately 13k extracted sentence segments (362,000 tokens), consolidated into 8k annotated text spans through a semi-automatic pipeline with manual validation. The reliability of the annotation is measured using Cohen’s 𝜅, reaching 0.86 on all tokens and 0.72 on annotated tokens, with a pairwise F1-score of 0.74. The schema defines historically relevant entity categories including Person (Individual, Collective), Organization (Political, Government, Other), Event (Local, International), Production (Media, Government, Doctrine), as well as Time, Numerical Statistics, Location, and Object entities, specifically identifying weapon artifacts. We establish baseline performance using GLiNER variants, calamanCy models, and transformer-based architectures under zero-shot and few-shot settings. The PHMartialLawNER corpus will be publicly released to support Tagalog NLP, historical text processing, and digital humanities research.
- Anthology ID:
- 2026.nlp4dh-1.16
- Volume:
- Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, USA
- Editors:
- Sil Hamilton, Emily Öhman, Rebecca M. M. Hicke, Yuri Bizzoni, Axel Bax, Jacob A. Matthews, Mika Hämäläinen
- Venues:
- NLP4DH | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 167–177
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.16/
- Cite (ACL):
- Abdiel Clarence Tabuzo, Vladimir Gray Velazco, Cassandra Cabral, Moneah Shaila Lacsam, and Charmaine Salvador Ponay. 2026. PHMartialLawNER: A Tagalog Named Entity Recognition Corpus for the Philippine Martial Law Era. In Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities, pages 167–177, San Diego, USA. Association for Computational Linguistics.
- Cite (Informal):
- PHMartialLawNER: A Tagalog Named Entity Recognition Corpus for the Philippine Martial Law Era (Tabuzo et al., NLP4DH 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.16.pdf
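
The abstract reports Cohen's κ twice, once over all tokens and once over annotated tokens only. The sketch below illustrates, on made-up toy labels, why the second figure tends to be lower: dropping tokens that both annotators left untagged removes a large block of easy agreement. The function name `cohen_kappa`, the label strings, and the reading of "annotated tokens" as tokens where at least one annotator assigned a non-`O` label are all assumptions for illustration; the paper's exact procedure may differ.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy token-level labels from two hypothetical annotators (not corpus data).
ann_a = ["O", "O", "PER", "O", "ORG", "O", "LOC", "O", "O", "TIME"]
ann_b = ["O", "O", "PER", "O", "LOC", "O", "LOC", "O", "O", "O"]

# Kappa over every token, including those both annotators left untagged.
kappa_all = cohen_kappa(ann_a, ann_b)

# Assumed definition: keep tokens tagged (non-"O") by at least one annotator.
pairs = [(x, y) for x, y in zip(ann_a, ann_b) if x != "O" or y != "O"]
kappa_annotated = cohen_kappa([x for x, _ in pairs], [y for _, y in pairs])

print(f"kappa (all tokens):       {kappa_all:.3f}")
print(f"kappa (annotated tokens): {kappa_annotated:.3f}")
```

On this toy input the all-token score is higher than the annotated-token score, which matches the direction of the gap between the two reported values (0.86 and 0.72), though the toy numbers themselves carry no meaning for the corpus.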