Charmaine Salvador Ponay
2026
PHMartialLawNER: A Tagalog Named Entity Recognition Corpus for the Philippine Martial Law Era
Abdiel Clarence Tabuzo | Vladimir Gray Velazco | Cassandra Cabral | Moneah Shaila Lacsam | Charmaine Salvador Ponay
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
Historical corpora for Tagalog remain limited, particularly texts produced during the Martial Law period under the dictatorship of Ferdinand Marcos Sr. (1972–1986). Much of this material remains undigitized, restricting computational analysis of a significant period in Philippine political history. To support research on historical Tagalog texts, we introduce PHMartialLawNER, a gold-standard named entity recognition corpus constructed from newspapers and underground publications of the Martial Law era. The corpus includes approximately 13k extracted sentence segments (362,000 tokens), consolidated into 8k annotated text spans through a semi-automatic pipeline with manual validation. The reliability of the annotation is measured using Cohen’s 𝜅, reaching 0.86 on all tokens and 0.72 on annotated tokens, with a pairwise F1-score of 0.74. The schema defines historically relevant entity categories including Person (Individual, Collective), Organization (Political, Government, Other), Event (Local, International), Production (Media, Government, Doctrine), as well as Time, Numerical Statistics, Location, and Object entities, specifically identifying weapon artifacts. We establish baseline performance using GLiNER variants, calamanCy models, and transformer-based architectures under zero-shot and few-shot settings. The PHMartialLawNER corpus will be publicly released to support Tagalog NLP, historical text processing, and digital humanities research.
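The two agreement figures above (Cohen's κ on all tokens versus on annotated tokens) can be illustrated with a minimal sketch. It assumes token-level labels from two annotators and treats "annotated tokens" as those where at least one annotator assigned a label other than "O"; the abstract does not spell out that definition, so this is only one plausible reading. The label sequences, the function names, and the "O" outside tag are hypothetical and not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of tokens where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_on_annotated(labels_a, labels_b, outside="O"):
    """Kappa restricted to tokens that at least one annotator labeled (assumed definition)."""
    pairs = [(a, b) for a, b in zip(labels_a, labels_b) if a != outside or b != outside]
    if not pairs:
        raise ValueError("no annotated tokens to compare")
    kept_a, kept_b = zip(*pairs)
    return cohen_kappa(list(kept_a), list(kept_b))


# Hypothetical toy annotations for ten tokens (not drawn from PHMartialLawNER).
ann_a = ["PER", "PER", "O", "O", "LOC", "O", "ORG", "O", "O", "O"]
ann_b = ["PER", "O", "O", "O", "LOC", "O", "ORG", "ORG", "O", "O"]

print(f"kappa, all tokens:       {cohen_kappa(ann_a, ann_b):.3f}")
print(f"kappa, annotated tokens: {kappa_on_annotated(ann_a, ann_b):.3f}")
```

On this toy input the all-token score comes out higher than the annotated-token score, because the many tokens both annotators leave unlabeled count as agreement. That is the same direction as the gap between the reported 0.86 and 0.72, although the sketch does not reproduce the authors' computation.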