Abstract
This paper describes the named entity language resources developed as part of a development project for the South African languages. The development efforts focused on creating protocols and annotated data sets with at least 15,000 annotated named entity tokens for ten of the official South African languages. The description of the protocols and annotated data sets provides an overview of the problems encountered during annotation. Based on these annotated data sets, conditional random field (CRF) named entity recognition systems are developed that leverage existing linguistic resources. The newly created named entity recognisers are evaluated, achieving F-scores between 0.64 and 0.77, and an error analysis is performed to identify possible avenues for improving the quality of the systems.
- Anthology ID:
- L16-1533
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3344–3348
- URL:
- https://aclanthology.org/L16-1533
- Cite (ACL):
- Roald Eiselen. 2016. Government Domain Named Entity Recognition for South African Languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3344–3348, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Government Domain Named Entity Recognition for South African Languages (Eiselen, LREC 2016)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/L16-1533.pdf