Comparative Analysis of Portuguese Named Entities Recognition Tools

Daniela Amaral, Evandro Fonseca, Lucelene Lopes, Renata Vieira


Abstract
This paper describes an experiment comparing four tools for recognizing named entities in Portuguese texts. The experiment was carried out on the HAREM corpora, a gold standard for named entity recognition in Portuguese. The tools evaluated are based on natural language processing techniques and on machine learning. Specifically, one of the tools is based on conditional random fields, a probabilistic machine learning model that has been used for named entity recognition in several languages, while the other tools follow more traditional natural language processing approaches. The comparison results indicate advantages for different tools depending on the class of named entity. Despite this balance among the tools, we conclude by pointing out foreseeable advantages of the machine learning based tool.
Anthology ID:
L14-1425
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2554–2558
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/513_Paper.pdf
Cite (ACL):
Daniela Amaral, Evandro Fonseca, Lucelene Lopes, and Renata Vieira. 2014. Comparative Analysis of Portuguese Named Entities Recognition Tools. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2554–2558, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Comparative Analysis of Portuguese Named Entities Recognition Tools (Amaral et al., LREC 2014)