Tree Distance and Some Other Variants of Evalb

Martin Emms


Abstract
Some alternatives to the standard evalb measures for parser evaluation are considered, principally the use of a tree-distance measure, which assigns a score to a linearity and ancestry respecting mapping between trees, in contrast to the evalb measures, which assign a score to a span preserving mapping. Additionally, analysis of the evalb measures suggests some further variants, concerning different normalisations, the portions of a tree compared and whether scores should be micro or macro averaged. The outputs of 6 parsing systems on Section 23 of the Penn Treebank were taken. It is shown that the ranking of the parsing systems varies as the alternative evaluation measures are used. For a fixed parsing system, it is also shown that the ranking of the parses from best to worst will vary according to whether the evalb or tree-distance measure is used. It is argued that the tree-distance measure ameliorates a problem that has been noted concerning over-penalisation of attachment errors.
Anthology ID:
L08-1383
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/348_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Martin Emms. 2008. Tree Distance and Some Other Variants of Evalb. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Tree Distance and Some Other Variants of Evalb (Emms, LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/348_paper.pdf