Determining the Origin and Structure of Person Names

Yu Fu, Feiyu Xu, Hans Uszkoreit


Abstract
This paper presents a novel system HENNA (Hybrid Person Name Analyzer) for identifying language origin and analyzing linguistic structures of person names. We conduct ME-based classification methods for the language origin identification and achieve very promising performance. We will show that word-internal character sequences provide surprisingly strong evidence for predicting the language origin of person names. Our approach is context-, language- and domain-independent and can thus be easily adapted to person names in or from other languages. Furthermore, we provide a novel strategy to handle origin ambiguities or multiple origins in a name. HENNA also provides a person name parser for the analysis of linguistic and knowledge structures of person names. All the knowledge about a person name in HENNA is modelled in a person-name ontology, including relationships between language origins, linguistic features and grammars of person names of a specific language and interpretation of name elements. The approaches presented here are useful extensions of the named entity recognition task.
Anthology ID:
L10-1527
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/763_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Yu Fu, Feiyu Xu, and Hans Uszkoreit. 2010. Determining the Origin and Structure of Person Names. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Determining the Origin and Structure of Person Names (Fu et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/763_Paper.pdf