Building a Bilingual Vietnamese-French Named Entity Annotated Corpus through Cross-Linguistic Projection

Ngoc Tan Le; Fatiha Sadat

Abstract

The creation of high-quality named entity annotated resources is a time-consuming and expensive process. Most gold-standard corpora are available for English but not for less-resourced languages such as Vietnamese. For Asian languages, this task remains problematic. This paper focuses on the automatic construction of named entity annotated corpora for Vietnamese-French, a less-resourced language pair. We incrementally apply different cross-projection methods using parallel corpora, such as perfect string matching and edit distance similarity. Evaluations on the Vietnamese-French language pair show good accuracy (F-score of 94.90%) in identifying named entity pairs and building a named entity annotated parallel corpus.
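The two matching strategies named in the abstract, perfect string matching followed by edit distance similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the n-gram candidate generation, the case-insensitive comparison, and the 0.8 similarity threshold are all assumptions made for illustration. The sketch takes an entity string already recognized on one side of a sentence pair and looks for its counterpart in the aligned sentence on the other side.

```python
from typing import Iterator, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def ngrams(tokens: List[str], max_len: int) -> Iterator[str]:
    """Yield every token n-gram of length 1..max_len as a string."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def project_entity(source_entity: str,
                   target_sentence: str,
                   threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Find the counterpart of source_entity in the aligned target sentence.

    Hypothetical two-pass procedure: exact match first, then the best
    candidate by normalized edit distance if it reaches the threshold.
    """
    tokens = target_sentence.split()
    # Assumption: the counterpart has at most one more token than the source.
    max_len = len(source_entity.split()) + 1
    candidates = list(ngrams(tokens, max_len))
    wanted = source_entity.lower()

    # Pass 1: perfect string matching (case-insensitive, an assumption).
    for cand in candidates:
        if cand.lower() == wanted:
            return cand, 1.0

    # Pass 2: edit distance similarity.
    best, best_score = None, 0.0
    for cand in candidates:
        score = similarity(cand.lower(), wanted)
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None
```

Running the cheaper exact pass first means the fuzzy pass only handles entities whose spelling differs between the two languages. The threshold controls the trade-off between recall and false matches and would need tuning on real data.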

Anthology ID: 2015.jeptalnrecital-demonstration.6
Volume: Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations
Month: June
Year: 2015
Address: Caen, France
Venue: JEP/TALN/RECITAL
Publisher: ATALA
Pages: 12–13
URL: https://aclanthology.org/2015.jeptalnrecital-demonstration.6
Cite (ACL): Ngoc Tan Le and Fatiha Sadat. 2015. Building a Bilingual Vietnamese-French Named Entity Annotated Corpus through Cross-Linguistic Projection. In Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations, pages 12–13, Caen, France. ATALA.
Cite (Informal): Building a Bilingual Vietnamese-French Named Entity Annotated Corpus through Cross-Linguistic Projection (Le & Sadat, JEP/TALN/RECITAL 2015)
PDF: https://preview.aclanthology.org/paclic-22-ingestion/2015.jeptalnrecital-demonstration.6.pdf
