Abstract
The creation of high-quality named entity annotated resources is a time-consuming and expensive process. Most gold-standard corpora are available for English but not for less-resourced languages such as Vietnamese. In Asian languages, this task remains problematic. This paper focuses on the automatic construction of named entity annotated corpora for Vietnamese-French, a less-resourced language pair. We incrementally apply different cross-projection methods over parallel corpora, such as perfect string matching and edit distance similarity. Evaluations on the Vietnamese-French language pair show good accuracy (an F-score of 94.90%) when identifying named entity pairs and building a named entity annotated parallel corpus.
- Anthology ID:
- 2015.jeptalnrecital-demonstration.6
- Volume:
- Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations
- Month:
- June
- Year:
- 2015
- Address:
- Caen, France
- Venue:
- JEP/TALN/RECITAL
- Publisher:
- ATALA
- Pages:
- 12–13
- URL:
- https://aclanthology.org/2015.jeptalnrecital-demonstration.6
- Cite (ACL):
- Ngoc Tan Le and Fatiha Sadat. 2015. Building a Bilingual Vietnamese-French Named Entity Annotated Corpus through Cross-Linguistic Projection. In Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations, pages 12–13, Caen, France. ATALA.
- Cite (Informal):
- Building a Bilingual Vietnamese-French Named Entity Annotated Corpus through Cross-Linguistic Projection (Le & Sadat, JEP/TALN/RECITAL 2015)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2015.jeptalnrecital-demonstration.6.pdf
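The abstract names two cross-projection criteria for pairing named entities across a Vietnamese-French parallel corpus: perfect string matching and edit distance similarity. The sketch below illustrates how such an incremental two-pass matcher might look. It is not the authors' implementation. The function names (`project_entities`, `strip_diacritics`), the diacritic-stripping normalization, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Lowercase and remove combining marks, e.g. 'Hà Nội' -> 'ha noi'.

    Assumption: comparing accent-free forms helps match Vietnamese and
    French spellings of the same name.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Vietnamese 'đ' has no NFD decomposition, so map it explicitly.
    return stripped.replace("đ", "d")


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def project_entities(source_entities, target_candidates, threshold=0.8):
    """Hypothetical two-pass pairing of source entities with target strings.

    Pass 1: perfect string match.
    Pass 2: best edit-distance similarity (>= threshold) on
    diacritic-stripped forms. Entities with no match map to None.
    """
    pairs = {}
    for ent in source_entities:
        if ent in target_candidates:
            pairs[ent] = (ent, "exact")
            continue
        norm_ent = strip_diacritics(ent)
        best, best_score = None, threshold
        for cand in target_candidates:
            score = similarity(norm_ent, strip_diacritics(cand))
            if score >= best_score:
                best, best_score = cand, score
        pairs[ent] = (best, "edit_distance") if best is not None else (None, "unmatched")
    return pairs


# Toy example (invented data, not from the paper): French entities
# paired against Vietnamese candidate strings from an aligned sentence.
example = project_entities(["Paris", "Hanoi", "Lyon"], ["Paris", "Hà Nội", "Việt Nam"])
print(example)
# {'Paris': ('Paris', 'exact'), 'Hanoi': ('Hà Nội', 'edit_distance'), 'Lyon': (None, 'unmatched')}
```

Running exact matching first keeps the cheap, high-precision pass ahead of the fuzzier one. In this toy case, "Hanoi" only pairs with "Hà Nội" in the second pass, where the accent-free forms differ by a single inserted space (similarity 5/6).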