Abstract
The world-wide proliferation of digital communications has created the need for language and speech processing systems for under-resourced languages. Developing such systems is challenging if only small data sets are available, and the problem is exacerbated for languages with highly productive morphology. However, many under-resourced languages are spoken in multi-lingual environments together with at least one resource-rich language and thus have numerous borrowings from resource-rich languages. Based on this insight, we argue that readily available resources from resource-rich languages can be used to bootstrap the morphological analyses of under-resourced languages with complex and productive morphological systems. In a case study of two such languages, Tagalog and Zulu, we show that an easily obtainable English wordlist can be deployed to seed a morphological analysis algorithm from a small training set of conversational transcripts. Our method achieves a precision of 100% and identifies 28 and 66 of the most productive affixes in Tagalog and Zulu, respectively.

- Anthology ID:
- L14-1035
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3355–3359
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/1051_Paper.pdf
- Cite (ACL):
- Peter Baumann and Janet Pierrehumbert. 2014. Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3355–3359, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages (Baumann & Pierrehumbert, LREC 2014)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/1051_Paper.pdf
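
The seeding idea in the abstract, using an English wordlist to locate borrowed stems and reading off the material around them as affix candidates, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. The function name `candidate_affixes`, the thresholds `min_stem_len` and `min_stems`, and the toy tokens are all assumptions made for illustration.

```python
from collections import defaultdict


def candidate_affixes(tokens, english_words, min_stem_len=4, min_stems=2):
    """Collect prefix/suffix candidates around embedded English stems.

    Hypothetical sketch only: the thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    # Short English words match too easily by chance, so skip them.
    stems = {w.lower() for w in english_words if len(w) >= min_stem_len}
    prefixes = defaultdict(set)  # candidate prefix -> stems it attaches to
    suffixes = defaultdict(set)  # candidate suffix -> stems it attaches to

    for token in {t.lower() for t in tokens}:
        for stem in stems:
            start = token.find(stem)
            if start == -1 or token == stem:
                continue  # no borrowed stem, or a bare borrowing
            prefix = token[:start]
            suffix = token[start + len(stem):]
            if prefix:
                prefixes[prefix].add(stem)
            if suffix:
                suffixes[suffix].add(stem)

    def keep(table):
        # Keep only candidates seen with several distinct stems.
        return {a: sorted(s) for a, s in table.items() if len(s) >= min_stems}

    return keep(prefixes), keep(suffixes)


# Invented toy tokens in the style of Tagalog-English borrowings.
toy_tokens = ["nagtext", "nagdrive", "magtext", "magdrive", "text"]
toy_prefixes, toy_suffixes = candidate_affixes(toy_tokens, ["text", "drive"])
print(toy_prefixes, toy_suffixes)
```

Requiring each candidate to occur with several distinct stems is one simple way to favour precision over recall. A real system would also need to handle infixation, reduplication, and orthographic adaptation of borrowings, none of which this sketch attempts.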