Su’ad Awab


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2006

pdf bib
Open Source Corpus Analysis Tools for Malay
Timothy Baldwin | Su’ad Awab
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital furtherment of any language. In this paper, we present an open source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of each over a 26K word sample of Malay text.