@inproceedings{baldwin-awab-2006-open,
    title = "Open Source Corpus Analysis Tools for {M}alay",
    author = "Baldwin, Timothy  and
      Awab, Su{'}ad",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Gangemi, Aldo  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Odijk, Jan  and
      Tapias, Daniel",
    booktitle = "Proceedings of the Fifth International Conference on Language Resources and Evaluation ({LREC}{'}06)",
    month = may,
    year = "2006",
    address = "Genoa, Italy",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/ingest-emnlp/L06-1421/",
    abstract = "Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital furtherment of any language. In this paper, we present an open source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of each over a 26K word sample of Malay text."
}