@inproceedings{zhao-etal-2010-large,
    title = "How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method",
    author = "Zhao, Hai  and
      Song, Yan  and
      Kit, Chunyu",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Odijk, Jan  and
      Piperidis, Stelios  and
      Rosner, Mike  and
      Tapias, Daniel",
    booktitle = "Proceedings of the Seventh International Conference on Language Resources and Evaluation ({LREC}'10)",
    month = may,
    year = "2010",
    address = "Valletta, Malta",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/iwcs-25-ingestion/L10-1134/",
    abstract = "We investigate the impact of input data scale in corpus-based learning using a study style of Zipf's law. In our research, Chinese word segmentation is chosen as the study case, and a series of experiments is specially conducted for it, in which two types of segmentation techniques, statistical learning and rule-based methods, are examined. The empirical results show that a linear performance improvement in statistical learning requires at least an exponential increase in training corpus size. As for the rule-based method, an approximate negative inverse relationship between the performance and the size of the input lexicon can be observed."
}