@inproceedings{hsieh-2014-chinese,
    title = "Why {C}hinese Web-as-Corpus is Wacky? Or: How Big Data is Killing {C}hinese Corpus Linguistics",
    author = "Hsieh, Shu-Kai",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Declerck, Thierry  and
      Loftsson, Hrafn  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)",
    month = may,
    year = "2014",
    address = "Reykjavik, Iceland",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/ingest-emnlp/L14-1649/",
    pages = "2386--2389",
    abstract = "This paper aims to examine and evaluate the current development of using Web-as-Corpus (WaC) paradigm in Chinese corpus linguistics. I will argue that the unstable notion of wordhood in Chinese and the resulting diverse ideas of implementing word segmentation systems have posed great challenges for those who are keen on building web-scaled corpus data. Two lexical measures are proposed to illustrate the issues and methodological discussions are provided."
}