@inproceedings{murawaki-mori-2016-wikification,
    title = "Wikification for Scriptio Continua",
    author = "Murawaki, Yugo  and
      Mori, Shinsuke",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Grobelnik, Marko  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, Helene  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)",
    month = may,
    year = "2016",
    address = "Portoro{\v{z}}, Slovenia",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/landing_page/L16-1214/",
    pages = "1346--1351",
    abstract = "The fact that Japanese employs scriptio continua, or a writing system without spaces, complicates the first step of an NLP pipeline. Word segmentation is widely used in Japanese language processing, and lexical knowledge is crucial for reliable identification of words in text. Although external lexical resources like Wikipedia are potentially useful, segmentation mismatch prevents them from being straightforwardly incorporated into the word segmentation task. If we intentionally violate segmentation standards with the direct incorporation, quantitative evaluation will no longer be feasible. To address this problem, we propose to define a separate task that directly links given texts to an external resource, that is, wikification in the case of Wikipedia. By doing so, we can circumvent segmentation mismatch that may not necessarily be important for downstream applications. As the first step to realize the idea, we design the task of Japanese wikification and construct wikification corpora. We annotated subsets of the Balanced Corpus of Contemporary Written Japanese plus Twitter short messages. We also implement a simple wikifier and investigate its performance on these corpora."
}