Akihiro Ohtani


2016

Big Community Data before World Wide Web Era
Tomoya Iwakura | Tetsuro Takahashi | Akihiro Ohtani | Kunio Matsui
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper introduces the NIFTY-Serve corpus, a large data archive collected from Japanese discussion forums that operated via a Bulletin Board System (BBS) between 1987 and 2006. This corpus can be used in artificial intelligence research areas such as natural language processing and community analysis. The NIFTY-Serve corpus differs from data on the World Wide Web (WWW) in three ways: (1) it is essentially free of spam and duplicates because of strict data collection procedures, (2) it consists of historic user-generated data from before the WWW, and (3) it is a complete data set because the service has now shut down. We also introduce some example uses of the corpus.