This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
YukaTakei
Also published as:
Yuya Takei
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
Twitter is used for various applications such as disaster monitoring and news material gathering. In these applications, each Tweet is classified into pre-defined classes. These classes have a semantic relationship with each other and can be classified into a hierarchical structure, which is regarded as important information. Label texts of pre-defined classes themselves also include important clues for classification. Therefore, we propose a method that can consider the hierarchical structure of labels and label texts themselves. We conducted evaluation over the Text REtrieval Conference (TREC) 2018 Incident Streams (IS) track dataset, and we found that our method outperformed the methods of the conference participants.
We developed a system that automatically extracts “Event-describing Tweets” which include incidents or accidents information for creating news reports. Event-describing Tweets can be classified into “Reported-event Tweets” and “New-information Tweets.” Reported-event Tweets cite news agencies or user generated content sites, and New-information Tweets are other Event-describing Tweets. A system is needed to classify them so that creators of factual TV programs can use them in their productions. Proposing this Tweet classification task is one of the contributions of this paper, because no prior papers have used the same task even though program creators and other events information collectors have to do it to extract required information from social networking sites. To classify Tweets in this task, this paper proposes a method to input and concatenate character and word sequences in Japanese Tweets by using convolutional neural networks. This proposed method is another contribution of this paper. For comparison, character or word input methods and other neural networks are also used. Results show that a system using the proposed method and architectures can classify Tweets with an F1 score of 88 %.
The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’ which enables the ten-billion-scaled corpus to be queried by string, a sequence of morphological information or a subtree of the syntactic dependency structure.