Dain Kaplan


2020

Conversational Semantic Parsing for Dialog State Tracking
Jianpeng Cheng | Devang Agrawal | Héctor Martínez Alonso | Shruti Bhargava | Joris Driesen | Federico Flego | Dain Kaplan | Dimitri Kartsaklis | Lin Li | Dhivya Piraviperumal | Jason D. Williams | Hong Yu | Diarmuid Ó Séaghdha | Anders Johannsen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to ~20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.

2016

Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Dain Kaplan | Neil Rubens | Simone Teufel | Takenobu Tokunaga
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Active learning (AL) is often used in corpus construction (CC) for selecting “informative” documents for annotation. This is ideal for focusing annotation efforts when not all documents can be annotated, but has the limitation that it is carried out in a closed loop, selecting points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of an existing model and of specific task(s) for using it makes traditional AL inapplicable. In this paper we propose a novel method for model-free AL that utilises characteristics of phenomena to select documents for annotation. The method can also supplement traditional closed-loop AL-based CC to extend the utility of the corpus created beyond a single task. We introduce our tool, MOVE, and show its potential with a real-world case study.

2010

Annotation Process Management Revisited
Dain Kaplan | Ryu Iida | Takenobu Tokunaga
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Proper annotation process management is crucial to the construction of corpora, which are in turn indispensable to the data-driven techniques that have come to the forefront in NLP during the last two decades. It is still common to see ad hoc tools created for a specific annotation project, but it is time this changed; creating such tools is expensive in labor and time, and is secondary to corpus creation. In addition, such tools likely lack proper annotation process management, which becomes increasingly important as corpora grow in size and complexity. This paper first raises a list of ten needs that any general-purpose annotation system should address moving forward, such as user & role management, delegation & monitoring of work, diffing & merging annotators’ work, versioning of corpora, multilingual support, import/export format flexibility, and so on. A framework to address these needs is then proposed, and the paper explains how proper annotation process management can benefit the creation and maintenance of corpora. The paper then introduces SLATE (Segment and Link-based Annotation Tool Enhanced), the second iteration of a web-based annotation tool, which is being rewritten to implement the proposed framework.

2009

Query Expansion using LMF-Compliant Lexical Resources
Takenobu Tokunaga | Dain Kaplan | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Virach Sornlertlamvanich | Thatsanee Charoenporn | Yingju Xia | Chu-Ren Huang | Shu-Kai Hsieh | Kiyoaki Shirai
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

Automatic Extraction of Citation Contexts for Research Paper Summarization: A Coreference-chain based Approach
Dain Kaplan | Ryu Iida | Takenobu Tokunaga
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

2008

Adapting International Standard for Asian Language Technologies
Takenobu Tokunaga | Dain Kaplan | Chu-Ren Huang | Shu-Kai Hsieh | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Kiyoaki Shirai | Virach Sornlertlamvanich | Thatsanee Charoenporn | YingJu Xia
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Corpus-based and statistical approaches have been the mainstream of natural language processing research for the past two decades. Language resources play a key role in such approaches, but such resources are insufficient for many Asian languages. In this situation, standardisation of language resources would be of great help in developing resources in new languages. This paper presents the latest development efforts of our project, which aims at creating a common standard for Asian language resources that is compatible with an international standard. In particular, the paper focuses on i) lexical specification and data categories relevant for building multilingual lexical resources for Asian languages; ii) a core upper-layer ontology needed for ensuring multilingual interoperability; and iii) the evaluation platform used to test the entire architectural framework.