A Study on Expert Sourcing Enterprise Question Collection and Classification

Yuan Luo, Thomas Boucher, Tolga Oral, David Osofsky, Sara Weber


Abstract
Large enterprises, such as IBM, accumulate petabytes of free-text data within their organizations. Mining this data requires meaningful question answering that goes beyond keyword search. In this paper, we present a study on the characteristics and classification of IBM sales questions. We analyze the characteristics both semantically and syntactically, and a question classification guideline evolves from that analysis. We adopted an enterprise-level expert sourcing approach to gather questions, annotate them according to the guideline, and manage annotation quality via enhanced inter-annotator agreement analysis. We developed a question feature extraction system and experimented with rule-based, statistical, and hybrid question classifiers. We share our annotated corpus of questions and report our experimental results. Statistical classifiers based separately on n-grams and on hand-crafted rule features give reasonable macro-F1 scores of 61.7% and 63.1%, respectively. The rule-based classifier gives a macro-F1 of 77.1%. The hybrid classifier, which combines n-gram and rule features using a second-guess model, further improves the macro-F1 to 83.9%.
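The abstract reports all results as macro-F1, that is, the unweighted mean of the per-class F1 scores, so rare question classes count as much as frequent ones. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `macro_f1` and the class labels in the usage example are hypothetical and do not come from the paper.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Return the unweighted mean of per-class F1 scores.

    Classes are taken from the union of gold and predicted labels.
    This is an illustrative sketch; the paper's exact evaluation
    protocol is not specified in the abstract.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed

    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical question classes, for illustration only.
gold = ["how_to", "pricing", "pricing", "contact"]
pred = ["how_to", "pricing", "contact", "contact"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a classifier that ignores a small class is penalized more under macro-F1 than under plain accuracy.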
Anthology ID:
L14-1233
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
181–188
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/25_Paper.pdf
Cite (ACL):
Yuan Luo, Thomas Boucher, Tolga Oral, David Osofsky, and Sara Weber. 2014. A Study on Expert Sourcing Enterprise Question Collection and Classification. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 181–188, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
A Study on Expert Sourcing Enterprise Question Collection and Classification (Luo et al., LREC 2014)