Arda Çelebi


Segmenting Hashtags using Automatically Created Training Data
Arda Çelebi | Arzucan Özgür
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Hashtags, which are commonly composed of multiple words, are increasingly used to convey the actual messages in tweets. Understanding what tweets say increasingly depends on understanding hashtags. Therefore, identifying the individual words that constitute a hashtag is an important yet challenging task due to the abrupt nature of the language used in tweets. In this study, we introduce a feature-rich approach based on supervised machine learning methods to segment hashtags. Our approach is unsupervised in the sense that instead of using manually segmented hashtags for training the machine learning classifiers, we automatically create our training data by using tweets as well as by automatically extracting hashtag segmentations from a large corpus. We achieve promising results with such automatically created noisy training data.


Self-training a Constituency Parser using n-gram Trees
Arda Çelebi | Arzucan Özgür
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this study, we tackle the problem of self-training a feature-rich discriminative constituency parser. We approach the self-training problem with the assumption that while the full sentence parse tree produced by a parser may contain errors, some portions of it are more likely to be correct. We hypothesize that instead of feeding the parser its own guessed full sentence parse trees, we can break them down into smaller ones, namely n-gram trees, and perform self-training on them. We build an n-gram parser and transfer the distinct expertise of the n-gram parser to the full sentence parser by using the Hierarchical Joint Learning (HJL) approach. The resulting jointly self-trained parser obtains a slight improvement over the baseline.


Semi-Supervised Discriminative Language Modeling with Out-of-Domain Text Data
Arda Çelebi | Murat Saraçlar
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

BOUNCE: Sentiment Classification in Twitter using Rich Feature Sets
Nadin Kökciyan | Arda Çelebi | Arzucan Özgür | Suzan Üsküdarlı
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)


MEAD - A Platform for Multidocument Multilingual Text Summarization
Dragomir Radev | Timothy Allison | Sasha Blair-Goldensohn | John Blitzer | Arda Çelebi | Stanko Dimitrov | Elliott Drabek | Ali Hakim | Wai Lam | Danyu Liu | Jahna Otterbacher | Hong Qi | Horacio Saggion | Simone Teufel | Michael Topper | Adam Winkel | Zhu Zhang
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)


Evaluation Challenges in Large-Scale Document Summarization
Dragomir R. Radev | Simone Teufel | Horacio Saggion | Wai Lam | John Blitzer | Hong Qi | Arda Çelebi | Danyu Liu | Elliott Drabek
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics