2023
pdf
abs
Quote Detection: A New Task and Dataset for NLP
Selma Tekir
|
Aybüke Güzel
|
Samet Tenekeci
|
Bekir Haman
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, it may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We measure unique vocabulary usage by a state-of-the-art language model and perform comparative statistical analysis against the Cornell Movie-Quotes Corpus. Furthermore, we run two types of baselines for quote detection: Conditional random field (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts.
2020
pdf
abs
LGPSolver - Solving Logic Grid Puzzles Automatically
Elgun Jabrayilzade
|
Selma Tekir
Findings of the Association for Computational Linguistics: EMNLP 2020
Logic grid puzzle (LGP) is a type of word problem where the task is to solve a problem in logic. Constraints for the problem are given in the form of textual clues. Once these clues are transformed into formal logic, a deductive reasoning process provides the solution. Solving logic grid puzzles in a fully automatic manner has been a challenge since a precise understanding of clues is necessary to develop the corresponding formal logic representation. To meet this challenge, we propose a solution that uses a DistilBERT-based classifier to classify a clue into one of the predefined predicate types for logic grid puzzles. Another novelty of the proposed solution is the recognition of comparison structures in clues. By collecting comparative adjectives from existing dictionaries and utilizing a semantic framework to catch comparative quantifiers, the semantics of clues concerning comparison structures are better understood, ensuring conversion to correct logic representation. Our approach solves logic grid puzzles in a fully automated manner with 100% accuracy on the given puzzle datasets and outperforms state-of-the-art solutions by a large margin.
2019
pdf
abs
A Turkish Dataset for Gender Identification of Twitter Users
Erhan Sezerer
|
Ozan Polatbilek
|
Selma Tekir
Proceedings of the 13th Linguistic Annotation Workshop
Author profiling is the identification of an author’s gender, age, and language from his/her texts. With the increasing trend of using Twitter as a means to express thought, profiling the gender of an author from his/her tweets has become a challenge. Although several datasets in different languages have been released on this problem, there is still a need for multilingualism. In this work, we propose a dataset of tweets of Turkish Twitter users which are labeled with their gender information. The dataset has 3368 users in training set and 1924 users in test set where each user has 100 tweets. The dataset is publicly available.
2017
pdf
abs
A News Chain Evaluation Methodology along with a Lattice-based Approach for News Chain Construction
Mustafa Toprak
|
Özer Özkahraman
|
Selma Tekir
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Chain construction is an important requirement for understanding news and establishing the context. A news chain can be defined as a coherent set of articles that explains an event or a story. There’s a lack of well-established methods in this area. In this work, we propose a methodology to evaluate the “goodness” of a given news chain and implement a concept lattice-based news chain construction method by Hossain et al.. The methodology part is vital as it directly affects the growth of research in this area. Our proposed methodology consists of collected news chains from different studies and two “goodness” metrics, minedge and dispersion coefficient respectively. We assess the utility of the lattice-based news chain construction method by our proposed methodology.
pdf
abs
N-Hance at SemEval-2017 Task 7: A Computational Approach using Word Association for Puns
Özge Sevgili
|
Nima Ghotbi
|
Selma Tekir
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper presents a system developed for SemEval-2017 Task 7, Detection and Interpretation of English Puns consisting of three subtasks; pun detection, pun location, and pun interpretation, respectively. The system stands on recognizing a distinctive word which has a high association with the pun in the given sentence. The intended humorous meaning of pun is identified through the use of this word. Our official results confirm the potential of this approach.