Berfin Aktaş

Also published as: Berfin Aktas


2024

Shallow Discourse Parsing on Twitter Conversations
Berfin Aktas | Burak Özmen
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

We present our PDTB-style annotations on conversational Twitter data, which was initially annotated by Scheffler et al. (2019). We introduced 1,043 new annotations to the dataset, nearly doubling the number of previously annotated discourse relations. Subsequently, we applied a neural Shallow Discourse Parsing (SDP) model to the resulting corpus, improving its performance through retraining with in-domain data. The most substantial improvement was observed in the sense identification task (+19%). Our experiments with diverse training data combinations underline the potential benefits of exploring various data combinations in domain adaptation efforts for SDP. To the best of our knowledge, this is the first application of Shallow Discourse Parsing to Twitter data.

2020

TwiConv: A Coreference-annotated Corpus of Twitter Conversations
Berfin Aktaş | Annalena Kohnert
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

This article introduces TwiConv, an English coreference-annotated corpus of microblog conversations from Twitter. We describe the corpus compilation process and the annotation scheme, and release the corpus publicly along with this paper. We manually annotated nominal coreference in 1,756 tweets arranged in 185 conversation threads, achieving satisfactory inter-annotator agreement. We also present a new method for mapping tweet contents to distributed stand-off annotations, which can easily be adapted to different annotation tasks.

Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş | Veronika Solopova | Annalena Kohnert | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020

The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the Lee et al. (2018) system, which is originally trained on OntoNotes, by retraining on manually annotated Twitter conversation data. Further experiments combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena, such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the "out-of-the-box" system by 21.6%.

Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution for both spoken language and social media, we undertake a corpus study involving the various genre sections of OntoNotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that have previously been applied individually to different data sets, we find fairly clear patterns of "behavior" for the different genres/media. Besides their role for psycholinguistic investigation (why do we employ different coreference strategies when we write or speak?) and for the placement of Twitter on the spoken–written continuum, we see our results as a contribution to approaching genre-/media-specific coreference resolution.

2019

Annotating Shallow Discourse Relations in Twitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and its annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourse differs from written news text with respect to discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments frequently are not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

2018

Anaphora Resolution for Twitter Conversations: An Exploratory Study
Berfin Aktaş | Tatjana Scheffler | Manfred Stede
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

We present a corpus study of pronominal anaphora in Twitter conversations. After outlining the specific features of this genre with respect to reference resolution, we explain the construction of our corpus and the annotation steps. From this we derive a list of phenomena that need to be considered when performing anaphora resolution on this type of data. Finally, we test the performance of an off-the-shelf resolution system and provide some qualitative error analysis.

2010

Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Annotating Subordinators in the Turkish Discourse Bank
Deniz Zeyrek | Umit Deniz Turan | Cem Bozsahin | Ruket Cakici | Ayisigi B. Sevdik-Calli | Isin Demirsahin | Berfin Aktas | İhsan Yalcinkaya | Hale Ogel
Proceedings of the Third Linguistic Annotation Workshop (LAW III)