Daniel Dakota - ACL Anthology

This is an internal, incomplete preview of a proposed change to the ACL Anthology. For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes. Do not treat this content as an official publication.

Daniel Dakota

2025

Clustering Zero-Shot Uncertainty Estimations to Assess LLM Response Accuracy for Yes/No Q&A
Christopher T. Franck | Amy Vennos | W. Graham Mueller | Daniel Dakota
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

The power of Large Language Models (LLMs) in user workflows has increased the desire to access such technology in everyday work. While the ability to interact with models provides noticeable benefits, it also presents challenges in terms of how much trust a user should put in the system’s responses. This is especially true for external commercial and proprietary models where there is seldom direct access and only a response from an API is provided. While standard evaluation metrics, such as accuracy, provide starting points, they often may not provide enough information to users in settings where the confidence in a system’s response is important due to downstream or real-world impact, such as in Question & Answering (Q&A) workflows. To support users in assessing how accurate Q&A responses from such black-box LLMs scenarios are, we develop an uncertainty estimation framework that provides users with an analysis using a Dirichlet mixture model accessed from probabilities derived from a zero-shot classification model. We apply our framework to responses on the BoolQ Yes/No questions from GPT models, finding the resulting clusters allow a better quantification of uncertainty, providing a more fine-grained quantification of accuracy and precision across the space of model output while still being computationally practical. We further demonstrate its generalizability and reusability of the uncertainty model by applying it to a small set of Q&A collected from U.S. government websites.

Analyzing Polarization in Online Discourse on the 2023-2024 Israel–Hamas War
Daniel Miehling | Daniel Dakota | Sandra Kübler
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops

Performance Gaps in Acted and Naturalistic Speech: Insights from Speech Emotion Recognition Strategies on Customer Service Calls
Lily Kawaoto | Hita Gupta | Ning Yu | Daniel Dakota
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Current research in speech emotion recognition (SER) often uses speech data produced by actors which does not always best represent naturalistic speech. This can lead to challenges when applying models trained on such data sources to real-world data. We investigate the application of SER models developed on acted data and more naturalistic podcasts to service call data, with a particular focus on anger detection. Our results indicate that while there is noticeable performance degradation of models trained on acted data to the naturalistic data, weighted multimodal models developed on existing SER datasets–both acted and natural–show promise, but are limited in ability to recognize emotions that do not discernibly cluster.

Investigating Polarization in YouTube Comments via Aspect-Based Sentiment Analysis
Daniel Miehling | Daniel Dakota | Sandra Kübler
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

We investigate the use of Aspect-Based Sentiment Analysis (ABSA) to analyze polarization in online discourse. For the analysis, we use a corpus of over 3 million user comments and replies from four state-funded media channels from YouTube Shorts in the context of the 2023 Israel–Hamas war. We first annotate a subsample of approx. 5 000 comments for positive, negative, and neutral sentiment towards a list of topic related aspects. After training an ABSA model (Yang et al., 2023) on the corpus, we evaluate its performance on this task intrinsically, before evaluating the usability of the automatic analysis of the whole corpus for analyzing polarization. Our results show that the ABSA model achieves an F1 score of 77.9. The longitudinal and outlet analyses corroborate known trends and offer subject experts more fine-grained information about the use of domain-specific language in user-generated content.

Pushing the (Generative) Envelope: Measuring the Effect of Prompt Technique and Temperature on the Generation of Model-based Systems Engineering Artifacts
Erin Smith Crabb | Cedric Bernard | Matthew Jones | Daniel Dakota
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

System engineers use Model-based systems engineering (MBSE) approaches to help design and model system requirements. This manually intensive process requires expertise in both the domain of artifact creation (e.g., the requirements for a vacuum), and how to encode that information in a machine readable form (e.g., SysML). We investigated leveraging local LLMs to generate initial draft artifacts using a variety of prompt techniques and temperatures. Our experiments showed promise for generating certain types of artifacts, suggesting that even smaller, local models possesses enough MBSE knowledge to support system engineers. We observed however that while scores for artifacts remain stable across different temperature settings, this is potentially misleading as significantly different, though semantically equivalent, generations can be produced.

2024

Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Second Arabic Natural Language Processing Conference

We study dependency parsing for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi). Since no syntactically annotated data exist for Arabic dialects, we train the parser on a Modern Standard Arabic (MSA) corpus, which creates an out-of-domain setting.We investigate methods to close the gap between the source (MSA) and target data (dialects), e.g., by training on syntactically similar sentences to the test data. For testing, we manually annotate a small data set from a dialectal corpus. We focus on parsing two linguistic phenomena, which are difficult to parse: Idafa and coordination. We find that we can improve results by adding in-domain MSA data while adding dialectal embeddings only results in minor improvements.

Bits and Pieces: Investigating the Effects of Subwords in Multi-task Parsing across Languages and Domains
Daniel Dakota | Sandra Kübler
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Neural parsing is very dependent on the underlying language model. However, very little is known about how choices in the language model affect parsing performance, especially in multi-task learning. We investigate questions on how the choice of subwords affects parsing, how subword sharing is responsible for gains or negative transfer in a multi-task setting where each task is parsing of a specific domain of the same language. More specifically, we investigate these issues across four languages: English, German, Italian, and Turkish. We find a general preference for averaged or last subwords across languages and domains. However, specific POS tags may require different subwords, and the distributional overlap between subwords across domains is perhaps a more influential factor in determining positive or negative transfer than discrepancies in the data sizes.

Introducing a Parsed Corpus of Historical High German
Christopher D. Sapp | Elliott Evans | Rex Sprouse | Daniel Dakota
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We outline the ongoing development of the Indiana Parsed Corpus of (Historical) High German. Once completed, this corpus will fill the gap in Penn-style treebanks for Germanic languages by spanning High German from 1050 to 1950. This paper describes the process of building the corpus: selection of texts, decisions on part-of-speech tags and other labels, the process of annotation, and illustrative annotation issues unique to historical High German. The construction of the corpus has led to a refinement of the Penn labels, tailored to the particulars of this language.

Domain-Weighted Batch Sampling for Neural Dependency Parsing
Jacob Striebel | Daniel Dakota | Sandra Kübler
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

In neural dependency parsing, as well as in the broader field of NLP, domain adaptation remains a challenging problem. When adapting a parser to a target domain, there is a fundamental tension between the need to make use of out-of-domain data and the need to ensure that syntactic characteristic of the target domain are learned. In this work we explore a way to balance these two competing concerns, namely using domain-weighted batch sampling, which allows us to use all available training data, while controlling the probability of sampling in- and out-of-domain data when constructing training batches. We conduct experiments using ten natural language domains and find that domain-weighted batch sampling yields substantial performance improvements in all ten domains compared to a baseline of conventional randomized batch sampling.

Scaling Up Authorship Attribution
Jacob Striebel | Abishek Edikala | Ethan Irby | Alex Rosenfeld | J. Gage | Daniel Dakota | Sandra Kübler
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

We describe our system for authorship attribution in the IARPA HIATUS program. We describe the model and compute infrastructure developed to satisfy the set of technical constraints imposed by IARPA, including runtime limits as well as other constraints related to the ultimate use case. One use-case constraint concerns the explainability of the features used in the system. For this reason, we integrate features from frame semantic parsing, as they are both interpretable and difficult for adversaries to evade. One trade-off with using such features, however, is that more sophisticated feature representations require more complicated architectures, which limit usefulness in time-sensitive and constrained compute environments. We propose an approach to increase the efficiency of frame semantic parsing through an analysis of parallelization and beam search sizes. Our approach results in a system that is approximately 8.37x faster than the base system with a minimal effect on accuracy.

Evaluating Automatic Pronunciation Scoring with Crowd-sourced Speech Corpus Annotations
Nils Hjortnaes | Daniel Dakota | Sandra Kübler | Francis Tyers
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning

Proceedings of the 22nd Workshop on Treebanks and Linguistic Theories (TLT 2024)
Daniel Dakota | Sarah Jablotschkin | Sandra Kübler | Heike Zinsmeister
Proceedings of the 22nd Workshop on Treebanks and Linguistic Theories (TLT 2024)

2023

Bigfoot in Big Tech: Detecting Out of Domain Conspiracy Theories
Matthew Fort | Zuoyu Tian | Elizabeth Gabel | Nina Georgiades | Noah Sauer | Daniel Dakota | Sandra Kübler
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

We investigate approaches to classifying texts into either conspiracy theory or mainstream using the Language Of Conspiracy (LOCO) corpus. Since conspiracy theories are not monolithic constructs, we need to identify approaches that robustly work in an out-of- domain setting (i.e., across conspiracy topics). We investigate whether optimal in-domain set- tings can be transferred to out-of-domain set- tings, and we investigate different methods for bleaching to steer classifiers away from words typical for an individual conspiracy theory. We find that BART works better than an SVM, that we can successfully classify out-of-domain, but there are no clear trends in how to choose the best source training domains. Addition- ally, bleaching only topic words works better than bleaching all content words or completely delexicalizing texts.

Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)
Daniel Dakota | Kilian Evang | Sandra Kübler | Lori Levin
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)

Parsing Early New High German: Benefits and limitations of cross-dialectal training
Christopher Sapp | Daniel Dakota | Elliott Evans
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)

Historical treebanking within the generative framework has gained in popularity. However, there are still many languages and historical periods yet to be represented. For German, a constituency treebank exists for historical Low German, but not Early New High German. We begin to fill this gap by presenting our initial work on the Parsed Corpus of Early New High German (PCENHG). We present the methodological considerations and workflow for the treebank’s annotations and development. Given the limited amount of currently available PCENHG treebank data, we treat it as a low-resource language and leverage a larger, closely related variety—Middle Low German—to build a parser to help facilitate faster post-annotation correction. We present an analysis on annotation speeds and conclude with a small pilot use-case, highlighting potential for future linguistic analyses. In doing so we highlight the value of the treebank’s development for historical linguistic analysis and demonstrate the benefits and challenges of developing a parser using two closely related historical Germanic varieties.

2022

How to Parse a Creole: When Martinican Creole Meets French
Ludovic Mompelat | Daniel Dakota | Sandra Kübler
Proceedings of the 29th International Conference on Computational Linguistics

We investigate methods to develop a parser for Martinican Creole, a highly under-resourced language, using a French treebank. We compare transfer learning and multi-task learning models and examine different input features and strategies to handle the massive size imbalance between the treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields optimal results even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer.

Improving POS Tagging for Arabic Dialects on Out-of-Domain Texts
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

We investigate part of speech tagging for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi), in an out-of-domain setting. More specifically, we look at the effectiveness of 1) upsampling the target dialect in the training data of a joint model, 2) increasing the consistency of the annotations, and 3) using word embeddings pre-trained on a large corpus of dialectal Arabic. We increase the accuracy on average by about 20 percentage points.

2021

Genres, Parsers, and BERT: The Interaction Between Parsers and BERT Models in Cross-Genre Constituency Parsing in English and Swedish
Daniel Dakota
Proceedings of the Second Workshop on Domain Adaptation for NLP

Genre and domain are often used interchangeably, but are two different properties of a text. Successful parser adaptation requires both cross-domain and cross-genre sensitivity (Rehbein and Bildhauer, 2017). While the impact domain differences have on parser performance degradation is more easily measurable in respect to lexical differences, impact of genre differences can be more nuanced. With the predominance of pre-trained language models (LMs; e.g. BERT (Devlin et al., 2019)), there are now additional complexities in developing cross-genre sensitive models due to the infusion of linguistic characteristics derived from, usually, a third genre. We perform a systematic set of experiments using two neural constituency parsers to examine how different parsers behave in combination with different BERT models with varying source and target genres in English and Swedish. We find that there is extensive difficulty in predicting the best source due to the complex interactions between genres, parsers, and LMs. Additionally, the influence of the data used to derive the underlying BERT model heavily influences how best to create more robust and effective cross-genre parsing models.

Annotations Matter: Leveraging Multi-task Learning to Parse UD and SUD
Zeeshan Ali Sayyed | Daniel Dakota
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Bidirectional Domain Adaptation Using Weighted Multi-Task Learning
Daniel Dakota | Zeeshan Ali Sayyed | Sandra Kübler
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

Domain adaption in syntactic parsing is still a significant challenge. We address the issue of data imbalance between the in-domain and out-of-domain treebank typically used for the problem. We define domain adaptation as a Multi-task learning (MTL) problem, which allows us to train two parsers, one for each do-main. Our results show that the MTL approach is beneficial for the smaller treebank. For the larger treebank, we need to use loss weighting in order to avoid a decrease in performance be-low the single task. In order to determine towhat degree the data imbalance between two domains and the domain differences affect results, we also carry out an experiment with two imbalanced in-domain treebanks and show that loss weighting also improves performance in an in-domain setting. Given loss weighting in MTL, we can improve results for both parsers.

Examining the Effects of Preprocessing on the Detection of Offensive Language in German Tweets
Sebastian Reimann | Daniel Dakota
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

What’s in a Span? Evaluating the Creativity of a Span-Based Neural Constituency Parser
Daniel Dakota | Sandra Kübler
Proceedings of the Society for Computation in Linguistics 2021

Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)
Daniel Dakota | Kilian Evang | Sandra Kübler
Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)

2019

Investigating Multilingual Abusive Language Detection: A Cautionary Tale
Kenneth Steimel | Daniel Dakota | Yue Chen | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Abusive language detection has received much attention in the last years, and recent approaches perform the task in a number of different languages. We investigate which factors have an effect on multilingual settings, focusing on the compatibility of data and annotations. In the current paper, we focus on English and German. Our findings show large differences in performance between the two languages. We find that the best performance is achieved by different classification algorithms. Sampling to address class imbalance issues is detrimental for German and beneficial for English. The only similarity that we find is that neither data set shows clear topics when we compare the results of topic modeling to the gold standard. Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.

2018

Practical Parsing for Downstream Applications
Daniel Dakota | Sandra Kübler
Proceedings of the 27th International Conference on Computational Linguistics: Tutorial Abstracts

2017

Towards Replicability in Parsing
Daniel Dakota | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We investigate parsing replicability across 7 languages (and 8 treebanks), showing that choices concerning the use of grammatical functions in parsing or evaluation, the influence of the rare word threshold, as well as choices in test sentences and evaluation script options have considerable and often unexpected effects on parsing accuracies. All of those choices need to be carefully documented if we want to ensure replicability.

Non-Deterministic Segmentation for Chinese Lattice Parsing
Hai Hu | Daniel Dakota | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Parsing Chinese critically depends on correct word segmentation for the parser since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing using word lattices as parser input. We compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correction segmentation from thousands of options, thus drastically reducing the number of unparsed sentence. Lexicon-based parsing models have a better coverage than the CRF-based approach, but the many options are more difficult to handle. We reach our best result by using a lexicon from the n-best CRF analyses, combined with highly probable words.

2016

IUCL at SemEval-2016 Task 6: An Ensemble Model for Stance Detection in Twitter
Can Liu | Wen Li | Bradford Demarest | Yue Chen | Sara Couture | Daniel Dakota | Nikita Haduong | Noah Kaufman | Andrew Lamont | Manan Pancholi | Kenneth Steimel | Sandra Kübler
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2014

“My Curiosity was Satisfied, but not in a Good Way”: Predicting User Ratings for Online Recipes
Can Liu | Chun Guo | Daniel Dakota | Sridhar Rajagopalan | Wen Li | Sandra Kübler | Ning Yu
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

Parsing German: How Much Morphology Do We Need?
Wolfgang Maier | Sandra Kübler | Daniel Dakota | Daniel Whyatt
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

Venues