Niko Schenk


2021

How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages
Rachit Bansal | Himanshu Choudhary | Ravneet Punia | Niko Schenk | Émilie Pagé-Perron | Jacob Dahl
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Despite the recent advancements of attention-based deep learning architectures across a majority of Natural Language Processing tasks, their application remains limited in low-resource settings because of a lack of pre-trained models for such languages. In this study, we make the first attempt to investigate the challenges of adapting these techniques to an extremely low-resource language – Sumerian cuneiform – one of the world’s oldest written languages attested from at least the beginning of the 3rd millennium BC. Specifically, we introduce the first cross-lingual information extraction pipeline for Sumerian, which includes part-of-speech tagging, named entity recognition, and machine translation. We introduce InterpretLR, an interpretability toolkit for low-resource NLP, and use it alongside human evaluations to gauge the trained models. Notably, all our techniques and most components of our pipeline can be generalised to any low-resource language. We publicly release all our implementations including a novel data set with domain-specific pre-processing to promote further research in this domain.

2020

Towards the First Machine Translation System for Sumerian Transliterations
Ravneet Punia | Niko Schenk | Christian Chiarcos | Émilie Pagé-Perron
Proceedings of the 28th International Conference on Computational Linguistics

The Sumerian cuneiform script was invented more than 5,000 years ago and is one of the oldest writing systems in history. We present the first attempt to translate Sumerian texts into English automatically. We publicly release high-quality corpora for standardized training and evaluation and report results on experiments with supervised, phrase-based, and transfer learning techniques for machine translation. Quantitative and qualitative evaluations indicate the usefulness of the translations. Our proposed methodology provides a broader audience of researchers with novel access to the data, accelerates the costly and time-consuming manual translation process, and helps them better explore the relationships between Sumerian cuneiform and Mesopotamian culture.

Translation Inference by Concept Propagation
Christian Chiarcos | Niko Schenk | Christian Fäth
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

This paper describes our contribution to the Third Shared Task on Translation Inference across Dictionaries (TIAD-2020). We describe an approach to translation inference based on symbolic methods, namely the propagation of concepts over a graph of interconnected dictionaries: given a mapping from source language words to lexical concepts (e.g., synsets) as a seed, we use bilingual dictionaries to extrapolate a mapping of pivot and target language words to these lexical concepts. Translation inference is then performed by looking up the lexical concept(s) of a source language word and returning the target language word(s) for which these lexical concepts have the respective highest score. We present two instantiations of this system: one using WordNet synsets as concepts, and one using lexical entries (translations) as concepts. With a threshold of 0, the latter configuration is the second among participant systems in terms of F1 score. We also describe additional evaluation experiments on Apertium data, a comparison with an earlier approach based on embedding projection, and an approach for constrained projection that outperforms the TIAD-2020 vanilla system by a large margin.
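
The concept-propagation idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `propagate` and `infer`, the averaging rule used to score propagated concepts, the number of rounds, and the toy dictionary entries are all assumptions made for illustration. It only shows the general shape of the method: seed concepts on source words, spread them along bilingual dictionary links, then return the target words whose concepts score highest.

```python
from collections import defaultdict


def propagate(seed, dictionary, rounds=2):
    """Spread concept scores from seed words to their dictionary neighbours.

    seed:       {(lang, word): {concept: score}}
    dictionary: iterable of ((lang, word), (lang, word)) translation pairs
    The averaging rule is an illustrative assumption, not the paper's scoring.
    """
    neighbours = defaultdict(set)
    for a, b in dictionary:
        neighbours[a].add(b)
        neighbours[b].add(a)

    scores = {w: dict(c) for w, c in seed.items()}
    for _ in range(rounds):
        updated = {w: dict(c) for w, c in scores.items()}
        for word, linked in neighbours.items():
            if word in seed:
                continue  # seed assignments stay fixed
            totals = defaultdict(float)
            for n in linked:
                for concept, s in scores.get(n, {}).items():
                    totals[concept] += s
            if totals:
                # average the incoming scores over all dictionary neighbours
                updated[word] = {c: t / len(linked) for c, t in totals.items()}
        scores = updated
    return scores


def infer(source, target_lang, scores, threshold=0.0):
    """Return the target-language words sharing the best-scoring concept."""
    src = scores.get(source, {})
    ranked = {}
    for (lang, word), concepts in scores.items():
        if lang != target_lang:
            continue
        best = max(
            (min(src[c], s) for c, s in concepts.items() if c in src),
            default=0.0,
        )
        if best > threshold:
            ranked[word] = best
    if not ranked:
        return []
    top = max(ranked.values())
    return sorted(w for w, s in ranked.items() if s == top)


# Toy data (hypothetical): English seeds, Spanish as pivot, French as target.
seed = {
    ("en", "bank"): {"bank.finance": 1.0},
    ("en", "shore"): {"shore.land": 1.0},
}
dictionary = [
    (("en", "bank"), ("es", "banco")),
    (("es", "banco"), ("fr", "banque")),
    (("en", "shore"), ("es", "orilla")),
    (("es", "orilla"), ("fr", "rive")),
]
scores = propagate(seed, dictionary)
print(infer(("en", "bank"), "fr", scores))
```

In this sketch two propagation rounds are needed because the French words are two dictionary hops away from the English seeds; the `threshold` parameter mirrors the score threshold mentioned in the abstract, with 0 as the default.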

2018

Knowing the Author by the Company His Words Keep
Armin Hoenen | Niko Schenk
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The ACoLi CoNLL Libraries: Beyond Tab-Separated Values
Christian Chiarcos | Niko Schenk
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Towards a Linked Open Data Edition of Sumerian Corpora
Christian Chiarcos | Émilie Pagé-Perron | Ilya Khait | Niko Schenk | Lucas Reckling
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations
Samuel Rönnqvist | Niko Schenk | Christian Chiarcos
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model’s ability to selectively focus on the relevant parts of an input sequence.

Resource-Lean Modeling of Coherence in Commonsense Stories
Niko Schenk | Christian Chiarcos
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We present a resource-lean neural recognizer for modeling coherence in commonsense stories. Our lightweight system is inspired by successful attempts at modeling discourse relations and stands out due to its simplicity and easy optimization compared to prior approaches to narrative script learning. We evaluate our approach in the Story Cloze Test, demonstrating an absolute improvement in accuracy of 4.7% over state-of-the-art implementations.

2016

Do We Really Need All Those Rich Linguistic Features? A Neural Network-Based Approach to Implicit Sense Labeling
Niko Schenk | Christian Chiarcos | Kathrin Donandt | Samuel Rönnqvist | Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the CoNLL-16 shared task

Unsupervised Learning of Prototypical Fillers for Implicit Semantic Role Labeling
Niko Schenk | Christian Chiarcos
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Memory-Based Acquisition of Argument Structures and its Application to Implicit Role Detection
Christian Chiarcos | Niko Schenk
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Towards Semantic Language Classification: Inducing and Clustering Semantic Association Networks from Europarl
Steffen Eger | Niko Schenk | Alexander Mehler
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

Towards the Unsupervised Acquisition of Implicit Semantic Roles
Niko Schenk | Christian Chiarcos | Maria Sukhareva
Proceedings of the International Conference Recent Advances in Natural Language Processing

A Minimalist Approach to Shallow Discourse Parsing and Implicit Relation Recognition
Christian Chiarcos | Niko Schenk
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task