Stephan Oepen

2024

pdf abs
Argument Sharing in Meaning Representation Parsing
Maja Buljan | Stephan Oepen | Lilja Øvrelid
Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024

We present a contrastive study of argument sharing across three graph-based meaning representation frameworks, where semantically shared arguments manifest as reentrant graph nodes. For a state-of-the-art graph parser, we observe how parser performance – in terms of output quality – covaries with overall graph complexity, on the one hand, and presence of different types of reentrancies, on the other hand. We identify common linguistic phenomena that give rise to shared arguments, and therefore node reentrancies, through a small-case and partially automated annotation study and parallel error anaylsis of actual parser outputs. Our results provide new insights into the distribution of different types of reentrancies in meaning representation graphs for three distinct frameworks, as well as on the effects that these structures have on parser performance, thus suggesting both novel cross-framework generalisations as well as avenues for focussed parser development.

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive. We describe our methods for data acquisition, management and processing of large corpora, which rely on open-source software tools and high-performance computing. Our monolingual collection focuses on low- to medium-resourced languages and covers 75 languages and a total of ≈ 5.6 trillion word tokens de-duplicated on the document level. Our English-centric parallel corpus is derived from its monolingual counterpart and covers 18 language pairs and more than 96 million aligned sentence pairs with roughly 1.4 billion English tokens. The HPLT language resources are one of the largest open text corpora ever released, providing a great resource for language modeling and machine translation training. We publicly release the corpora, the software, and the tools used in this work.

2022

This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions.

2021

pdf abs
Structured Sentiment Analysis as Dependency Graph Parsing
Jeremy Barnes | Robin Kurtz | Stephan Oepen | Lilja Øvrelid | Erik Velldal
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e.g., target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency graph parsing, where the nodes are spans of sentiment holders, targets and expressions, and the arcs are the relations between them. We perform experiments on five datasets in four languages (English, Norwegian, Basque, and Catalan) and show that this approach leads to strong improvements over state-of-the-art baselines. Our analysis shows that refining the sentiment graphs with syntactic dependency information further improves results.

pdf abs
Large-Scale Contextualised Language Modelling for Norwegian
Andrey Kutuzov | Jeremy Barnes | Erik Velldal | Lilja Øvrelid | Stephan Oepen
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu

pdf bib
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
Stephan Oepen | Kenji Sagae | Reut Tsarfaty | Gosse Bouma | Djamé Seddah | Daniel Zeman
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

2020

pdf abs
A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing
Maja Buljan | Joakim Nivre | Stephan Oepen | Lilja Øvrelid
Proceedings of the Twelfth Language Resources and Evaluation Conference

We discuss methodological choices in contrastive and diagnostic evaluation in meaning representation parsing, i.e. mapping from natural language utterances to graph-based encodings of its semantic structure. Drawing inspiration from earlier work in syntactic dependency parsing, we transfer and refine several quantitative diagnosis techniques for use in the context of the 2019 shared task on Meaning Representation Parsing (MRP). As in parsing proper, moving evaluation from simple rooted trees to general graphs brings along its own range of challenges. Specifically, we seek to begin to shed light on relative strenghts and weaknesses in different broad families of parsing techniques. In addition to these theoretical reflections, we conduct a pilot experiment on a selection of top-performing MRP systems and one of the five meaning representation frameworks in the shared task. Empirical results suggest that the proposed methodology can be meaningfully applied to parsing into graph-structured target representations, uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.

The 2020 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks and languages. Extending a similar setup from the previous year, five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the English training and evaluation data for the task, packaged in a uniform graph abstraction and serialization; for four of these representation frameworks, additional training and evaluation data was provided for one additional language per framework. The task received submissions from eight teams, of which two do not participate in the official ranking because they arrived after the closing deadline or made use of additional training data. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software are available from the task web site at: http://mrp.nlpl.eu

pdf bib abs
DRS at MRP 2020: Dressing up Discourse Representation Structures as Graphs
Lasha Abzianidze | Johan Bos | Stephan Oepen
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

Discourse Representation Theory (DRT) is a formal account for representing the meaning of natural language discourse. Meaning in DRT is modeled via a Discourse Representation Structure (DRS), a meaning representation with a model-theoretic interpretation, which is usually depicted as nested boxes. In contrast, a directed labeled graph is a common data structure used to encode semantics of natural language texts. The paper describes the procedure of dressing up DRSs as directed labeled graphs to include DRT as a new framework in the 2020 shared task on Cross-Framework and Cross-Lingual Meaning Representation Parsing. Since one of the goals of the shared task is to encourage unified models for several semantic graph frameworks, the conversion procedure was biased towards making the DRT graph framework somewhat similar to other graph-based meaning representation frameworks.

pdf bib
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
Gosse Bouma | Yuji Matsumoto | Stephan Oepen | Kenji Sagae | Djamé Seddah | Weiwei Sun | Anders Søgaard | Reut Tsarfaty | Dan Zeman
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

pdf abs
End-to-End Negation Resolution as Graph Parsing
Robin Kurtz | Stephan Oepen | Marco Kuhlmann
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We present a neural end-to-end architecture for negation resolution based on a formulation of the task as a graph parsing problem. Our approach allows for the straightforward inclusion of many types of graph-structured features without the need for representation-specific heuristics. In our experiments, we specifically gauge the usefulness of syntactic information for negation resolution. Despite the conceptual simplicity of our architecture, we achieve state-of-the-art results on the Conan Doyle benchmark dataset, including a new top result for our best model.

2019

pdf bib abs
Graph-Based Meaning Representations: Design and Processing
Alexander Koller | Stephan Oepen | Weiwei Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches.

pdf bib
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning
Stephan Oepen | Omri Abend | Jan Hajic | Daniel Hershcovich | Marco Kuhlmann | Tim O’Gorman | Nianwen Xue
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

The 2019 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks. Five distinct approaches to the representation of sentence meaning in the form of directed graph were represented in the training and evaluation data for the task, packaged in a uniform abstract graph representation and serialization. The task received submissions from eighteen teams, of which five do not participate in the official ranking because they arrived after the closing deadline, made use of additional training data, or involved one of the task co-organizers. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software are available from the task web site at: http://mrp.nlpl.eu

pdf abs
The ERG at MRP 2019: Radically Compositional Semantic Dependencies
Stephan Oepen | Dan Flickinger
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

The English Resource Grammar (ERG) is a broad-coverage computational grammar of English that outputs underspecified logical-form representations of meaning in a framework dubbed English Resource Semantics (ERS). Two of the target representations in the the 2019 Shared Task on Cross-Framework Meaning Representation Parsing (MRP 2019) derive graph-based simplifications of ERS, viz. Elementary Dependency Structures (EDS) and DELPH-IN MRS Bi-Lexical Dependencies (DM). As a point of reference outside the official MRP competition, we parsed the evaluation strings using the ERG and converted the resulting meaning representations to EDS and DM. These graphs yield higher evaluation scores than the purely data-driven parsers in the actual shared task, suggesting that the general-purpose linguistic knowledge about English grammar encoded in the ERG can add value when parsing into these meaning representations.

pdf bib
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)
Marie Candito | Kilian Evang | Stephan Oepen | Djamé Seddah
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

2018

pdf bib abs
The 2018 Shared Task on Extrinsic Parser Evaluation: On the Downstream Utility of English Universal Dependency Parsers
Murhaf Fares | Stephan Oepen | Lilja Øvrelid | Jari Björne | Richard Johansson
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We summarize empirical results and tentative conclusions from the Second Extrinsic Parser Evaluation Initiative (EPE 2018). We review the basic task setup, downstream applications involved, and end-to-end results for seventeen participating teams. Based on in-depth quantitative and qualitative analysis, we correlate intrinsic evaluation results at different layers of morph-syntactic analysis with observed downstream behavior.

pdf abs
Transfer and Multi-Task Learning for Noun–Noun Compound Interpretation
Murhaf Fares | Stephan Oepen | Erik Velldal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun–noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification model generalize over a highly skewed distribution of relations. Further, we demonstrate how dual annotation with two distinct sets of relations over the same set of compounds can be exploited to improve the overall accuracy of a neural classifier and its F1 scores on the less frequent, but more difficult relations.

2017

pdf
Word vectors, reuse, and replicability: Towards a community repository of large-text resources
Murhaf Fares | Andrey Kutuzov | Stephan Oepen | Erik Velldal
Proceedings of the 21st Nordic Conference on Computational Linguistics

For decades, most self-respecting linguistic engineering initiatives have designed and implemented custom representations for various layers of, for example, morphological, syntactic, and semantic analysis. Despite occasional efforts at harmonization or even standardization, our field today is blessed with a multitude of ways of encoding and exchanging linguistic annotations of these types, both at the levels of ‘abstract syntax’, naming choices, and of course file formats. To a large degree, it is possible to work within and across design plurality by conversion, and often there may be good reasons for divergent design reflecting differences in use. However, it is likely that some abstract commonalities across choices of representation are obscured by more superficial differences, and conversely there is no obvious procedure to tease apart what actually constitute contentful vs. mere technical divergences. In this study, we seek to conceptually align three representations for common types of morpho-syntactic analysis, pinpoint what in our view constitute contentful differences, and reflect on the underlying principles and specific requirements that led to individual choices. We expect that a more in-depth understanding of these choices across designs may led to increased harmonization, or at least to more informed design of future representations.

2016

We announce a new language resource for research on semantic parsing, a large, carefully curated collection of semantic dependency graphs representing multiple linguistic traditions. This resource is called SDP~2016 and provides an update and extension to previous versions used as Semantic Dependency Parsing target representations in the 2014 and 2015 Semantic Evaluation Exercises. For a common core of English text, this third edition comprises semantic dependency graphs from four distinct frameworks, packaged in a unified abstract format and aligned at the sentence and token levels. SDP 2016 is the first general release of this resource and available for licensing from the Linguistic Data Consortium in May 2016. The data is accompanied by an open-source SDP utility toolkit and system results from previous contrastive parsing evaluations against these target representations.

pdf
Squibs: Towards a Catalogue of Linguistic Graph Banks
Marco Kuhlmann | Stephan Oepen
Computational Linguistics, Volume 42, Issue 4 - December 2016

2015

pdf
Semantic Dependency Graph Parsing Using Tree Approximations
Željko Agić | Alexander Koller | Stephan Oepen
Proceedings of the 11th International Conference on Computational Semantics

pdf
Layers of Interpretation: On Grammar and Compositionality
Emily M. Bender | Dan Flickinger | Stephan Oepen | Woodley Packard | Ann Copestake
Proceedings of the 11th International Conference on Computational Semantics

2014

pdf abs
Semantic Technologies for Querying Linguistic Annotations: An Experiment Focusing on Graph-Structured Data
Milen Kouylekov | Stephan Oepen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

With growing interest in the creation and search of linguistic annotations that form general graphs (in contrast to formally simpler, rooted trees), there also is an increased need for infrastructures that support the exploration of such representations, for example logical-form meaning representations or semantic dependency graphs. In this work, we heavily lean on semantic technologies and in particular the data model of the Resource Description Framework (RDF) to represent, store, and efficiently query very large collections of text annotated with graph-structured representations of sentence meaning.

pdf abs
Towards an Encyclopedia of Compositional Semantics: Documenting the Interface of the English Resource Grammar
Dan Flickinger | Emily M. Bender | Stephan Oepen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We motivate and describe the design and development of an emerging encyclopedia of compositional semantics, pursuing three objectives. We first seek to compile a comprehensive catalogue of interoperable semantic analyses, i.e., a precise characterization of meaning representations for a broad range of common semantic phenomena. Second, we operationalize the discovery of semantic phenomena and their definition in terms of what we call their semantic fingerprint, a formal account of the building blocks of meaning representation involved and their configuration. Third, we ground our work in a carefully constructed semantic test suite of minimal exemplars for each phenomenon, along with a ‘target’ fingerprint that enables automated regression testing. We work towards these objectives by codifying and documenting the body of knowledge that has been constructed in a long-term collaborative effort, the development of the LinGO English Resource Grammar. Documentation of its semantic interface is a prerequisite to use by non-experts of the grammar and the analyses it produces, but this effort also advances our own understanding of relevant interactions among phenomena, as well as of areas for future work in the grammar.

pdf abs
Off-Road LAF: Encoding and Processing Annotations in NLP Workflows
Emanuele Lapponi | Erik Velldal | Stephan Oepen | Rune Lain Knudsen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Linguistic Annotation Framework (LAF) provides an abstract data model for specifying interchange representations to ensure interoperability among different annotation formats. This paper describes an ongoing effort to adapt the LAF data model as the interchange representation in complex workflows as used in the Language Analysis Portal (LAP), an on-line and large-scale processing service that is developed as part of the Norwegian branch of the Common Language Resources and Technology Infrastructure (CLARIN) initiative. Unlike several related on-line processing environments, which predominantly instantiate a distributed architecture of web services, LAP achives scalability to potentially very large data volumes through integration with the Norwegian national e-Infrastructure, and in particular job sumission to a capacity compute cluster. This setup leads to tighter integration requirements and also calls for efficient, low-overhead communication of (intermediate) processing results with workflows. We meet these demands by coupling the LAF data model with a lean, non-redundant JSON-based interchange format and integration of an agile and performant NoSQL database, allowing parallel access from cluster nodes, as the central repository of linguistic annotation.

pdf
In-House: An Ensemble of Pre-Existing Off-the-Shelf Parsers
Yusuke Miyao | Stephan Oepen | Daniel Zeman
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf
Simple Negation Scope Resolution through Deep Parsing: A Semantic Solution to a Semantic Problem
Woodley Packard | Emily M. Bender | Jonathon Read | Stephan Oepen | Rebecca Dridan
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
RDF Triple Stores and a Custom SPARQL Front-End for Indexing and Searching (Very) Large Semantic Networks
Milen Kouylekov | Stephan Oepen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

2013

pdf
Survey on parsing three dependency representations for English
Angelina Ivanova | Stephan Oepen | Lilja Øvrelid
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

pdf bib
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)
Stephan Oepen | Kristin Hagen | Janne Bondi Johannessen
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf
Tidying up the Basement: A Tale of Large-Scale Parsing on National eInfrastructure
Stephan Oepen
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf
Simple and Accountable Segmentation of Marked-up Text
Jonathon Read | Rebecca Dridan | Stephan Oepen
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf
HPC-ready Language Analysis for Human Beings
Emanuele Lapponi | Erik Velldal | Nikolay A. Vazov | Stephan Oepen
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf
On Different Approaches to Syntactic Analysis Into Bi-Lexical Dependencies. An Empirical Comparison of Direct, PCFG-Based, and HPSG-Based Parsers
Angelina Ivanova | Stephan Oepen | Rebecca Dridan | Dan Flickinger | Lilja Øvrelid
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

pdf
Document Parsing: Towards Realistic Syntactic Analysis
Rebecca Dridan | Stephan Oepen
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

2012

pdf
Speculation and Negation: Rules, Rankers, and the Role of Syntax
Erik Velldal | Lilja Øvrelid | Jonathon Read | Stephan Oepen
Computational Linguistics, Volume 38, Issue 2 - June 2012

pdf
Tokenization: Returning to a Long Solved Problem — A Survey, Contrastive Experiment, Recommendations, and Toolkit —
Rebecca Dridan | Stephan Oepen
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Towards an ACL Anthology Corpus with Logical Document Structure. An Overview of the ACL 2012 Contributed Task
Ulrich Schäfer | Jonathon Read | Stephan Oepen
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

pdf
Towards High-Quality Text Stream Extraction from PDF. Technical Background to the ACL 2012 Contributed Task
Øyvind Raddum Berg | Stephan Oepen | Jonathon Read
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

pdf bib
Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies
Angelina Ivanova | Stephan Oepen | Lilja Øvrelid | Dan Flickinger
Proceedings of the Sixth Linguistic Annotation Workshop

pdf
Sentence Boundary Detection: A Long Solved Problem?
Jonathon Read | Rebecca Dridan | Stephan Oepen | Lars Jørgen Solberg
Proceedings of COLING 2012: Posters

pdf abs
The WeSearch Corpus, Treebank, and Treecache – A Comprehensive Sample of User-Generated Content
Jonathon Read | Dan Flickinger | Rebecca Dridan | Stephan Oepen | Lilja Øvrelid
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present the WeSearch Data Collection (WDC)―a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data selection and extraction process, with a focus on the extraction of linguistic content from different sources. We present the format of syntacto-semantic annotations found in this resource and present initial parsing results for these data, as well as some reflections following a first round of treebanking.

pdf
UiO1: Constituent-Based Discriminative Ranking for Negation Resolution
Jonathon Read | Erik Velldal | Lilja Øvrelid | Stephan Oepen
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf
Parser Evaluation Using Elementary Dependency Matching
Rebecca Dridan | Stephan Oepen
Proceedings of the 12th International Conference on Parsing Technologies

pdf
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking
Andrew MacKinlay | Rebecca Dridan | Dan Flickinger | Stephan Oepen | Timothy Baldwin
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
Emily M. Bender | Dan Flickinger | Stephan Oepen | Yi Zhang
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf
Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules
Erik Velldal | Lilja Øvrelid | Stephan Oepen
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

pdf
Syntactic Scope Resolution in Uncertainty Analysis
Lilja Øvrelid | Erik Velldal | Stephan Oepen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf abs
WikiWoods: Syntacto-Semantic Annotation for English Wikipedia
Dan Flickinger | Stephan Oepen | Gisle Ytrestøl
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

WikiWoods is an ongoing initiative to provide rich syntacto-semantic annotations for English Wikipedia. We sketch an automated processing pipeline to extract relevant textual content from Wikipedia sources, segment documents into sentence-like units, parse and disambiguate using a broad-coverage precision grammar, and support the export of syntactic and semantic information in various formats. The full parsed corpus is accompanied by a subset of Wikipedia articles for which gold-standard annotations in the same format were produced manually. This subset was selected to represent a coherent domain, Wikipedia entries on the broad topic of Natural Language Processing.

2009

pdf
Automatic Translation of Norwegian Noun Compounds
Lars Bungum | Stephan Oepen
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

pdf
Hybrid Multilingual Parsing with HPSG for SRL
Yi Zhang | Rui Wang | Stephan Oepen
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2008

pdf abs
Some Fine Points of Hybrid Natural Language Parsing
Peter Adolphs | Stephan Oepen | Ulrich Callmeier | Berthold Crysmann | Dan Flickinger | Bernd Kiefer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Large-scale grammar-based parsing systems nowadays increasingly rely on independently developed, more specialized components for pre-processing their input. However, different tools make conflicting assumptions about very basic properties such as tokenization. To make linguistic annotation gathered in pre-processing available to deep parsing, a hybrid NLP system needs to establish a coherent mapping between the two universes. Our basic assumption is that tokens are best described by attribute value matrices (AVMs) that may be arbitrarily complex. We propose a powerful resource-sensitive rewrite formalism, chart mapping, that allows us to mediate between the token descriptions delivered by shallow pre-processing components and the input expected by the grammar. We furthermore propose a novel way of unknown word treatment where all generic lexical entries are instantiated that are licensed by a particular token AVM. Again, chart mapping is used to give the grammar writer full control as to which items (e.g. native vs. generic lexical items) enter syntactic parsing. We discuss several further uses of the original idea and report on early experiences with the new machinery.

2007

pdf
Towards hybrid quality-oriented machine translation – on linguistics and probabilities in MT
Stephan Oepen | Erik Velldal | Jan Tore Lønning | Paul Meurer | Victoria Rosén | Dan Flickinger
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

pdf
Exploiting Semantic Information for HPSG Parse Selection
Sanae Fujita | Francis Bond | Stephan Oepen | Takaaki Tanaka
ACL 2007 Workshop on Deep Linguistic Processing

pdf
Efficiency in Unification-Based N-Best Parsing
Yi Zhang | Stephan Oepen | John Carroll
Proceedings of the Tenth International Conference on Parsing Technologies

2006

pdf abs
Discriminant-Based MRS Banking
Stephan Oepen | Jan Tore Lønning
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present an approach to discriminant-based MRS banking, i.e. the construction of an annotated corpus where each input item is paired with a logical-form semantics. Semantic annotations are produced by parsing with a broad-coverage precision grammar, followed by manual disambiguation. The selection of the preferred analysis for each item (and hence its semantic form) builds on a notion of semantic discriminants, essentially localized dependencies extracted from a full-fledged, underspecified semantic representation.

pdf
Re-Usable Tools for Precision Machine Translation
Jan Tore Lønning | Stephan Oepen
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

pdf
Statistical Ranking in Tactical Generation
Erik Velldal | Stephan Oepen
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf
Using a Bi-Lingual Dictionary in Lexical Transfer
Lars Nygaard | Jan Tore Lønning | Torbjørn Nordgård | Stephan Oepen
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

pdf
High Efficiency Realization for a Wide-Coverage Unification Grammar
John Carroll | Stephan Oepen
Second International Joint Conference on Natural Language Processing: Full Papers

pdf abs
Maximum Entropy Models for Realization Ranking
Erik Velldal | Stephan Oepen
Proceedings of Machine Translation Summit X: Papers

In this paper we describe and evaluate different statistical models for the task of realization ranking, i.e. the problem of discriminating between competing surface realizations generated for a given input semantics. Three models are trained and tested; an n-gram language model, a discriminative maximum entropy model using structural features, and a combination of these two. Our realization component forms part of a larger, hybrid MT system.

pdf abs
SEM-I Rational MT: Enriching Deep Grammars with a Semantic Interface for Scalable Machine Translation
Dan Flickinger | Jan Tore Lønning | Helge Dyvik | Stephan Oepen | Francis Bond
Proceedings of Machine Translation Summit X: Papers

In the LOGON machine translation system where semantic transfer using Minimal Recursion Semantics is being developed in conjunction with two existing broad-coverage grammars of Norwegian and English, we motivate the use of a grammar-specific semantic interface (SEM-I) to facilitate the construction and maintenance of a scalable translation engine. The SEM-I is a theoretically grounded component of each grammar, capturing several classes of lexical regularities while also serving the crucial engineering function of supplying a reliable and complete specification of the elementary predications the grammar can realize. We make extensive use of underspecification and type hierarchies to maximize generality and precision.

pdf bib
Open Source Machine Translation with DELPH-IN
Francis Bond | Stephan Oepen | Melanie Siegel | Ann Copestake | Dan Flickinger
Workshop on open-source machine translation

pdf
High Precision Treebanking—Blazing Useful Trees Using POS Information
Takaaki Tanaka | Francis Bond | Stephan Oepen | Sanae Fujita
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf
Holistic regression testing for high-quality MT: some methodological and technological reflections
Stephan Oepen | Helge Dyvik | Dan Flickinger | Jan Tore Lønning | Paul Meurer | Victoria Rosén
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

Over the past few years significant progress was accomplished in efficient processing with wide-coverage HPSG grammars. HPSG-based parsing systems are now available that can process medium-complexity sentences (of ten to twenty words, say) in average parse times equivalent to real (i.e. human reading) time. A large number of engineering improvements in current HPSG systems were achieved through collaboration of multiple research centers and mutual exchange of experience, encoding techniques, algorithms, and even pieces of software. This article presents an approach to grammar and system engineering, termed competence & performance profiling, that makes systematic experimentation and the precise empirical study of system properties a focal point in development. Adapting the profiling metaphor familiar from software engineering to constraint-based grammars and parsers, enables developers to maintain an accurate record of system evolution, identify grammar and system deficiencies quickly, and compare to earlier versions or between different systems. We discuss a number of exemplary problems that motivate the experimental approach, and apply the empirical methodology in a fairly detailed discussion of what was achieved during a development period of three years. Given the collaborative nature in setup, the empirical results we present involve research and achievements of a large group of people.

pdf bib
Proceedings of the COLING-2000 Workshop on Efficiency In Large-Scale Parsing Systems
John Carroll | Robert C. Moore | Stephan Oepen
Proceedings of the COLING-2000 Workshop on Efficiency In Large-Scale Parsing Systems

pdf bib
Efficient Large-Scale Parsing – a Survey
John Carroll | Stephan Oepen
Proceedings of the COLING-2000 Workshop on Efficiency In Large-Scale Parsing Systems

pdf
Cross-Platform, Cross-Grammar Comparison – Can it be Done?
Ulrich Callmeier | Stephan Oepen
Proceedings of the COLING-2000 Workshop on Efficiency In Large-Scale Parsing Systems

pdf
Ambiguity Packing in Constraint-based Parsing Practical Results
Stephan Oepen | John Carroll
1st Meeting of the North American Chapter of the Association for Computational Linguistics