Hans Uszkoreit

2022

pdf bib
Obituary: Martin Kay
Ronald M. Kaplan | Hans Uszkoreit
Computational Linguistics, Volume 48, Issue 1 - March 2022

This paper presents a fine-grained test suite for the language pair German–English. The test suite is based on a number of linguistically motivated categories and phenomena, and the semi-automatic evaluation is carried out with regular expressions. We describe the creation and implementation of the test suite in detail, providing a full list of all categories and phenomena. Furthermore, we present various exemplary applications of our test suite that have been implemented in the past years, such as contributions to the Conference on Machine Translation, the usage of the test suite and MT outputs for quality estimation, and the expansion of the test suite to the language pair Portuguese–English. We describe how we tracked the development of the performance of various MT systems over the years with the help of the test suite, and which categories and phenomena are prone to resulting in MT errors. For the first time, we also make a large part of our test suite publicly available to the research community.

2019

pdf bib abs
Linguistic Evaluation of German-English Machine Translation Using a Test Suite
Eleftherios Avramidis | Vivien Macketanz | Ursula Strohriegel | Hans Uszkoreit
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We present the results of applying a grammatical test suite for German-to-English MT to the systems submitted at WMT19, with a detailed analysis of 107 phenomena organized in 14 categories. The systems still mistranslate one out of four test items on average. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. Compared to last year, there has been an improvement in function words, non-verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.

2018

pdf bib
TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality
Vivien Macketanz | Renlong Ai | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Fine-grained evaluation of Quality Estimation for Machine translation based on a linguistically motivated Test Suite
Eleftherios Avramidis | Vivien Macketanz | Arle Lommel | Hans Uszkoreit
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

pdf bib abs
Fine-grained evaluation of German-English Machine Translation based on a Test Suite
Vivien Macketanz | Eleftherios Avramidis | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.
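The semi-automatic, regular-expression-based evaluation described in this abstract can be illustrated with a minimal Python sketch. The test item, its patterns, and the names `TEST_ITEM` and `judge` are hypothetical and not taken from the actual test suite; the sketch only assumes that each item carries patterns for acceptable and unacceptable renderings of the targeted phenomenon, and that outputs matching neither are passed to a human annotator.

```python
import re

# Hypothetical test item; the real suite's items and patterns are not shown here.
TEST_ITEM = {
    "phenomenon": "modal verb (past subjunctive)",
    "source": "Er hätte gehen sollen.",
    # Patterns target only the part of the sentence relevant to the phenomenon.
    "positive": [r"\bshould have (gone|left)\b"],
    "negative": [r"\bshould go\b"],
}


def judge(mt_output, item):
    """Return 'pass', 'fail', or 'manual' for a single MT output."""
    text = mt_output.lower()
    if any(re.search(p, text) for p in item["positive"]):
        return "pass"
    if any(re.search(p, text) for p in item["negative"]):
        return "fail"
    # Neither pattern set matched: a human has to decide (the "semi" part).
    return "manual"


if __name__ == "__main__":
    for candidate in ["He should have gone.", "He should go.", "He ought to have departed."]:
        print(candidate, "->", judge(candidate, TEST_ITEM))
```

Routing unmatched outputs to manual inspection keeps the automatic verdicts conservative; in such a setup the pattern lists would presumably grow as annotators resolve the manual cases.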

2017

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-making. However, it is challenging to organize, structure, and navigate a vast number of diverse arguments and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next-generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input and for the aggregation of the formulated opinions and arguments. The platform also provides cross-lingual access to debates using machine translation.

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf bib abs
Annotation of Entities and Relations in Spanish Radiology Reports
Viviana Cotik | Darío Filippo | Roland Roller | Hans Uszkoreit | Feiyu Xu
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Radiology reports express the results of a radiology study and contain information about anatomical entities, findings, measures and impressions of the medical doctor. The use of information extraction techniques can help physicians to access this information in order to understand data and to infer further knowledge. Supervised machine learning methods are a popular choice for information extraction, but are usually domain and language dependent. To train new classification models, annotated data is required. Moreover, annotated data is also required as an evaluation resource for information extraction algorithms. However, one major drawback of processing clinical data is the low availability of annotated datasets. For this reason we performed a manual annotation of radiology reports written in Spanish. This paper presents the corpus, the annotation schema, the annotation guidelines and further insights into the data.

pdf bib abs
Word Embeddings as Features for Supervised Coreference Resolution
Iliana Simova | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

A common reason for errors in coreference resolution is the lack of semantic information to help determine the compatibility between mentions referring to the same entity. Distributed representations, which have been shown to be successful in encoding relatedness between words, could potentially be a good source of such knowledge. Moreover, being obtained in an unsupervised manner, they could help address data sparsity issues in labeled training data at a small cost. In this work we investigate whether and to what extent features derived from word embeddings can be successfully used for supervised coreference resolution. We experiment with several word embedding models, and several different types of embedding-based features, including embedding cluster and cosine similarity-based features. Our evaluations show improvements in the performance of a supervised state-of-the-art coreference system.
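As an illustration of a cosine similarity-based feature of the kind mentioned in this abstract, the following Python sketch computes the similarity between the embeddings of two mention head words. The toy vectors and the names `TOY_EMBEDDINGS`, `cosine_similarity` and `mention_pair_feature` are assumptions made for the example; the paper's actual feature definitions and embedding models are not reproduced here.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mention_pair_feature(head_a, head_b, embeddings):
    """Similarity of two mention head words, or None if either is out of vocabulary."""
    if head_a not in embeddings or head_b not in embeddings:
        return None
    return cosine_similarity(embeddings[head_a], embeddings[head_b])


if __name__ == "__main__":
    print(mention_pair_feature("car", "vehicle", TOY_EMBEDDINGS))
    print(mention_pair_feature("car", "banana", TOY_EMBEDDINGS))
```

Returning `None` for out-of-vocabulary heads lets a downstream classifier treat missing embeddings as a separate case instead of conflating them with a similarity of zero.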

A huge body of continuously growing written knowledge is available on the web in the form of social media posts, RSS feeds, and news articles. Real-time information extraction from such high velocity, high volume text streams requires scalable, distributed natural language processing pipelines. We introduce such a system for fine-grained event recognition within the big data framework Flink, and demonstrate its capabilities for extracting and geo-locating mobility- and industry-related events from heterogeneous text sources. Performance analyses conducted on several large datasets show that our system achieves high throughput and maintains low latency, which is crucial when events need to be detected and acted upon in real-time. We also present promising experimental results for the event extraction component of our system, which recognizes a novel set of event types. The demo system is available at http://dfki.de/sd4m-sta-demo/.

pdf bib abs
Generating Pattern-Based Entailment Graphs for Relation Extraction
Kathrin Eichler | Feiyu Xu | Hans Uszkoreit | Sebastian Krause
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Relation extraction is the task of recognizing and extracting relations between entities or concepts in texts. A common approach is to exploit existing knowledge to learn linguistic patterns expressing the target relation and use these patterns for extracting new relation mentions. Deriving relation patterns automatically usually results in large numbers of candidates, which need to be filtered to derive a subset of patterns that reliably extract correct relation mentions. We address the pattern selection task by exploiting the knowledge represented by entailment graphs, which capture semantic relationships holding among the learned pattern candidates. This is motivated by the fact that a pattern may not express the target relation explicitly, but still be useful for extracting instances for which the relation holds, because its meaning entails the meaning of the target relation. We evaluate the usage of both automatically generated and gold-standard entailment graphs in a relation extraction scenario and present favorable experimental results, exhibiting the benefits of structuring and selecting patterns based on entailment graphs.

2016

pdf bib
Event Linking with Sentential Features from Convolutional Neural Networks
Sebastian Krause | Feiyu Xu | Hans Uszkoreit | Dirk Weissenborn
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib abs
Relation- and Phrase-level Linking of FrameNet with Sar-graphs
Aleksandra Gabryszak | Sebastian Krause | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data. We describe our approach for combining two English resources, FrameNet and sar-graphs, and illustrate the benefits of the linked data in a relation extraction setting. While FrameNet consists of schematic representations of situations, linked to lexemes and their valency patterns, sar-graphs are knowledge resources that connect semantic relations from factual knowledge graphs to the linguistic phrases used to express instances of these relations. We analyze the conceptual similarities and differences of both resources and propose to link sar-graphs and FrameNet on the levels of relations/frames as well as phrases. The former alignment involves a manual ontology mapping step, which allows us to extend sar-graphs with new phrase patterns from FrameNet. The phrase-level linking, on the other hand, is fully automatic. We investigate the quality of the automatically constructed links and identify two main classes of errors.

pdf bib abs
TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns
Kathrin Eichler | Feiyu Xu | Hans Uszkoreit | Leonhard Hennig | Sebastian Krause
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The task of relation extraction is to recognize and extract relations between entities or concepts in texts. Dependency parse trees have become a popular source for discovering extraction patterns, which encode the grammatical relations among the phrases that jointly express relation instances. State-of-the-art weakly supervised approaches to relation extraction typically extract thousands of unique patterns only potentially expressing the target relation. Among these patterns, some are semantically equivalent, but differ in their morphological, lexical-semantic or syntactic form. Some express a relation that entails the target relation. We propose a new approach to structuring extraction patterns by utilizing entailment graphs, hierarchical structures representing entailment relations, and present a novel resource of gold-standard entailment graphs based on a set of patterns automatically acquired using distant supervision. We describe the methodology used for creating the dataset and present statistics of the resource as well as an analysis of inference types underlying the entailment decisions.

In this work we present a fine-grained annotation schema to detect named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and the linguistic aspects of the German language. In order to generate annotations within a short period, the work also presents a semi-automatic annotation approach which uses additional sources of knowledge, such as UMLS, to pre-annotate concepts in advance. The presented schema will be used to apply novel techniques from natural language processing and machine learning to support doctors treating their patients by improved information access from unstructured German texts.

pdf bib abs
Negation Detection in Clinical Reports Written in German
Viviana Cotik | Roland Roller | Feiyu Xu | Hans Uszkoreit | Klemens Budde | Danilo Schmidt
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

An important subtask in clinical text mining tries to identify whether a clinical finding is expressed as present, absent or unsure in a text. This work presents a system for detecting mentions of clinical findings that are negated or just speculated. The system has been applied to two different types of German clinical texts: clinical notes and discharge summaries. Our approach is built on top of NegEx, a well-known algorithm for identifying non-factive mentions of medical findings. In this work, we adjust a previous adaptation of NegEx to German and evaluate the system on our data to detect negation and speculation. The results are compared to a baseline algorithm and are analyzed for both types of clinical documents. Our system achieves an F1-Score above 0.9 on both types of reports.
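The trigger-based idea behind NegEx-style negation and speculation detection can be sketched in a few lines of Python. This is a heavily simplified illustration, not the system described in the abstract: the German trigger lists, the window size `WINDOW` and the function `classify_finding` are assumptions made for the example, and only triggers that precede the finding are handled (post-finding triggers and scope-terminating terms are omitted).

```python
import re

# Illustrative triggers only; NOT the trigger lists used in the paper.
NEGATION_TRIGGERS = ["kein", "keine", "keinen", "ohne", "nicht"]
SPECULATION_TRIGGERS = ["verdacht auf", "möglicherweise", "fraglich"]
WINDOW = 5  # assumed scope: number of tokens before the finding that are inspected


def classify_finding(sentence, finding):
    """Label a single-token finding as 'negated', 'speculated', 'affirmed' or 'not mentioned'."""
    tokens = re.findall(r"\w+", sentence.lower())
    target = finding.lower()
    if target not in tokens:
        return "not mentioned"
    idx = tokens.index(target)
    # Text of the tokens directly preceding the finding.
    scope = " ".join(tokens[max(0, idx - WINDOW):idx])

    def has_trigger(triggers):
        return any(re.search(r"\b" + re.escape(t) + r"\b", scope) for t in triggers)

    if has_trigger(NEGATION_TRIGGERS):
        return "negated"
    if has_trigger(SPECULATION_TRIGGERS):
        return "speculated"
    return "affirmed"


if __name__ == "__main__":
    print(classify_finding("Kein Fieber, kein Husten.", "Fieber"))
    print(classify_finding("Verdacht auf Pneumonie.", "Pneumonie"))
    print(classify_finding("Patient hat Fieber.", "Fieber"))
```

A fixed token window is the simplest possible scope model; it will mislabel findings when a trigger belongs to a different clause, which is one reason fuller implementations add termination terms.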

pdf bib
Deeper Machine Translation and Evaluation for German
Eleftherios Avramidis | Vivien Macketanz | Aljoscha Burchardt | Jindrich Helcl | Hans Uszkoreit
Proceedings of the 2nd Deep Machine Translation Workshop

2015

pdf bib
Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities
Dirk Weissenborn | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
A Web-based Collaborative Evaluation Tool for Automatically Learned Relation Extraction Patterns
Leonhard Hennig | Hong Li | Sebastian Krause | Feiyu Xu | Hans Uszkoreit
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
DFKI: Multi-objective Optimization for the Joint Disambiguation of Entities and Nouns & Deep Verb Sense Disambiguation
Dirk Weissenborn | Feiyu Xu | Hans Uszkoreit
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with Language
Sebastian Krause | Leonhard Hennig | Aleksandra Gabryszak | Feiyu Xu | Hans Uszkoreit
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

pdf bib
Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations
Renlong Ai | Sebastian Krause | Walter Kasper | Feiyu Xu | Hans Uszkoreit
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

pdf bib
Towards Deeper MT - A Hybrid System for German
Eleftherios Avramidis | Aljoscha Burchardt | Maja Popović | Hans Uszkoreit
Proceedings of the 1st Deep Machine Translation Workshop

2014

pdf bib
Using a new analytic measure for the annotation and analysis of MT errors on real data
Arle Lommel | Aljoscha Burchardt | Maja Popović | Kim Harris | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

pdf bib
Relations between different types of post-editing operations, cognitive effort and temporal effort
Maja Popović | Arle Lommel | Aljoscha Burchardt | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

pdf bib abs
Information Extraction from German Patient Records via Hybrid Parsing and Relation Extraction Strategies
Hans-Ulrich Krieger | Christian Spurk | Hans Uszkoreit | Feiyu Xu | Yi Zhang | Frank Müller | Thomas Tolxdorff
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we report on first attempts at, and findings from, analyzing German patient records, using a hybrid parsing architecture and a combination of two relation extraction strategies. On a practical level, we are interested in the extraction of concepts and relations among those concepts, a necessary cornerstone for building medical information systems. The parsing pipeline consists of a morphological analyzer, a robust chunk parser adapted to Latin phrases used in medical diagnosis, a repair rule stage, and a probabilistic context-free parser that respects the output from the chunker. The relation extraction stage is a combination of two systems: SProUT, a shallow processor which uses hand-written rules to discover relation instances from local text units, and DARE, which extracts relation instances from complete sentences, using rules that are learned in a bootstrapping process, starting with semantic seeds. Two small experiments have been carried out for the parsing pipeline and the relation extraction stage.

This paper presents a new resource for the training and evaluation needed by relation extraction experiments. The corpus consists of annotations of mentions for three semantic relations: marriage, parent-child, and siblings, selected from the domain of biographic facts about persons and their social relationships. The corpus contains more than one hundred news articles from the tabloid press. In the current corpus, we only consider the relation mentions occurring in the individual sentences. We provide multi-level annotations which specify the marked facts from relation, argument, entity, down to the token level, thus allowing for detailed analysis of linguistic phenomena and their interactions. A generic markup tool, Recon, developed at the DFKI LT lab has been utilised for the annotation task. The corpus has been annotated by two human experts, supported by additional conflict resolution conducted by a third expert. As shown in the evaluation, the annotation is of high quality, as demonstrated by the stated inter-annotator agreements both on sentence level and on relation-mention level. The current corpus is already in active use in our research for evaluation of the relation extraction performance of our automatically learned extraction patterns.

Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing.

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative's work throughout Europe in order to boost progress and innovation in our field.

Modern language learning courses are no longer exclusively based on books or face-to-face lectures. More and more lessons make use of multimedia and personalized learning methods. Many of these are based on e-learning solutions. Learning via the Internet provides 24/7 services that require sizeable human resources. Therefore we witness a growing economic pressure to employ computer-assisted methods for improving language learning in quality, efficiency and scalability. In this paper, we will address three applications of language technologies for language learning: 1) Methods and strategies for pronunciation training in second language learning, e.g., multimodal feedback via visualization of sound features, speech verification and prosody transplantation; 2) Dialogue-based language learning games; 3) Application of parsing and generation technologies to the automatic generation of paraphrases for the semi-automatic production of learning material.

pdf bib abs
Language Resources and Annotation Tools for Cross-Sentence Relation Extraction
Sebastian Krause | Hong Li | Feiyu Xu | Hans Uszkoreit | Robert Hummel | Luise Spielhagen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present a novel combination of two types of language resources dedicated to the detection of relevant relations (RE) such as events or facts across sentence boundaries. One of the two resources is the sar-graph, which aggregates for each target relation tens of thousands of linguistic patterns of semantically associated relations that signal instances of the target relation (Uszkoreit and Xu, 2013). These have been learned from the Web by intra-sentence pattern extraction (Krause et al., 2012) and, after semantic filtering and enriching, have been automatically combined into a single graph. The other resource is cockrACE, a specially annotated corpus for the training and evaluation of cross-sentence RE. By employing our powerful annotation tool Recon, annotators mark selected entities and relations (including events), coreference relations among these entities and events, and also terms that are semantically related to the relevant relations and events. This paper describes how the two resources are created and how they complement each other.

pdf bib
Analytical Approaches to Combining MT Technologies
Hans Uszkoreit
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

2013

pdf bib
What can we learn about the selection mechanism for post-editing?
Maja Popović | Eleftherios Avramidis | Aljoscha Burchardt | David Vilar | Hans Uszkoreit
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

pdf bib
Multidimensional quality metrics: a flexible system for assessing translation quality
Arle Richard Lommel | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of Translating and the Computer 35

2012

pdf bib
Quality Translation for a Multilingual Continent - Priorities and Chances for European MT Research
Hans Uszkoreit
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 4: Invited Conferences

pdf bib abs
Evaluation of the KomParse Conversational Non-Player Characters in a Commercial Virtual World
Tina Kluewer | Feiyu Xu | Peter Adolphs | Hans Uszkoreit
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper describes the evaluation of the KomParse system. KomParse is a dialogue system embedded in a 3-D massive multiplayer online game, allowing conversations between non-player characters (NPCs) and game users. In a field test with game users, the system was evaluated with respect to acceptability and usability of the overall system as well as task completion, dialogue control and efficiency of three conversational tasks. Furthermore, subjective feedback has been collected for evaluating the single communication components of the system such as natural language understanding. The results are very satisfying and promising. In general, both the usability and acceptability tests show that the tested NPC is useful and well-accepted by the users. Even when the NPC does not always understand the users well and says unexpected things, it can still provide appropriate responses that help users solve their problems or entertain them.

pdf bib
Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
Weiwei Sun | Hans Uszkoreit
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

bib
Strategic MT Research in Europe: Themes, Approaches, Results and Plans
Hans Uszkoreit
Proceedings of Machine Translation Summit XIII: Plenaries

pdf bib
Minimally Supervised Rule Learning for the Extraction of Biographic Information from Various Social Domains
Hong Li | Feiyu Xu | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
META-DARE: Monitoring the Minimally Supervised ML of Relation Extraction Rules
Hong Li | Feiyu Xu | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
TechWatchTool: Innovation and Trend Monitoring
Hong Li | Feiyu Xu | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
DFKI Hybrid Machine Translation System for WMT 2011 - On the Integration of SMT and RBMT
Jia Xu | Hans Uszkoreit | Casey Kennington | David Vilar | Xiaojun Zhang
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Minimally Supervised Domain-Adaptive Parse Reranking for Relation Extraction
Feiyu Xu | Hong Li | Yi Zhang | Hans Uszkoreit | Sebastian Krause
Proceedings of the 12th International Conference on Parsing Technologies

2010

pdf bib
Using Syntactic and Semantic based Relations for Dialogue Act Recognition
Tina Klüwer | Hans Uszkoreit | Feiyu Xu
Coling 2010: Posters

pdf bib
Boosting Relation Extraction with Limited Closed-World Knowledge
Feiyu Xu | Hans Uszkoreit | Sebastian Krause | Hong Li
Coling 2010: Posters

pdf bib abs
Question Answering Biographic Information and Social Network Powered by the Semantic Web
Peter Adolphs | Xiwen Cheng | Tina Klüwer | Hans Uszkoreit | Feiyu Xu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

After several years of development, the vision of the Semantic Web is gradually becoming reality. Large data repositories have been created and offer semantic information in a machine-processable form for various domains. Semantic Web data can be published on the Web, gathered automatically, and reasoned about. All these developments open interesting perspectives for building a new class of domain-specific, broad-coverage information systems that overcome a long-standing bottleneck of AI systems, the notoriously incomplete knowledge base. We present a system that shows how the wealth of information in the Semantic Web can be interfaced with humans once again, using natural language for querying and answering rather than technical formalisms. Whereas current Question Answering systems typically select snippets from Web documents retrieved by a search engine, we utilize Semantic Web data, which allows us to provide natural-language answers that are tailored to the current dialog context. Furthermore, we show how to use natural language processing technologies to acquire new data and enrich existing data in a Semantic Web framework. Our system has acquired a rich biographic data resource by combining existing Semantic Web resources, which are discovered from semi-structured textual data in Web pages, with information extracted from free natural language texts.

pdf bib abs
LT World: Ontology and Reference Information Portal
Brigitte Jörg | Hans Uszkoreit | Alastair Burt
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

LT World (www.lt-world.org) is an ontology-driven web portal aimed at serving the global language technology community. Ontology-driven means that the system is driven by an ontological schema to manage the research information and knowledge life-cycles: identify relevant concepts of information, structure and formalize them, assign relationships, functions and views, add states and rules, modify them. For modelling such a complex structure, we employ (i) concepts from the research domain, such as person, organisation, project, tool, data, patent, news, event; (ii) concepts from the LT domain, such as technology and resource; (iii) concepts from closely related domains, such as language, linguistics, and mathematics. Whereas the research entities represent the general context, that is, a research environment as such, the LT entities define the information and knowledge space of the field, enhanced by entities from closely related areas. By managing information holistically, that is, within a research context, its inherent semantics becomes much more transparent. This paper introduces LT World as a reference information portal through ontological eyes: its content, its system, its method for maintaining knowledge-rich items, its ontology as an asset.

pdf bib abs
Determining the Origin and Structure of Person Names
Yu Fu | Feiyu Xu | Hans Uszkoreit
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a novel system, HENNA (Hybrid Person Name Analyzer), for identifying language origin and analyzing linguistic structures of person names. We apply ME-based classification methods for the language origin identification and achieve very promising performance. We will show that word-internal character sequences provide surprisingly strong evidence for predicting the language origin of person names. Our approach is context-, language- and domain-independent and can thus be easily adapted to person names in or from other languages. Furthermore, we provide a novel strategy to handle origin ambiguities or multiple origins in a name. HENNA also provides a person name parser for the analysis of linguistic and knowledge structures of person names. All the knowledge about a person name in HENNA is modelled in a person-name ontology, including relationships between language origins, linguistic features and grammars of person names of a specific language and interpretation of name elements. The approaches presented here are useful extensions of the named entity recognition task.

pdf bib
Talking NPCs in a Virtual Game World
Tina Klüwer | Peter Adolphs | Feiyu Xu | Hans Uszkoreit | Xiwen Cheng
Proceedings of the ACL 2010 System Demonstrations

2009

pdf bib
Gossip Galore – A Self-Learning Agent for Exchanging Pop Trivia
Xiwen Cheng | Peter Adolphs | Feiyu Xu | Hans Uszkoreit | Hong Li
Proceedings of the Demonstrations Session at EACL 2009

pdf bib
Linguistics in Computational Linguistics: Observations and Predictions
Hans Uszkoreit
Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?

2008

pdf bib
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
Donia Scott | Hans Uszkoreit
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Coling 2008: Companion volume: Posters
Donia Scott | Hans Uszkoreit
Coling 2008: Companion volume: Posters

pdf bib abs
Extracting and Querying Relations in Scientific Papers on Language Technology
Ulrich Schäfer | Hans Uszkoreit | Christian Federmann | Torsten Marek | Yajing Zhang
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe methods for extracting interesting factual relations from scientific texts in computational linguistics and language technology taken from the ACL Anthology. We use a hybrid NLP architecture with shallow preprocessing for increased robustness and domain-specific, ontology-based named entity recognition, followed by a deep HPSG parser running the English Resource Grammar (ERG). The extracted relations in the MRS (minimal recursion semantics) format are simplified and generalized using WordNet. The resulting quriples are stored in a database from where they can be retrieved (again using abstraction methods) by relation-based search. The query interface is embedded in a web browser-based application we call the Scientists Workbench. It supports researchers in editing and online-searching scientific papers.

Adaptation of Relation Extraction Rules to New Domains
Feiyu Xu | Hans Uszkoreit | Hong Li | Niko Felger
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents various strategies for improving the extraction performance of less prominent relations with the help of the rules learned for similar relations, for which large volumes of data are available that exhibit suitable data properties. The rules are learned via a minimally supervised machine learning system for relation extraction called DARE. Starting from semantic seeds, DARE extracts linguistic grammar rules associated with semantic roles from parsed news texts. The performance analysis with respect to different experiment domains shows that the data property plays an important role for DARE. Especially the redundancy of the data and the connectivity of instances and pattern rules have a strong influence on recall. However, most real-world data sets do not possess the desirable small-world property. Therefore, we propose three scenarios to overcome the data property problem of some domains by exploiting a similar domain with better data properties. The first two strategies stay with the same corpus but try to extract new similar relations with learned rules. The third strategy adapts the learned rules to a new corpus. All three strategies show that frequently mentioned relations can help in the detection of less frequent relations.

Hybrid Learning of Dependency Structures from Heterogeneous Linguistic Resources
Yi Zhang | Rui Wang | Hans Uszkoreit
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

2007

A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity
Feiyu Xu | Hans Uszkoreit | Hong Li
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

The pragmatic combination of different crosslingual resources
Hans Uszkoreit | Feiyu Xu | Jörg Steffen | Ilhan Aslan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe new cross-lingual strategies for the development of multilingual information services on mobile devices. The novelty of our approach is the intelligent modeling of cross-lingual application domains and the combination of textual translation with speech generation. The final system helps users to speak foreign languages and communicate with local people in relevant situations, such as restaurants, taxis and emergencies. The advantage of our information services is that they are robust enough for use in real-world situations. They are developed for the Beijing Olympic Games 2008, where most foreigners will have to rely on translation assistance. Their deployment is foreseen as part of the planned ubiquitous mobile information system of the Olympic Games.

Chinese Named Entity and Relation Identification System
Tianfang Yao | Hans Uszkoreit
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Contextual phenomena and thematic relations in database QA dialogues: results from a Wizard-of-Oz Experiment
Núria Bertomeu | Hans Uszkoreit | Anette Frank | Hans-Ulrich Krieger | Brigitte Jörg
Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006

2005

Ontologies for Crosslingual Applications
Hans Uszkoreit
Workshop on Semantic Web technologies for machine translation

Human translation is based on linguistic and extralinguistic knowledge. Despite promising pioneering advances, knowledge-based machine translation has remained a tempting vision. The bottleneck has been the engineering of sufficiently comprehensive bodies of relevant knowledge. The Semantic Web offers opportunities for the gradual evolution of a global heterogeneous knowledge base. The immediate target has been the modelling of certain knowledge domains by practical ontologies. In the talk we will demonstrate the utilization of ontological knowledge in different crosslingual applications, ranging from crosslingual document retrieval via crosslingual question answering to complex information services involving several crosslingual functionalities, including machine translation. We will then discuss the ramifications of this development and of the evolution of the World Wide Web for future directions in both statistical and rule-based machine translation.

Language Technology from a European Perspective
Hans Uszkoreit | Valia Kordoni | Vladislav Kubon | Michael Rosner | Sabine Kirchmeier-Andersen
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

A Novel Machine Learning Approach for the Identification of Named Entity Relations
Tianfang Yao | Hans Uszkoreit
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing