Feiyu Xu

2020

Unsupervised text style transfer is full of challenges due to the lack of parallel data and difficulties in content preservation. In this paper, we propose a novel neural approach to unsupervised text style transfer which we refer to as Cycle-consistent Adversarial autoEncoders (CAE) trained from non-parallel data. CAE consists of three essential components: (1) LSTM autoencoders that encode a text in one style into its latent representation and decode an encoded representation into its original text or a transferred representation into a style-transferred text, (2) adversarial style transfer networks that use an adversarially trained generator to transform a latent representation in one style into a representation in another style, and (3) a cycle-consistent constraint that enhances the capacity of the adversarial style transfer networks in content preservation. The entire CAE with these three components can be trained end-to-end. Extensive experiments and in-depth analyses on two widely-used public datasets consistently validate the effectiveness of proposed CAE in both style transfer and content preservation against several strong baselines in terms of four automatic evaluation metrics and human evaluation.

2017

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking. However, it is challenging to organize, structure, and navigate a vast number of diverse argumentations and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input, aggregation of the formulated opinions and arguments. The platform also provides a cross-lingual access to debates using machine translation.

pdf bib abs
Annotation of Entities and Relations in Spanish Radiology Reports
Viviana Cotik | Darío Filippo | Roland Roller | Hans Uszkoreit | Feiyu Xu
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Radiology reports express the results of a radiology study and contain information about anatomical entities, findings, measures and impressions of the medical doctor. The use of information extraction techniques can help physicians to access this information in order to understand data and to infer further knowledge. Supervised machine learning methods are very popular to address information extraction, but are usually domain and language dependent. To train new classification models, annotated data is required. Moreover, annotated data is also required as an evaluation resource of information extraction algorithms. However, one major drawback of processing clinical data is the low availability of annotated datasets. For this reason we performed a manual annotation of radiology reports written in Spanish. This paper presents the corpus, the annotation schema, the annotation guidelines and further insight of the data.

A huge body of continuously growing written knowledge is available on the web in the form of social media posts, RSS feeds, and news articles. Real-time information extraction from such high velocity, high volume text streams requires scalable, distributed natural language processing pipelines. We introduce such a system for fine-grained event recognition within the big data framework Flink, and demonstrate its capabilities for extracting and geo-locating mobility- and industry-related events from heterogeneous text sources. Performance analyses conducted on several large datasets show that our system achieves high throughput and maintains low latency, which is crucial when events need to be detected and acted upon in real-time. We also present promising experimental results for the event extraction component of our system, which recognizes a novel set of event types. The demo system is available at http://dfki.de/sd4m-sta-demo/.

pdf bib abs
Generating Pattern-Based Entailment Graphs for Relation Extraction
Kathrin Eichler | Feiyu Xu | Hans Uszkoreit | Sebastian Krause
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Relation extraction is the task of recognizing and extracting relations between entities or concepts in texts. A common approach is to exploit existing knowledge to learn linguistic patterns expressing the target relation and use these patterns for extracting new relation mentions. Deriving relation patterns automatically usually results in large numbers of candidates, which need to be filtered to derive a subset of patterns that reliably extract correct relation mentions. We address the pattern selection task by exploiting the knowledge represented by entailment graphs, which capture semantic relationships holding among the learned pattern candidates. This is motivated by the fact that a pattern may not express the target relation explicitly, but still be useful for extracting instances for which the relation holds, because its meaning entails the meaning of the target relation. We evaluate the usage of both automatically generated and gold-standard entailment graphs in a relation extraction scenario and present favorable experimental results, exhibiting the benefits of structuring and selecting patterns based on entailment graphs.

2016

pdf bib
Event Linking with Sentential Features from Convolutional Neural Networks
Sebastian Krause | Feiyu Xu | Hans Uszkoreit | Dirk Weissenborn
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib abs
Relation- and Phrase-level Linking of FrameNet with Sar-graphs
Aleksandra Gabryszak | Sebastian Krause | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data. We describe our approach for combining two English resources, FrameNet and sar-graphs, and illustrate the benefits of the linked data in a relation extraction setting. While FrameNet consists of schematic representations of situations, linked to lexemes and their valency patterns, sar-graphs are knowledge resources that connect semantic relations from factual knowledge graphs to the linguistic phrases used to express instances of these relations. We analyze the conceptual similarities and differences of both resources and propose to link sar-graphs and FrameNet on the levels of relations/frames as well as phrases. The former alignment involves a manual ontology mapping step, which allows us to extend sar-graphs with new phrase patterns from FrameNet. The phrase-level linking, on the other hand, is fully automatic. We investigate the quality of the automatically constructed links and identify two main classes of errors.

pdf bib abs
TEG-REP: A corpus of Textual Entailment Graphs based on Relation Extraction Patterns
Kathrin Eichler | Feiyu Xu | Hans Uszkoreit | Leonhard Hennig | Sebastian Krause
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The task of relation extraction is to recognize and extract relations between entities or concepts in texts. Dependency parse trees have become a popular source for discovering extraction patterns, which encode the grammatical relations among the phrases that jointly express relation instances. State-of-the-art weakly supervised approaches to relation extraction typically extract thousands of unique patterns only potentially expressing the target relation. Among these patterns, some are semantically equivalent, but differ in their morphological, lexical-semantic or syntactic form. Some express a relation that entails the target relation. We propose a new approach to structuring extraction patterns by utilizing entailment graphs, hierarchical structures representing entailment relations, and present a novel resource of gold-standard entailment graphs based on a set of patterns automatically acquired using distant supervision. We describe the methodology used for creating the dataset and present statistics of the resource as well as an analysis of inference types underlying the entailment decisions.

In this work we present a fine-grained annotation schema to detect named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and the linguistic aspects of German language. In order to generate annotations within a short period, the work also presents a semi-automatic annotation which uses additional sources of knowledge such as UMLS, to pre-annotate concepts in advance. The presented schema will be used to apply novel techniques from natural language processing and machine learning to support doctors treating their patients by improved information access from unstructured German texts.

pdf bib abs
Negation Detection in Clinical Reports Written in German
Viviana Cotik | Roland Roller | Feiyu Xu | Hans Uszkoreit | Klemens Budde | Danilo Schmidt
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

An important subtask in clinical text mining tries to identify whether a clinical finding is expressed as present, absent or unsure in a text. This work presents a system for detecting mentions of clinical findings that are negated or just speculated. The system has been applied to two different types of German clinical texts: clinical notes and discharge summaries. Our approach is built on top of NegEx, a well known algorithm for identifying non-factive mentions of medical findings. In this work, we adjust a previous adaptation of NegEx to German and evaluate the system on our data to detect negation and speculation. The results are compared to a baseline algorithm and are analyzed for both types of clinical documents. Our system achieves an F1-Score above 0.9 on both types of reports.

2015

pdf bib
Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities
Dirk Weissenborn | Leonhard Hennig | Feiyu Xu | Hans Uszkoreit
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
A System Demonstration of a Framework for Computer Assisted Pronunciation Training
Renlong Ai | Feiyu Xu
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
A Web-based Collaborative Evaluation Tool for Automatically Learned Relation Extraction Patterns
Leonhard Hennig | Hong Li | Sebastian Krause | Feiyu Xu | Hans Uszkoreit
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
DFKI: Multi-objective Optimization for the Joint Disambiguation of Entities and Nouns & Deep Verb Sense Disambiguation
Dirk Weissenborn | Feiyu Xu | Hans Uszkoreit
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with Language
Sebastian Krause | Leonhard Hennig | Aleksandra Gabryszak | Feiyu Xu | Hans Uszkoreit
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

pdf bib
Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations
Renlong Ai | Sebastian Krause | Walter Kasper | Feiyu Xu | Hans Uszkoreit
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2014

pdf bib abs
Information Extraction from German Patient Records via Hybrid Parsing and Relation Extraction Strategies
Hans-Ulrich Krieger | Christian Spurk | Hans Uszkoreit | Feiyu Xu | Yi Zhang | Frank Müller | Thomas Tolxdorff
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we report on first attempts and findings to analyzing German patient records, using a hybrid parsing architecture and a combination of two relation extraction strategies. On a practical level, we are interested in the extraction of concepts and relations among those concepts, a necessary cornerstone for building medical information systems. The parsing pipeline consists of a morphological analyzer, a robust chunk parser adapted to Latin phrases used in medical diagnosis, a repair rule stage, and a probabilistic context-free parser that respects the output from the chunker. The relation extraction stage is a combination of two systems: SProUT, a shallow processor which uses hand-written rules to discover relation instances from local text units and DARE which extracts relation instances from complete sentences, using rules that are learned in a bootstrapping process, starting with semantic seeds. Two small experiments have been carried out for the parsing pipeline and the relation extraction stage.

This paper presents a new resource for the training and evaluation needed by relation extraction experiments. The corpus consists of annotations of mentions for three semantic relations: marriage, parent―child, siblings, selected from the domain of biographic facts about persons and their social relationships. The corpus contains more than one hundred news articles from Tabloid Press. In the current corpus, we only consider the relation mentions occurring in the individual sentences. We provide multi-level annotations which specify the marked facts from relation, argument, entity, down to the token level, thus allowing for detailed analysis of linguistic phenomena and their interactions. A generic markup tool Recon developed at the DFKI LT lab has been utilised for the annotation task. The corpus has been annotated by two human experts, supported by additional conflict resolution conducted by a third expert. As shown in the evaluation, the annotation is of high quality as proved by the stated inter-annotator agreements both on sentence level and on relationmention level. The current corpus is already in active use in our research for evaluation of the relation extraction performance of our automatically learned extraction patterns.

Modern language learning courses are no longer exclusively based on books or face-to-face lectures. More and more lessons make use of multimedia and personalized learning methods. Many of these are based on e-learning solutions. Learning via the Internet provides 7/24 services that require sizeable human resources. Therefore we witness a growing economic pressure to employ computer-assisted methods for improving language learning in quality, efficiency and scalability. In this paper, we will address three applications of language technologies for language learning: 1) Methods and strategies for pronunciation training in second language learning, e.g., multimodal feedback via visualization of sound features, speech verification and prosody transplantation; 2) Dialogue-based language learning games; 3) Application of parsing and generation technologies to the automatic generation of paraphrases for the semi-automatic production of learning material.

pdf bib abs
Language Resources and Annotation Tools for Cross-Sentence Relation Extraction
Sebastian Krause | Hong Li | Feiyu Xu | Hans Uszkoreit | Robert Hummel | Luise Spielhagen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present a novel combination of two types of language resources dedicated to the detection of relevant relations (RE) such as events or facts across sentence boundaries. One of the two resources is the sar-graph, which aggregates for each target relation ten thousands of linguistic patterns of semantically associated relations that signal instances of the target relation (Uszkoreit and Xu, 2013). These have been learned from the Web by intra-sentence pattern extraction (Krause et al., 2012) and after semantic filtering and enriching have been automatically combined into a single graph. The other resource is cockrACE, a specially annotated corpus for the training and evaluation of cross-sentence RE. By employing our powerful annotation tool Recon, annotators mark selected entities and relations (including events), coreference relations among these entities and events, and also terms that are semantically related to the relevant relations and events. This paper describes how the two resources are created and how they complement each other.

2013

pdf bib
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop
Anik Dey | Sebastian Krause | Ivelina Nikolova | Eva Vecchi | Steven Bethard | Preslav I. Nakov | Feiyu Xu
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

2012

pdf bib abs
Annotating Opinions in German Political News
Hong Li | Xiwen Cheng | Kristina Adson | Tal Kirshboim | Feiyu Xu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents an approach to construction of an annotated corpus for German political news for the opinion mining task. The annotated corpus has been applied to learn relation extraction rules for extraction of opinion holders, opinion content and classification of polarities. An adapted annotated schema has been developed on top of the state-of-the-art research. Furthermore, a general tool for annotating relations has been utilized for the annotation task. An evaluation of the inter-annotator agreement has been conducted. The rule learning is realized with the help of a minimally supervised machine learning framework DARE.

pdf bib abs
Evaluation of the KomParse Conversational Non-Player Characters in a Commercial Virtual World
Tina Kluewer | Feiyu Xu | Peter Adolphs | Hans Uszkoreit
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper describes the evaluation of the KomParse system. KomParse is a dialogue system embedded in a 3-D massive multiplayer online game, allowing conversations between non player characters (NPCs) and game users. In a field test with game users, the system was evaluated with respect to acceptability and usability of the overall system as well as task completion, dialogue control and efficiency of three conversational tasks. Furthermore, subjective feedback has been collected for evaluating the single communication components of the system such as natural language understanding. The results are very satisfying and promising. In general, both the usability and acceptability tests show that the tested NPC is useful and well-accepted by the users. Even if the NPC does not always understand the users well and expresses things unexpected, he could still provide appropriate responses to help users to solve their problems or entertain them.

2011

pdf bib
Minimally Supervised Rule Learning for the Extraction of Biographic Information from Various Social Domains
Hong Li | Feiyu Xu | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
META-DARE: Monitoring the Minimally Supervised ML of Relation Extraction Rules
Hong Li | Feiyu Xu | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
TechWatchTool: Innovation and Trend Monitoring
Hong Li | Feiyu Xu | Hans Uszkoreit
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Minimally Supervised Domain-Adaptive Parse Reranking for Relation Extraction
Feiyu Xu | Hong Li | Yi Zhang | Hans Uszkoreit | Sebastian Krause
Proceedings of the 12th International Conference on Parsing Technologies

2010

pdf bib
Using Syntactic and Semantic based Relations for Dialogue Act Recognition
Tina Klüwer | Hans Uszkoreit | Feiyu Xu
Coling 2010: Posters

pdf bib
Boosting Relation Extraction with Limited Closed-World Knowledge
Feiyu Xu | Hans Uszkoreit | Sebastian Krause | Hong Li
Coling 2010: Posters

pdf bib abs
Question Answering Biographic Information and Social Network Powered by the Semantic Web
Peter Adolphs | Xiwen Cheng | Tina Klüwer | Hans Uszkoreit | Feiyu Xu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

After several years of development, the vision of the Semantic Web is gradually becoming reality. Large data repositories have been created and offer semantic information in a machine-processable form for various domains. Semantic Web data can be published on the Web, gathered automatically, and reasoned about. All these developments open interesting perspectives for building a new class of domain-specific, broad-coverage information systems that overcome a long-standing bottleneck of AI systems, the notoriously incomplete knowledge base. We present a system that shows how the wealth of information in the Semantic Web can be interfaced with humans once again, using natural language for querying and answering rather than technical formalisms. Whereas current Question Answering systems typically select snippets from Web documents retrieved by a search engine, we utilize Semantic Web data, which allows us to provide natural-language answers that are tailored to the current dialog context. Furthermore, we show how to use natural language processing technologies to acquire new data and enrich existing data in a Semantic Web framework. Our system has acquired a rich biographic data resource by combining existing Semantic Web resources, which are discovered from semi-structured textual data in Web pages, with information extracted from free natural language texts.

pdf bib abs
Determining the Origin and Structure of Person Names
Yu Fu | Feiyu Xu | Hans Uszkoreit
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a novel system HENNA (Hybrid Person Name Analyzer) for identifying language origin and analyzing linguistic structures of person names. We conduct ME-based classification methods for the language origin identification and achieve very promising performance. We will show that word-internal character sequences provide surprisingly strong evidence for predicting the language origin of person names. Our approach is context-, language- and domain-independent and can thus be easily adapted to person names in or from other languages. Furthermore, we provide a novel strategy to handle origin ambiguities or multiple origins in a name. HENNA also provides a person name parser for the analysis of linguistic and knowledge structures of person names. All the knowledge about a person name in HENNA is modelled in a person-name ontology, including relationships between language origins, linguistic features and grammars of person names of a specific language and interpretation of name elements. The approaches presented here are useful extensions of the named entity recognition task.

pdf bib
Talking NPCs in a Virtual Game World
Tina Klüwer | Peter Adolphs | Feiyu Xu | Hans Uszkoreit | Xiwen Cheng
Proceedings of the ACL 2010 System Demonstrations

2009

pdf bib
Gossip Galore – A Self-Learning Agent for Exchanging Pop Trivia
Xiwen Cheng | Peter Adolphs | Feiyu Xu | Hans Uszkoreit | Hong Li
Proceedings of the Demonstrations Session at EACL 2009

2008

pdf bib abs
Adaptation of Relation Extraction Rules to New Domains
Feiyu Xu | Hans Uszkoreit | Hong Li | Niko Felger
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents various strategies for improving the extraction performance of less prominent relations with the help of the rules learned for similar relations, for which large volumes of data are available that exhibit suitable data properties. The rules are learned via a minimally supervised machine learning system for relation extraction called DARE. Starting from semantic seeds, DARE extracts linguistic grammar rules associated with semantic roles from parsed news texts. The performance analysis with respect to different experiment domains shows that the data property plays an important role for DARE. Especially the redundancy of the data and the connectivity of instances and pattern rules have a strong influence on recall. However, most real-world data sets do not possess the desirable small-world property. Therefore, we propose three scenarios to overcome the data property problem of some domains by exploiting a similar domain with better data properties. The first two strategies stay with the same corpus but try to extract new similar relations with learned rules. The third strategy adapts the learned rules to a new corpus. All three strategies show that frequently mentioned relations can help in the detection of less frequent relations.

pdf bib abs
Fine-grained Opinion Topic and Polarity Identification
Xiwen Cheng | Feiyu Xu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents OMINE, an opinion mining system which aims to identify concepts such as products and their attributes, and analyze their corresponding polarities. Our work pioneers at linking extracted topic terms with domain-specific concepts. Compared with previous work, taking advantage of ontological techniques, OMINE achieves 10% higher recall with the same level precision on the topic extraction task. In addition, making use of opinion patterns for sentiment analysis, OMINE improves the performance of the backup system (NGram) around 6% for positive reviews and 8% for negative ones.

2007

pdf bib
A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity
Feiyu Xu | Hans Uszkoreit | Hong Li
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib abs
The pragmatic combination of different crosslingual resources
Hans Uszkoreit | Feiyu Xu | Jörg Steffen | Ilhan Aslan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We will describe new cross-lingual strategies for the development multilingual information services on mobile devices. The novelty of our approach is the intelligent modeling of cross-lingual application domains and the combination of textual translation with speech generation. The final system helps users to speak foreign languages and communicate with the local people in relevant situations, such as restaurant, taxi and emergencies. The advantage of our information services is that they are robust enough for the use in real-world situations. They are developed for the Beijing Olympic Games 2008, where most foreigners will have to rely on translation assistance. Their deployment is foreseen as part of the planned ubiquitous mobile information system of the Olympic Games.