Daniel Gildea

2022

Sequence-to-sequence AMR Parsing with Ancestor Information
Chen Yu | Daniel Gildea
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

AMR parsing is the task of automatically mapping a sentence to an AMR semantic graph. The difficulty comes from generating the complex graph structure. The previous state-of-the-art method translates the AMR graph into a sequence, then directly fine-tunes a pretrained sequence-to-sequence Transformer model (BART). However, purely treating the graph as a sequence does not take advantage of structural information about the graph. In this paper, we design several strategies to add the important ancestor information into the Transformer Decoder. Our experiments show that we can improve the performance for both the AMR 2.0 and AMR 3.0 datasets and achieve new state-of-the-art results.

Rewarding Semantic Similarity under Optimized Alignments for AMR-to-Text Generation
Lisa Jin | Daniel Gildea
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A common way to combat exposure bias is by applying scores from evaluation metrics as rewards in reinforcement learning (RL). Metrics leveraging contextualized embeddings appear more flexible than their n-gram matching counterparts and thus ideal as training rewards. However, metrics such as BERTScore greedily align candidate and reference tokens, which can allow system outputs to receive excess credit relative to a reference. Furthermore, past approaches featuring semantic similarity rewards suffer from repetitive outputs and overfitting. We address these issues by proposing metrics that replace the greedy alignments in BERTScore with optimized ones. We compute them on a model’s trained token embeddings to prevent domain mismatch. Our model optimizing discrete alignment metrics consistently outperforms cross-entropy and BLEU reward baselines on AMR-to-text generation. In addition, we find that this approach enjoys stable training compared to a non-RL setting.
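
The over-crediting effect of greedy alignment mentioned in this abstract can be illustrated with a small sketch. This is not the paper's metric: the similarity values are invented, the function names are hypothetical, and the brute-force search over one-to-one assignments merely stands in for whatever optimization the authors use.

```python
from itertools import permutations


def greedy_precision(sim):
    """Each candidate token (row) takes its single best reference match (column)."""
    return sum(max(row) for row in sim) / len(sim)


def one_to_one_precision(sim):
    """Best total similarity when no reference token may be matched twice.

    Brute force over assignments, so only suitable for tiny illustrative inputs.
    """
    n_cand, n_ref = len(sim), len(sim[0])
    width = max(n_cand, n_ref)
    # Pad with zero-similarity dummy columns so surplus candidate tokens stay unmatched.
    padded = [list(row) + [0.0] * (width - n_ref) for row in sim]
    best = max(
        sum(padded[i][cols[i]] for i in range(n_cand))
        for cols in permutations(range(width), n_cand)
    )
    return best / n_cand


# Hypothetical similarities: candidate "cat cat cat" vs. reference "the cat sat".
sim = [[0.1, 1.0, 0.2]] * 3
greedy = greedy_precision(sim)          # every "cat" matches the one reference "cat"
optimized = one_to_one_precision(sim)   # only one "cat" can claim that match
```

In this toy case the repetitive candidate gets full credit under greedy matching but a much lower score once each reference token can be used only once.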

2021

Outside Computation with Superior Functions
Parker Riley | Daniel Gildea
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We show that a general algorithm for efficient computation of outside values under the minimum of superior functions framework proposed by Knuth (1977) would yield a sub-exponential time algorithm for SAT, violating the Strong Exponential Time Hypothesis (SETH).

2020

Tensors over Semirings for Latent-Variable Weighted Logic Programs
Esma Balkir | Daniel Gildea | Shay B. Cohen
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

Semiring parsing is an elegant framework for describing parsers by using semiring weighted logic programs. In this paper we present a generalization of this concept: latent-variable semiring parsing. With our framework, any semiring weighted logic program can be latentified by transforming weights from scalar values of a semiring to rank-n arrays, or tensors, of semiring values, allowing the modelling of latent-variable models within the semiring parsing framework. Semiring is too strong a notion when dealing with tensors, and we have to resort to a weaker structure: a partial semiring. We prove that this generalization preserves all the desired properties of the original semiring framework while strictly increasing its expressiveness.
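
As a minimal sketch of the scalar semiring framework that this abstract generalizes, the following evaluates the same toy weighted logic program under two semirings. The `Semiring` class, the `item_values` helper, and the example program are all hypothetical illustrations; the tensor-valued, latent-variable extension described in the paper is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


SUM_PRODUCT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
MAX_PRODUCT = Semiring(max, lambda a, b: a * b, 0.0, 1.0)

# Each item maps to its deduction rules: (rule weight, antecedent items).
Rules = Dict[str, List[Tuple[float, List[str]]]]


def item_values(rules: Rules, order: List[str], sr: Semiring) -> Dict[str, float]:
    """Evaluate items bottom-up; `order` must list antecedents before consequents."""
    value: Dict[str, float] = {}
    for item in order:
        total = sr.zero
        for weight, antecedents in rules[item]:
            product = sr.times(sr.one, weight)
            for ant in antecedents:
                product = sr.times(product, value[ant])
            total = sr.plus(total, product)
        value[item] = total
    return value


# Hypothetical toy program: two axioms and a goal derivable in two ways.
rules = {
    "A": [(0.5, [])],
    "B": [(0.4, [])],
    "goal": [(1.0, ["A", "B"]), (0.3, ["A"])],
}
order = ["A", "B", "goal"]
total_weight = item_values(rules, order, SUM_PRODUCT)["goal"]  # sums both derivations
best_weight = item_values(rules, order, MAX_PRODUCT)["goal"]   # keeps the best one
```

Swapping the semiring changes what is computed (total weight versus best derivation weight) without touching the deduction rules, which is the property the framework relies on.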

Efficient Outside Computation
Daniel Gildea
Computational Linguistics, Volume 46, Issue 4 - December 2020

Weighted deduction systems provide a framework for describing parsing algorithms that can be used with a variety of operations for combining the values of partial derivations. For some operations, inside values can be computed efficiently, but outside values cannot. We view outside values as functions from inside values to the total value of all derivations, and we analyze outside computation in terms of function composition. This viewpoint helps explain why efficient outside computation is possible in many settings, despite the lack of a general outside algorithm for semiring operations.

Generalized Shortest-Paths Encoders for AMR-to-Text Generation
Lisa Jin | Daniel Gildea
Proceedings of the 28th International Conference on Computational Linguistics

For text generation from semantic graphs, past neural models encoded input structure via gated convolutions along graph edges. Although these operations provide local context, the distance messages can travel is bounded by the number of encoder propagation steps. We adopt recent efforts of applying Transformer self-attention to graphs to allow global feature propagation. Instead of feeding shortest paths to the vertex self-attention module, we train a model to learn them using generalized shortest-paths algorithms. This approach widens the receptive field of a graph encoder by exposing it to all possible graph paths. We explore how this path diversity affects performance across levels of AMR connectivity, demonstrating gains on AMRs of higher reentrancy counts and diameters. Analysis of generated sentences also supports high semantic coherence of our models for reentrant AMRs. Our best model achieves a 1.4 BLEU and 1.8 chrF++ margin over a baseline that encodes only pairwise-unique shortest paths.

2019

Semantic Neural Machine Translation Using AMR
Linfeng Song | Daniel Gildea | Yue Zhang | Zhiguo Wang | Jinsong Su
Transactions of the Association for Computational Linguistics, Volume 7

It is intuitive that semantic representations can be useful for machine translation, mainly because they can help in enforcing meaning preservation and handling data sparsity (many sentences correspond to one meaning) of machine translation models. On the other hand, little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of AMR (abstract meaning representation) on NMT. Experiments on a standard English-to-German dataset show that incorporating AMR as additional knowledge can significantly improve a strong attention-based sequence-to-sequence neural translation model.

Leveraging Dependency Forest for Neural Medical Relation Extraction
Linfeng Song | Yue Zhang | Daniel Gildea | Mo Yu | Zhiguo Wang | Jinsong Su
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Medical relation extraction discovers relations between entity mentions in text, such as research articles. For this task, dependency syntax has been recognized as a crucial source of features. Yet in the medical domain, 1-best parse trees suffer from relatively low accuracies, diminishing their usefulness. We investigate a method to alleviate this problem by utilizing dependency forests. Forests contain more than one possible decision and therefore have higher recall but more noise compared with 1-best outputs. A graph neural network is used to represent the forests, automatically distinguishing the useful syntactic information from parsing noise. Results on two benchmarks show that our method outperforms the standard tree-based methods, giving the state-of-the-art results in the literature.

Ordered Tree Decomposition for HRG Rule Extraction
Daniel Gildea | Giorgio Satta | Xiaochang Peng
Computational Linguistics, Volume 45, Issue 2 - June 2019

We present algorithms for extracting Hyperedge Replacement Grammar (HRG) rules from a graph along with a vertex order. Our algorithms are based on finding a tree decomposition of smallest width, relative to the vertex order, and then extracting one rule for each node in this structure. The assumption of a fixed order for the vertices of the input graph makes it possible to solve the problem in polynomial time, in contrast to the fact that the problem of finding optimal tree decompositions for a graph is NP-hard. We also present polynomial-time algorithms for parsing based on our HRGs, where the input is a vertex sequence and the output is a graph structure. The intended application of our algorithms is grammar extraction and parsing for semantic representation of natural language. We apply our algorithms to data annotated with Abstract Meaning Representations and report on the characteristics of the resulting grammars.

SemBleu: A Robust Metric for AMR Parsing Evaluation
Linfeng Song | Daniel Gildea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Evaluating AMR parsing accuracy involves comparing pairs of AMR graphs. The major evaluation metric, SMATCH (Cai and Knight, 2013), searches for one-to-one mappings between the nodes of two AMRs with a greedy hill-climbing algorithm, which leads to search errors. We propose SEMBLEU, a robust metric that extends BLEU (Papineni et al., 2002) to AMRs. It does not suffer from search errors and considers non-local correspondences in addition to local ones. SEMBLEU is fully content-driven and punishes situations where a system’s output does not preserve most information from the input. Preliminary experiments on both sentence and corpus levels show that SEMBLEU has slightly higher consistency with human judgments than SMATCH. Our code is available at http://github.com/freesunshine0316/sembleu.
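
The idea of extending BLEU-style n-gram matching to graphs can be sketched as follows. This is only an illustration under assumptions, not the released SEMBLEU code: it takes an n-gram to be the sequence of concept and relation labels along a directed path with n nodes, and the helper names and the toy graphs for "The boy wants to go" are hypothetical.

```python
from collections import Counter


def path_ngrams(nodes, edges, n):
    """Count label sequences along directed paths that contain n nodes.

    nodes: dict mapping node id -> concept label
    edges: list of (source id, relation label, target id)
    """
    outgoing = {}
    for src, rel, tgt in edges:
        outgoing.setdefault(src, []).append((rel, tgt))
    grams = Counter()

    def walk(node, labels, remaining):
        if remaining == 0:
            grams[tuple(labels)] += 1
            return
        for rel, tgt in outgoing.get(node, []):
            walk(tgt, labels + [rel, nodes[tgt]], remaining - 1)

    for node_id in nodes:
        walk(node_id, [nodes[node_id]], n - 1)
    return grams


def clipped_precision(candidate, reference):
    """BLEU-style clipped match rate of candidate n-grams against the reference."""
    total = sum(candidate.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, reference[gram]) for gram, count in candidate.items())
    return matched / total


nodes = {"w": "want-01", "b": "boy", "g": "go-01"}
ref_edges = [("w", "ARG0", "b"), ("w", "ARG1", "g"), ("g", "ARG0", "b")]
cand_edges = [("w", "ARG0", "b"), ("w", "ARG1", "g")]  # misses the reentrant edge

ref_bigrams = path_ngrams(nodes, ref_edges, 2)
cand_bigrams = path_ngrams(nodes, cand_edges, 2)
ref_trigrams = path_ngrams(nodes, ref_edges, 3)
```

Because n-grams are read off deterministically, no search over node mappings is needed; longer paths such as the single trigram in the reference graph are what capture non-local structure.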

2018

The ACL Anthology: Current State and Future Directions
Daniel Gildea | Min-Yen Kan | Nitin Madnani | Christoph Teichmann | Martín Villalba
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

The Association for Computational Linguistics’ Anthology is the open source archive and the main source for the scientific literature of computational linguistics and natural language processing. The ACL Anthology is currently maintained exclusively by community volunteers and has to be available and up-to-date at all times. We first discuss the current, open source approach used to achieve this, and then discuss how the planned use of Docker images will improve the Anthology’s long-term stability. This change will make it easier for researchers to utilize Anthology data for experimentation. We believe the ACL community can directly benefit from the extension-friendly architecture of the Anthology. We end by issuing an open challenge on reviewer matching that we encourage the community to rally towards.

Neural Transition-based Syntactic Linearization
Linfeng Song | Yue Zhang | Daniel Gildea
Proceedings of the 11th International Conference on Natural Language Generation

The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art performance. Recent work shows that a multilayer LSTM language model outperforms competitive statistical syntactic linearization systems without using syntax. In this paper, we study neural syntactic linearization, building a transition-based syntactic linearizer leveraging a feed forward neural network, observing significantly better results compared to LSTM language models on this task.

N-ary Relation Extraction using Graph-State LSTM
Linfeng Song | Yue Zhang | Zhiguo Wang | Daniel Gildea
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Cross-sentence n-ary relation extraction detects relations among n entities across multiple sentences. Typical methods formulate an input as a document graph, integrating various intra-sentential and inter-sentential dependencies. The current state-of-the-art method splits the input graph into two DAGs, adopting a DAG-structured LSTM for each. Although this method can model rich linguistic knowledge by leveraging graph edges, important information can be lost in the splitting procedure. We propose a graph-state LSTM model, which uses a parallel state to model each word, recurrently enriching state values via message passing. Compared with DAG LSTMs, our graph LSTM keeps the original graph structure, and speeds up computation by allowing more parallelization. On a standard benchmark, our model shows the best result in the literature.

Leveraging Context Information for Natural Question Generation
Linfeng Song | Zhiguo Wang | Wael Hamza | Yue Zhang | Daniel Gildea
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

The task of natural question generation is to generate a corresponding question given the input passage (fact) and answer. It is useful for enlarging the training set of QA systems. Previous work has adopted sequence-to-sequence models that take a passage with an additional bit to indicate answer position as input. However, they do not explicitly model the information between answer and other context within the passage. We propose a model that matches the answer with the passage before generating the question. Experiments show that our model outperforms the existing state of the art using rich features.

A Notion of Semantic Coherence for Underspecified Semantic Representation
Mehdi Manshadi | Daniel Gildea | James F. Allen
Computational Linguistics, Volume 44, Issue 1 - April 2018

The general problem of finding satisfying solutions to constraint-based underspecified representations of quantifier scope is NP-complete. Existing frameworks, including Dominance Graphs, Minimal Recursion Semantics, and Hole Semantics, have struggled to balance expressivity and tractability in order to cover real natural language sentences with efficient algorithms. We address this trade-off with a general principle of coherence, which requires that every variable introduced in the domain of discourse must contribute to the overall semantics of the sentence. We show that every underspecified representation meeting this criterion can be efficiently processed, and that our set of representations subsumes all previously identified tractable sets.

Cache Transition Systems for Graph Parsing
Daniel Gildea | Giorgio Satta | Xiaochang Peng
Computational Linguistics, Volume 44, Issue 1 - April 2018

Motivated by the task of semantic parsing, we describe a transition system that generalizes standard transition-based dependency parsing techniques to generate a graph rather than a tree. Our system includes a cache with fixed size m, and we characterize the relationship between the parameter m and the class of graphs that can be produced through the graph-theoretic concept of tree decomposition. We find empirically that small cache sizes cover a high percentage of sentences in existing semantic corpora.

Weighted DAG Automata for Semantic Graphs
David Chiang | Frank Drewes | Daniel Gildea | Adam Lopez | Giorgio Satta
Computational Linguistics, Volume 44, Issue 1 - April 2018

Graphs have a variety of uses in natural language processing, particularly as representations of linguistic meaning. A deficit in this area of research is a formal framework for creating, combining, and using models involving graphs that parallels the frameworks of finite automata for strings and finite tree automata for trees. A possible starting point for such a framework is the formalism of directed acyclic graph (DAG) automata, defined by Kamimura and Slutzki and extended by Quernheim and Knight. In this article, we study the latter in depth, demonstrating several new results, including a practical recognition algorithm that can be used for inference and learning with models defined on DAG automata. We also propose an extension to graphs with unbounded node degree and show that our results carry over to the extended formalism.

Feature-Based Decipherment for Machine Translation
Iftekhar Naim | Parker Riley | Daniel Gildea
Computational Linguistics, Volume 44, Issue 3 - September 2018

Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.

A Graph-to-Sequence Model for AMR-to-Text Generation
Linfeng Song | Yue Zhang | Zhiguo Wang | Daniel Gildea
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The problem of AMR-to-text generation is to recover a text representing the same meaning as an input AMR graph. The current state-of-the-art method uses a sequence-to-sequence model, leveraging LSTM for encoding a linearized AMR structure. Although it can model non-local semantic information, a sequence LSTM can lose information from the AMR graph structure, and thus faces challenges with large graphs, which result in long sequences. We introduce a neural graph-to-sequence model, using a novel LSTM structure for directly encoding graph-level semantics. On a standard benchmark, our model shows superior results to existing methods in the literature.

Sequence-to-sequence Models for Cache Transition Systems
Xiaochang Peng | Linfeng Song | Daniel Gildea | Giorgio Satta
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present a sequence-to-sequence based approach for mapping natural language sentences to AMR semantic graphs. We transform the sequence to graph mapping problem to a word sequence to transition action sequence problem using a special transition system called a cache transition system. To address the sparsity issue of neural AMR parsing, we feed feature embeddings from the transition state to provide relevant local information for each decoder state. We present a monotonic hard attention model for the transition framework to handle the strictly left-to-right alignment between each transition state and the current buffer input focus. We evaluate our neural transition model on the AMR parsing task, and our parser outperforms other sequence-to-sequence approaches and achieves competitive results in comparison with the best-performing models.

Orthographic Features for Bilingual Lexicon Induction
Parker Riley | Daniel Gildea
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent embedding-based methods in bilingual lexicon induction show good results, but do not take advantage of orthographic features, such as edit distance, which can be helpful for pairs of related languages. This work extends embedding-based methods to incorporate these features, resulting in significant accuracy gains for related languages.
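
Edit distance, the orthographic feature named in this abstract, can be computed with the standard Levenshtein dynamic program. The sketch below is a generic implementation, not the paper's feature extractor; the length normalization in `normalized_similarity` is an assumption added for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings (assumed scaling)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Cognates in related languages tend to be close in spelling.
cognate_distance = edit_distance("nation", "nación")
cognate_similarity = normalized_similarity("nation", "nación")
```

A score like this could be combined with embedding similarity when ranking translation candidates, which is most useful for language pairs that share an alphabet and many cognates.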

2017

Addressing the Data Sparsity Issue in Neural AMR Parsing
Xiaochang Peng | Chuan Wang | Daniel Gildea | Nianwen Xue
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Neural attention models have achieved great success in different NLP tasks. However, they have not fulfilled their promise on the AMR parsing task due to the data sparsity issue. In this paper, we describe a sequence-to-sequence model for AMR parsing and present different ways to tackle the data sparsity problem. We show that our methods achieve significant improvement over a baseline neural attention model and our results are also competitive against state-of-the-art systems that do not use extra linguistic resources.

AMR-to-text Generation with Synchronous Node Replacement Grammar
Linfeng Song | Xiaochang Peng | Yue Zhang | Zhiguo Wang | Daniel Gildea
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper addresses the task of AMR-to-text generation by leveraging synchronous node replacement grammar. During training, graph-to-string rules are learned using a heuristic extraction algorithm. At test time, a graph transducer is applied to collapse input AMRs and generate output sentences. Evaluated on a standard benchmark, our method gives the state-of-the-art result.

University of Rochester WMT 2017 NMT System Submission
Chester Holtz | Chuyang Ke | Daniel Gildea
Proceedings of the Second Conference on Machine Translation

We discuss learning latent annotations for synchronous context-free grammars (SCFG) for the purpose of improving machine translation. We show that learning annotations for nonterminals results in not only more accurate translation, but also faster SCFG decoding.

2011

Grammar Factorization by Tree Decomposition
Daniel Gildea
Computational Linguistics, Volume 37, Issue 1 - March 2011

Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
Pierluigi Crescenzi | Daniel Gildea | Andrea Marino | Gianluca Rossi | Giorgio Satta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Terminal-Aware Synchronous Binarization
Licheng Fang | Tagyoung Chung | Daniel Gildea
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Issues Concerning Decoding with Synchronous Context-free Grammar
Tagyoung Chung | Licheng Fang | Daniel Gildea
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

A Fast Fertility Hidden Markov Model for Word Alignment Using MCMC
Shaojun Zhao | Daniel Gildea
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Effects of Empty Categories on Machine Translation
Tagyoung Chung | Daniel Gildea
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Factors Affecting the Accuracy of Korean Parsing
Tagyoung Chung | Matt Post | Daniel Gildea
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

Semantic Role Features for Machine Translation
Ding Liu | Daniel Gildea
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Optimal Parsing Strategies for Linear Context-Free Rewriting Systems
Daniel Gildea
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Binarization of Synchronous Context-Free Grammars
Liang Huang | Hao Zhang | Daniel Gildea | Kevin Knight
Computational Linguistics, Volume 35, Number 4, December 2009

Bayesian Learning of a Tree Substitution Grammar
Matt Post | Daniel Gildea
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Weight Pushing and Binarization for Fixed-Grammar Parsing
Matt Post | Daniel Gildea
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

Unsupervised Tokenization for Machine Translation
Tagyoung Chung | Daniel Gildea
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Bayesian Learning of Phrasal Tree-to-String Templates
Ding Liu | Daniel Gildea
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Improved Tree-to-String Transducer for Machine Translation
Ding Liu | Daniel Gildea
Proceedings of the Third Workshop on Statistical Machine Translation

Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang | Chris Quirk | Robert C. Moore | Daniel Gildea
Proceedings of ACL-08: HLT

Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
Hao Zhang | Daniel Gildea
Proceedings of ACL-08: HLT

Parsers as language models for statistical machine translation
Matt Post | Daniel Gildea
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Most work in syntax-based machine translation has been in translation modeling, but there are many reasons why we may instead want to focus on the language model. We experiment with parsers as language models for machine translation in a simple translation model. This approach demands much more of the language models, allowing us to isolate their strengths and weaknesses. We find that unmodified parsers do not improve BLEU scores over n-gram language models, and provide an analysis of their strengths and weaknesses.

Extracting Synchronous Grammar Rules From Word-Level Alignments in Linear Time
Hao Zhang | Daniel Gildea | David Chiang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Factorization of Synchronous Context-Free Grammars in Linear Time
Hao Zhang | Daniel Gildea
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

Source-Language Features and Maximum Correlation Training for Machine Translation Evaluation
Ding Liu | Daniel Gildea
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Worst-Case Synchronous Grammar Rules
Daniel Gildea | Daniel Štefankovič
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Optimizing Grammars for Minimum Dependency Length
Daniel Gildea | David Temperley
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Efficient Search for Inversion Transduction Grammar
Hao Zhang | Daniel Gildea
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Factoring Synchronous Grammars by Sorting
Daniel Gildea | Giorgio Satta | Hao Zhang
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Stochastic Iterative Alignment for Machine Translation Evaluation
Ding Liu | Daniel Gildea
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Inducing Word Alignments with Bilexical Synchronous Trees
Hao Zhang | Daniel Gildea
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Synchronous Binarization for Machine Translation
Hao Zhang | Liang Huang | Daniel Gildea | Kevin Knight
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

The Proposition Bank: An Annotated Corpus of Semantic Roles
Martha Palmer | Daniel Gildea | Paul Kingsbury
Computational Linguistics, Volume 31, Number 1, March 2005

Stochastic Lexicalized Inversion Transduction Grammar for Alignment
Hao Zhang | Daniel Gildea
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)
Ido Dagan | Daniel Gildea
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

Syntactic Features for Evaluation of Machine Translation
Ding Liu | Daniel Gildea
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

Machine Translation as Lexicalized Parsing with Hooks
Liang Huang | Hao Zhang | Daniel Gildea
Proceedings of the Ninth International Workshop on Parsing Technology

Online Statistics for a Unification-Based Dialogue Parser
Micha Elsner | Mary Swift | James Allen | Daniel Gildea
Proceedings of the Ninth International Workshop on Parsing Technology

2004

Skeletons in the parser: Using a shallow parser to improve deep parsing
Mary Swift | James Allen | Daniel Gildea
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Syntax-Based Alignment: Supervised or Unsupervised?
Hao Zhang | Daniel Gildea
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Dependencies vs. Constituents for Tree-Based Alignment
Daniel Gildea
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Identifying Semantic Roles Using Combinatory Categorial Grammar
Daniel Gildea | Julia Hockenmaier
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

Loosely Tree-Based Alignment for Machine Translation
Daniel Gildea
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

An algorithm for word-level alignment of parallel dependency trees
Yuan Ding | Daniel Gildea | Martha Palmer
Proceedings of Machine Translation Summit IX: Papers

Structural divergence presents a challenge to the use of syntax in statistical machine translation. We address this problem with a new algorithm for alignment of loosely matched non-isomorphic dependency trees. The algorithm selectively relaxes the constraints of the two tree structures while keeping computational complexity polynomial in the length of the sentences. Experimentation with a large Chinese-English corpus shows an improvement in alignment results over the unstructured models of Brown et al. (1993).