A large part of wide coverage Tree Adjoining Grammars (TAG) is formed by trees that satisfy the restrictions imposed by Tree Insertion Grammars (TIG). This characteristic can be used to reduce the practical complexity of TAG parsing, applying the standard adjunction operation only in those cases in which the simpler cubic-time TIG adjunction cannot be applied. In this paper, we describe a parsing algorithm managing simultaneous adjunctions in TAG and TIG.
Le traitement automatique du langage naturel est un axe de recherche qui connaît chaque jour de nouvelles théories et approches. Les systèmes d’analyse automatique qui sont fondés sur une approche séquentielle présentent plusieurs inconvénients. Afin de pallier ces limites, nous nous sommes intéressés à la réalisation d’un système d’analyse syntaxique de textes arabes basé sur l’approche multi-agent : MASPAR « Multi-Agent System for Parsing ARabic ».
The paper describes an incremental parsing algorithm for natural languages that uses normalized interfaces of modules of proof-nets. This algorithm produces at each step the different possible partial syntactical analyses of the first words of a sentence. Thus, it can analyze texts on the fly leaving partially analyzed sentences.
This paper presents a technique for the representation and the implementation of interaction relations between different domains of linguistic analysis. This solution relies on the localization of the linguistic objects in the context. The relations are then implemented by means of interaction constraints, each domain information being expressed independently.
In this paper, we present a method which may speed up Earley parsers in practice. A first pass called a guiding parser builds an intermediate structure called a guide which is used by a second pass, an Earley parser, called a guided parser whose Predictor phase is slightly modified in such a way that it selects an initial item only if this item is in the guide. This approach is validated by practical experiments preformed on a large test set with an English context-free grammar.
We present a novel approach to supertagging w.r.t. some lexicalized grammar G. It differs from previous approaches in several ways:- These supertaggers rely only on structural information: they do not need any training phase;- These supertaggers do not compute the “best“ supertag for each word, but rather a set of supertags. These sets of supertags do not exclude any supertag that will eventually be used in a valid complete derivation (i.e., we have a recall score of 100%);- These supertaggers are in fact true parsers which accept supersets of L(G) that can be more efficiently parsed than the sentences of L(G).
Integration of two stochastic context-free grammars can be useful in two pass approaches used, for example, in speech recognition and understanding. Based on an algorithm proposed by [Nederhof and Satta, 2002] for the non-probabilistic case, left-to-right strategies for the search for the best solution based on CKY and Earley parsers are discussed. The restriction that one of the two grammars must be non recursive does not represent a problem in the considered applications.
Visual language editors should provide a user-friendly environment where users are supported in an effective way in the construction of visual sentences. In this paper, we propose an approach for the construction of syntax-directed visual language editors by integrating incremental parsers into freehand editors. The approach combines the LR-based techniques for parsing visual languages with the more general incremental Generalized LR parsing techniques developed for string languages.
Within a grammar formalism that treats syntax analysis as a global optimization problem, methods are investigated to improve parsing performance by recombining the solutions of smaller and easier subproblems. The robust nature of the formalism allows the application of this technique with little change to the original grammar.
In this paper, we present a definition of unification of weighted feature structures designed to deal with constraint relaxation. The application of phrase structure rules in a unification-based Natural Language Processing system is adapted such that inconsistent values do not lead to failure, but are penalised. These penalties are based on the signature and the shape of the feature structures, and thus realise an elegant and general approach to relaxation.
We propose two statistical left-corner parsers and investigate their accuracy at varying speeds. The parser based on a generative probability model achieves state-of-the-art accuracy when sufficient time is available, but when high speed is required the parser based on a discriminative probability model performs better. Neural network probability estimation is used to handle conditioning on both the unbounded parse histories and the unbounded lookahead strings.
The paper introduces PACE — a parser comparison and evaluation system for the syntactic processing of natural languages. The analysis is based on context free grammar with contextual extensions (constraints). The system is able to manage very large and extremely ambiguous CF grammars. It is independent of the parsing algorithm used. The tool can solve the contextual constraints on the resulting CF structure, select the best parsing trees according to their probabilities, or combine them. We discuss the advantages and disadvantages of our modular design as well as how efficiently it processes the standard evaluation grammars.
In this paper, we propose a new probabilistic GLR parsing method that can solve the problems of conventional methods. Our proposed Conditional Action Model uses Surface Phrasal Types (SPTs) encoding the functional word sequences of the sub-trees for describing structural characteristics of the partial parse. And, the proposed GLR model outperforms the previous methods by about 6~8%.
In this paper, we describe an approach to analysis for spoken language translation that combines phrase-level grammar-based parsing and automatic domain action classification. The job of the analyzer is to transform utterances into a shallow semantic task-oriented interlingua representation. The goal of our hybrid approach is to provide accurate real-time analyses and to improve robustness and portability to new domains and languages.
Parser does the part of speech (POS) identification in a sentence, which is required for Machine Translation (MT). An intelligent parser is a parser, which takes care of semantics along with the POS in a sentence. Use of such intelligent parser will reduce the complexity in semantics during MT apriori.
We show that a well-known algorithm to compute the intersection of a context-fre language and a regular language can be extended to apply to a probabilistic context-free grammar and a probabilistic finite automaton, provided the two probabilistic models are combined through multiplication. The result is a probabilistic context-free grammar that contains joint information about the original grammar and automaton.
This paper presents a deterministic parsing algorithm for projective dependency grammar. The running time of the algorithm is linear in the length of the input string, and the dependency graph produced is guaranteed to be projective and acyclic. The algorithm has been experimentally evaluated in parsing unrestricted Swedish text, achieving an accuracy above 85% with a very simple grammar.
In this paper an efficient algorithm for dependency parsing is described in which ambiguous dependency structure of a sentence is represented in the form of a graph. The idea of the algorithm is shortly outlined and some issues as to its time complexity are discussed.
We investigate an aspect of the relationship between parsing and corpus-based methods in NLP that has received relatively little attention: coverage augmentation in rule-based parsers. In the specific task of determining grammatical relations (such as subjects and objects) in transcribed spoken language, we show that a combination of rule-based and corpus-based approaches, where a rule-based system is used as the teacher (or an automatic data annotator) to a corpus-based system, outperforms either system in isolation.
We present a new formalism, partially ordered multiset context-free grammars (poms-CFG), along with an Earley-style parsing algorithm. The formalism, which can be thought of as a generalization of context-free grammars with partially ordered right-hand sides, is of interest in its own right, and also as infrastructure for obtaining tighter complexity bounds for more expressive context-free formalisms intended to express free or multiple word-order, such as ID/LP grammars. We reduce ID/LP grammars to poms-grammars, thereby getting finer-grained bounds on the parsing complexity of ID/LP grammars. We argue that in practice, the width of attested ID/LP grammars is small, yielding effectively polynomial time complexity for ID/LP grammar parsing.
Given a probabilistic parsing model and an evaluation metric for scoring the match between parse-trees, e.g., PARSEVAL [Black et al., 1991], this paper addresses the problem of how to select the on average best scoring parse-tree for an input sentence. Common wisdom dictates that it is optimal to select the parse with the highest probability, regardless of the evaluation metric. In contrast, the Maximizing Metrics (MM) method [Goodman, 1998, Stolcke et al., 1997] proposes that an algorithm that optimizes the evaluation metric itself constitutes the optimal choice. We study the MM method within parsing. We observe that the MM does not always hold for tree-bank models, and that optimizing weak metrics is not interesting for semantic processing. Subsequently, we state an alternative proposition: the optimal algorithm must maximize the metric that scores parse-trees according to linguistically relevant features. We present new algorithms that optimize metrics that take into account increasingly more linguistic features, and exhibit experiments in support of our claim.
In this paper, we propose a method for analyzing word-word dependencies using deterministic bottom-up manner using Support Vector machines. We experimented with dependency trees converted from Penn treebank data, and achieved over 90% accuracy of word-word dependency. Though the result is little worse than the most up-to-date phrase structure based parsers, it looks satisfactorily accurate considering that our parser uses no information from phrase structures.