07-1001_0	Following the notation in section 2.1 , the ij-th entry of the matrix W is defined as in ( ): W_ij = (1 - ε_i) c_ij / c_j , where c_j is the total number of words in the j-th sentence pair	VVG DT NN IN NN CD , DT NN NN IN DT NN NP VBZ VVN IN IN ( ) NP SYM NN : SYM NP ) WRB NP VBZ DT JJ NN IN NNS IN DT JJ NN NN	Fundamental	Idea	Neutral
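A hedged reconstruction of the weighting cited above, assuming the standard entropy-based LSA term weighting (Bellegarda-style); the normalized entropy ε_i and the counts c_ij, c_i, N are assumptions not spelled out in the row itself:

    W_{ij} = (1 - \varepsilon_i)\,\frac{c_{ij}}{c_j},
    \qquad
    \varepsilon_i = -\frac{1}{\log N}\sum_{j=1}^{N}\frac{c_{ij}}{c_i}\log\frac{c_{ij}}{c_i}

Here c_ij is the count of word i in sentence pair j, c_i its corpus-wide count, and N the number of sentence pairs.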
07-1001_1	For the simple bag-of-word bilingual LSA as described in Section 2.2.1 , after SVD on the sparse matrix using the toolkit SVDPACK ( ) , all source and target words are projected into a low-dimensional (R = 88) LSA-space	IN DT JJ NN JJ NP RB VVD IN NP CD , IN NN IN DT JJ NN VVG DT NN NP ( ) , DT NN CC NN NNS VBP VVN IN DT JJ NN SYM JJ NP	Fundamental	Basis	Neutral
07-1001_2	Discriminative word alignment models , such as Ittycheriah and Roukos ( ); Moore ( ); Blunsom and Cohn ( ) , have received a great amount of study recently	JJ NN NN NNS , JJ IN NP CC NP ( NP NP ( NP NP CC NP ( ) , VHP VVN JJ NN IN NN RB	BackGround	GRelated	Neutral
07-1001_3	For instance , the most relaxed IBM Model-1 , which assumes that any source word can be generated by any target word equally regardless of distance , can be improved by demanding a Markov process of alignments as in HMM-based models ( ) , or implementing a distribution of number of target words linked to a source word as in IBM fertility-based models ( )	IN NN , DT CD RBS VVN NP NP , WDT VVZ IN/that DT NN NN MD VB VVN IN DT NN NN RB RB IN NN , MD VB VVN IN VVG DT NP NN IN NNS IN IN JJ NNS ( ) , CC VVG DT NN IN NN IN NN NNS VVN TO DT NN NN IN IN NP JJ NNS ( )	BackGround	GRelated	Neutral
07-1001_3	It can be applied to complicated models such as IBM Model-4 ( )	PP MD VB VVN TO JJ NNS JJ NP NP ( )	BackGround	SRelated	Neutral
07-1001_4	The language model is a statistical trigram model estimated with Modified Kneser-Ney smoothing ( ) using all English sentences in the parallel training data	DT NN NN VBZ DT JJ NN NN VVN IN NP NP VVG ( ) VVG DT JJ NNS IN DT JJ NN NNS	Fundamental	Basis	Neutral
07-1001_5	LSA has been successfully applied to information retrieval ( ) , statistical language modeling ( ) , etc	NP VHZ VBN RB VVN TO NN NN ( ) , JJ NN NN ( ) CC FW	BackGround	GRelated	Positive
07-1001_5	Alternative constructions of the matrix are possible using raw counts or TF-IDF ( )	JJ NNS IN DT NN VBP RB VVG JJ NNS CC NP ( )	BackGround	SRelated	Neutral
07-1001_6	It has been shown that human knowledge , in the form of a small amount of manually annotated parallel data to be used to seed or guide model training , can significantly improve word alignment F-measure and translation performance ( )	PP VHZ VBN VVN IN/that JJ NN , IN DT NN IN DT JJ NN IN RB VVN JJ NNS TO VB VVN TO NN CC NN NN NN , MD RB VV NN NN NN CC NN NN ( )	BackGround	GRelated	Positive
07-1001_8	Our decoder is a phrase-based multi-stack implementation of the log-linear model similar to Pharaoh ( )	PP$ NN VBZ DT JJ NN NN IN DT JJ NN JJ TO NN ( )	Fundamental	Idea	Neutral
07-1001_9	Since Arabic is a morphologically rich language where affixes are attached to stem words to indicate gender , tense , case , etc. , in order to reduce vocabulary size and address out-of-vocabulary words , we split Arabic words into affix and root according to a rule-based segmentation scheme ( ) with the help from the Buckwalter analyzer ( ) output	IN NP VBZ DT RB JJ NN WRB NNS VBP VVN TO VV NNS TO VV NN , JJ , NN CC FW , IN NN TO VV NN NN CC VV JJ NNS , PP VVD NP NNS IN NN CC NN VVG TO DT JJ NN NN ( ) IN DT NN IN DT NP NN ( ) NN	Fundamental	Basis	Neutral
07-1001_10	Basic models in two translation directions are trained simultaneously where statistics of two directions are shared to learn symmetric translation lexicon and word alignments with high precision motivated by ( ) and ( )	JJ NNS IN CD NN NNS VBP VVN RB WRB NNS IN CD NNS VBP VVN TO VV JJ NN NN CC NN NNS IN JJ NN VVN IN ( ) CC ( )	Fundamental	Idea	Positive
07-1001_11	In ( ) , bilingual semantic maps are constructed to guide word alignment	IN ( ) , JJ JJ NNS VBP VVN TO VV NN NN	BackGround	SRelated	Neutral
07-1001_12	As formulated in the competitive linking algorithm ( ) , the problem of word alignment can be regarded as a process of word linkage disambiguation , that is , choosing correct associations among all competing hypotheses	IN VVN IN DT JJ VVG NN ( ) , DT NN IN NN NN MD VB VVN IN DT NN IN NN NN NN , WDT VBZ , VVG JJ NNS IN DT VVG NN	BackGround	GRelated	Neutral
07-1001_12	The example demonstrates that due to reasonable constraints placed in word alignment training , the link to "_tK" is corrected and consequently we have accurate word translation for the Arabic singleton. Heuristics based on co-occurrence analysis , such as point-wise mutual information or Dice coefficients , have been shown to be indicative for word alignments ( )	DT NN NN IN/that JJ TO JJ NNS VVN IN NN NN NN , DT NN TO NN VBZ VVN CC RB PP VHP JJ NN NN IN DT NP NN CD NNS VVN IN NN NN , JJ IN JJ JJ NN CC NP NNS , VHP VBN VVN TO VB JJ IN NN NNS ( )	BackGround	GRelated	Neutral
07-1001_14	These feature weights are tuned on the dev set to achieve optimal translation performance using the downhill simplex method ( )	DT NN NNS VBP VVN IN DT NN VVD TO VV JJ NN NN VVG RB JJ NN ( )	Fundamental	Basis	Neutral
07-1001_15	By combining word alignments in two directions using heuristics ( ) , a single set of static word alignments is then formed	IN VVG NN NNS IN CD NNS VVG NNS ( ) , DT JJ NN IN JJ NN NNS VBZ RB VVN	Fundamental	Basis	Neutral
07-1001_15	We simply modify the GIZA++ toolkit ( ) by always weighting lexicon probabilities with soft constraints during iterative model training , and obtain 0.7% TER reduction on both sets and 0.4% BLEU improvement on the test set	PP RB VV DT NP NN ( ) IN RB NN NN NNS IN JJ NNS IN JJ NN NN , CC VV CD NN NN IN DT NNS CC CD NP NN IN DT NN NN	Fundamental	Basis	Neutral
07-1001_16	We measure translation performance by the BLEU score ( ) and Translation Error Rate (TER) ( ) with one reference for each hypothesis	PP VV NN NN IN DT NP NN ( ) CC NN NP NP NN ( ) IN CD NN IN DT NN	Fundamental	Basis	Neutral
07-1001_18	Toutanova et al. ( ) augmented bilingual sentence pairs with part-of-speech tags as linguistic constraints for HMM-based word alignments	NP NP NP ( ) VVN JJ NN NNS IN NN NNS IN JJ NNS IN JJ NN NNS	BackGround	GRelated	Neutral
07-1001_19	While word alignments can help identify semantic relations ( ) , we proceed in the reverse direction	IN NN NNS MD VV VVG JJ NNS ( ) , PP VVP IN DT JJ NN	BackGround	GRelated	Negative
07-1001_20	We shall take the HMM-based word alignment model ( ) as an example and follow the notation of ( )	PP MD VV JJ NN NN NN ( ) IN DT NN CC VV DT NN IN ( )	Fundamental	Basis	Neutral
07-1001_20	Our baseline word alignment model is the word-to-word Hidden Markov Model ( )	PP$ NN NN NN NN VBZ DT NN NP NP NP ( )	Fundamental	Basis	Neutral
07-1002_0	Many state-of-the-art SMT systems do not use trees and base the ordering decisions on surface phrases ( )	JJ JJ NP NNS VVP RB VV NNS CC VV DT VVG NNS IN NN NNS ( )	BackGround	GRelated	Neutral
07-1002_0	An important advantage of our model is that it is global , and does not decompose the task of ordering a target sentence into a series of local decisions , as in the recently proposed order models for Machine Translation ( )	DT JJ NN IN PP$ NN VBZ IN/that PP VBZ JJ , CC VVZ RB VV DT NN IN VVG DT NN NN IN DT NN IN JJ NNS , RB IN DT RB VVN NN NNS IN NP NP ( )	BackGround	GRelated	Negative
07-1002_1	Alternatively , order is modelled in terms of movement of automatically induced hierarchical structure of sentences ( )	RB , NN VBZ VVN IN NNS IN NN IN RB VVN JJ NN IN NNS ( )	BackGround	GRelated	Negative
07-1002_2	These N-best lists are generated using approximate search and simpler models , as in the re-ranking approach of ( )	DT NP NNS VBP VVN VVG JJ NN CC JJR NNS , RB IN DT JJ NN IN ( )	Fundamental	Basis	Neutral
07-1002_3	tings , even for a bi-gram language model ( )	NNS , RB IN DT NN NN NN ( )	NULL	NULL	NULL
07-1002_4	The advantages of modeling how a target language syntax tree moves with respect to a source language syntax tree are that (i) we can capture the fact that constituents move as a whole and generally respect the phrasal cohesion constraints ( ) , and (ii) we can model broad syntactic reordering phenomena , such as subject-verb-object constructions translating into subject-object-verb ones , as is generally the case for English and Japanese	CD NP NNS IN VVG WRB DT NN NN NN NN NNS IN NN TO DT NN NN NN NN VBP DT NN PP MD VV DT NN IN/that NNS VVP IN DT JJ CC RB VV DT JJ NN NNS ( ) , CC NN PP MD VV JJ JJ VVG NNS , JJ IN JJ NNS VVG IN NN NNS , RB VBZ RB DT NN IN NP CC NP	BackGround	SRelated	Positive
07-1002_5	Previous work has shown that it is useful to model target language order in terms of movement of syntactic constituents in constituency trees ( ) or dependency trees ( ) , which are obtained using a parser trained to determine linguistic constituency	JJ NN VHZ VVN IN/that PP VBZ JJ TO VV NN NN NN IN NNS IN NN IN JJ NNS IN NN NNS ( ) CC NN NNS ( ) , WDT VBP VVN VVG DT NN VVN TO VV JJ NN	BackGround	GRelated	Positive
07-1002_6	Our results show that combining features derived from the source and target dependency trees , distortion surface order-based features (like the distortion used in Pharaoh ( )) and language model-like features results in a model which significantly outperforms models using only some of the information sources	PP$ NNS VVP IN/that VVG NNS VVN IN DT NN CC NN NN NNS , NN NN JJ NNS JJ DT NN VVN IN NN ( NN CC NN JJ NNS NNS IN DT NN WDT RB VVZ NNS VVG RB DT IN DT NN NNS	Fundamental	Basis	Positive
07-1002_6	Pharaoh DISP: Displacement as used in Pharaoh ( )	NN NP NP RB VVD IN NN ( )	Fundamental	Basis	Neutral
07-1002_8	These models are combined as feature functions in a (log)linear model for predicting a target sentence given a source sentence , in the framework proposed by ( )	DT NNS VBP VVN IN NN NNS IN DT JJ NN IN VVG DT NN NN VVN DT NN NN , IN DT NN VVN IN ( )	Fundamental	Basis	Neutral
07-1002_9	The target dependency trees are obtained through projection of the source dependency trees , using the word alignment (we use GIZA++ ( )) , ensuring better parallelism of the source and target structures	DT NN NN NNS VBP VVN IN NN IN DT NN NN NNS , VVG DT NN NN NNS VVP NP ( NP , VVG JJR NN IN DT NN CC NN NNS	Fundamental	Basis	Neutral
07-1002_9	The sentences were annotated with alignment (using GIZA++ ( )) and syntactic dependency structures of the source and target , obtained as described in Section 2	DT NNS VBD VVN IN NN NN NP ( NN CC JJ NN NNS IN DT NN CC NN , VVN IN VVN IN NP CD	Fundamental	Basis	Neutral
07-1002_10	Our model is discriminatively trained to select the best order (according to the BLEU measure) ( ) of an unordered target dependency tree from the space of possible orders	PP$ NN VBZ RB VVN TO VV DT JJS NN NN TO DT NP NN ( ) IN DT JJ NN NN NN IN DT NN IN JJ NNS	Fundamental	Basis	Neutral
07-1002_11	Our algorithm for obtaining target dependency trees by projection of the source trees via the word alignment is the one used in the MT system of ( )	PP$ NN IN VVG NN NN NNS IN NN IN DT NN NNS IN DT NN NN VBZ DT CD VVN IN DT NP NN IN ( )	Fundamental	Basis	Neutral
07-1002_11	It follows the order model defined in ( )	PP VVZ DT NN NN VVN IN ( )	Fundamental	Idea	Neutral
07-1002_11	Our baseline SMT system is the system of Quirk et al. ( )	PP$ JJ NP NN VBZ DT NN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1002_11	The projection algorithm of ( ) defines heuristics for each of these problems	DT NN NN IN ( ) VVZ NNS IN DT IN DT NNS	BackGround	SRelated	Neutral
07-1002_12	Previous studies have shown that if both the source and target dependency trees represent linguistic constituency , the alignment between subtrees in the two languages is very complex ( )	CD JJ NNS VHP VVN IN/that IN CC DT NN CC NN NN NNS VVP JJ NN , DT NN IN NNS IN DT CD NNS VBZ RB JJ ( )	BackGround	GRelated	Neutral
07-1003_0	A host of discriminative methods have been introduced ( )	DT NN IN JJ NNS VHP VBN VVN ( )	BackGround	GRelated	Neutral
07-1003_0	We also investigated extraction-specific metrics: the frequency of interior nodes - a measure of how often the alignments violate the constituent structure of English parses - and a variant of the CPER metric of Ayan and Dorr ( )	LS PP RB VVD JJ NN DT NN IN JJ NNS : DT NN IN WRB RB DT NNS VVP DT JJ NN IN NP VVZ : CC DT NN IN DT NP JJ IN NP CC NP ( )	Fundamental	Basis	Neutral
07-1003_0	* From Ayan and Dorr ( ) , grow-diag-final heuristic	SYM IN NP CC NP ( ) , JJ JJ	NULL	NULL	NULL
07-1003_0	Similarly , we compared our Chinese results to the GIZA++ results in Ayan and Dorr ( )	CD RB , PP VVD PP$ JJ NNS TO DT NP NNS IN NP CC NP ( )	Compare	Compare	Neutral
07-1003_0	Additionally , we evaluated our model with the transducer analog to the consistent phrase error rate (CPER) metric of Ayan and Dorr ( )	RB , PP VVD PP$ NN IN DT NN NN TO DT JJ NN NN NN NN JJ IN NP CC NP ( )	Fundamental	Basis	Neutral
07-1003_1	Like the classic IBM models ( ) , our model will introduce a latent alignment vector a = {a_1 , ... , a_J} that specifies the position of an aligned target word for each source word	IN DT JJ NP NNS ( ) , PP$ NN MD VV DT JJ NN NN DT SYM NP CD NP NP NP ) WDT VVZ DT NN IN DT VVN NN NN IN DT NN NN	Fundamental	Idea	Neutral
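For reference, a sketch of the classic formulation this sentence echoes (IBM Model 1 style); the notation s_1..s_J for the source sentence and t_1..t_I for the target is an assumption for illustration, not given in the row. Marginalizing over the latent alignment vector a:

    p(s_1^J \mid t_1^I) = \sum_{a_1^J} p(s_1^J, a_1^J \mid t_1^I)
    = \frac{1}{(I+1)^J}\prod_{j=1}^{J}\sum_{i=0}^{I} p(s_j \mid t_i)

where a_j = i means source word s_j is generated by target word t_i (i = 0 denoting the empty word).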
07-1003_2	However , few of these methods have explicitly addressed the tension between word alignments and the syntactic processes that employ them ( )	RB , JJ IN DT NNS VHP RB VVN DT NN IN NN NNS CC DT JJ NNS WDT VVP PP ( )	BackGround	GRelated	Negative
07-1003_3	Syntactic methods are an increasingly promising approach to statistical machine translation , being both algorithmically appealing ( ) and empirically successful ( )	JJ NNS VBP DT RB JJ NN TO JJ NN NN , VBG DT RB JJ ( ) CC RB JJ ( )	BackGround	GRelated	Positive
07-1003_4	Daume III and Marcu ( ) employ a syntax-aware distortion model for aligning summaries to documents , but condition upon the roots of the constituents that are jumped over during a transition , instead of those that are visited during a walk through the tree	NP NP CC NP ( ) VVZ DT JJ NN NN IN VVG NNS TO NNS , CC NN IN DT NNS IN DT NNS WDT VBP VVN RP IN DT NN , RB NN WDT VBP VVN IN DT NN IN DT NN	BackGround	GRelated	Neutral
07-1004_0	Our transductive learning algorithm , Algorithm 1 , is inspired by the Yarowsky algorithm ( )	PP$ JJ VVG NN , NP CD , VBZ VVN IN DT NP NN ( )	Fundamental	Idea	Neutral
07-1004_0	Under certain precise conditions , as described in ( ) , we can analyze Algorithm 1 as minimizing the entropy of the distribution over translations of U	IN JJ JJ NNS , RB VVN IN ( ) , PP MD VV NP CD IN VVG DT NN IN DT NN IN NNS IN NP	Fundamental	Idea	Neutral
07-1004_1	We used the following scoring functions in our experiments: Length-normalized Score: Each translated sentence pair (t , s) is scored according to the model probability p(t | s) normalized by the length |t| of the target sentence: Score(t , s) = p(t | s)^(1/|t|) (3) Confidence Estimation: The confidence estimation which we implemented follows the approaches suggested in ( ): The confidence score of a target sentence t is calculated as a log-linear combination of phrase posterior probabilities , Levenshtein-based word posterior probabilities , and a target language model score	PP VVD DT NN VVG NNS IN PP$ JJ NP NP DT VVN NN NN NN , NN VBZ VVN VVG TO DT NN NN NN NN SYM NN VVN IN DT NN NN IN DT NN NN NP , NN SYM NN NN SYM JJ JJ NN NP NP DT NN NN WDT PP VVD VVZ DT NNS VVN IN ( NN DT NN NN IN DT NN NN NN VBZ VVN IN DT JJ NN IN NN NN NNS , JJ NN NN NNS , CC DT NN NN NN NN	Fundamental	Idea	Neutral
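A minimal Python sketch of the length-normalized score in equation (3); the function name and the assumption that the model probability arrives as a log-probability are illustrative, not from the source:

    import math

    def length_normalized_score(log_p_t_given_s, target_length):
        # Equation (3): Score(t, s) = p(t|s)^(1/|t|).
        # Working in log space avoids underflow for long sentences.
        return math.exp(log_p_t_given_s / target_length)

    # e.g. a 20-word hypothesis with log p(t|s) = -42.0 scores exp(-2.1) ~ 0.12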
07-1004_2	One language pair creates data for another language pair and can be naturally used in a ( )-style co-training algorithm	CD NN NN VVZ NNS IN DT NN NN CC MD VB RB VVN IN DT ( JJ NN NN	BackGround	GRelated	Neutral
07-1004_3	These lists are rescored with the following models: (a) the different models used in the decoder which are described above , (b) two different features based on IBM Model 1 ( ) , (c) posterior probabilities for words , phrases , n-grams , and sentence length ( ) , all calculated over the N-best list and using the sentence probabilities which the baseline system assigns to the translation hypotheses	DT NNS VBP VVN IN DT VVG NN NN DT JJ NNS VVN IN DT NN WDT VBP VVN IN , JJ CD JJ NNS VVN IN NP NP NN ( ) , JJ JJ NNS IN NNS , NNS , NNS , CC NN NN ( ) , RB VVN IN DT NP NN CC VVG DT NN NNS WDT DT NN NN VVZ TO DT NN NNS	Fundamental	Basis	Neutral
07-1004_4	In ( ) , a generative model for word alignment is trained using unsupervised learning on parallel text	IN ( ) , DT JJ NN IN NN NN VBZ VVN VVG JJ NN IN JJ NN	BackGround	GRelated	Neutral
07-1004_5	In ( ) co-training is applied to MT	IN ( ) NN VBZ VVN TO NP	BackGround	GRelated	Neutral
07-1004_6	Along similar lines , ( ) combine a generative model of word alignment with a log-linear discriminative model trained on a small set of hand aligned sentences	IN JJ NNS , ( ) VV DT JJ NN IN NN NN IN DT JJ JJ NN VVN IN DT JJ NN IN NN VVN NNS	BackGround	GRelated	Neutral
07-1004_8	BLEU score using the algorithm described in ( )	NP NN VVG DT NN VVN IN ( )	NULL	NULL	NULL
07-1004_10	The models (or features) which are employed by the decoder are: (a) one or several phrase table(s) , which model the translation direction p(s | t) , (b) one or several n-gram language model(s) trained with the SRILM toolkit ( ); in the experiments reported here , we used 4-gram models on the NIST data , and a trigram model on EuroParl , (c) a distortion model which assigns a penalty based on the number of source words which are skipped when generating a new target phrase , and (d) a word penalty	DT NNS NN NN WDT VBP VVN IN DT NN NN NN CD CC JJ NN NN , WDT NN DT NN NN NN NNS CD JJ , JJ CD CC JJ NN NN NN VVN IN DT NP NN ( NN IN DT NNS VVD RB , PP VVD JJ NNS IN DT JJ NNS , CC DT NN NN IN NN , JJ DT NN NN WDT VVZ DT NN VVN IN DT NN IN NN NNS WDT VBP VVN WRB VVG DT JJ NN NN , CC NN DT NN NN	Fundamental	Basis	Neutral
07-1004_11	for a detailed description see ( )	IN DT JJ NN VV ( )	BackGround	SRelated	Neutral
07-1004_11	For details , see ( )	IN NNS , VVP ( )	BackGround	SRelated	Neutral
07-1004_13	It overlaps with the original phrase tables , but also contains many new phrase pairs ( )	PP VVZ IN DT JJ NN NNS , CC RB VVZ JJ JJ NN NNS ( )	BackGround	SRelated	Neutral
07-1004_13	Self-training for SMT was proposed in ( )	NP IN NP VBD VVN IN ( )	BackGround	GRelated	Neutral
07-1005_0	Recently , Cabezas and Resnik ( ) experimented with incorporating WSD translations into Pharaoh , a state-of-the-art phrase-based MT system ( )	RB , NP CC NP ( ) VVN IN VVG JJ NNS IN NN , DT JJ JJ NP NN ( )	BackGround	GRelated	Positive
07-1005_0	The relatively small improvement reported by Cabezas and Resnik ( ) without a statistical significance test appears to be inconclusive	DT RB JJ NN VVN IN NP CC NP ( ) IN DT JJ NN NN VVZ TO VB JJ	BackGround	GRelated	Negative
07-1005_0	Note that compared with the MT systems used in ( ) and ( ) , the Hiero system we are using represents a much stronger baseline MT system upon which the WSD system must improve	NN IN/that VVG IN DT NP NNS VVN IN ( ) CC ( ) , DT NP NN PP VBP VVG VVZ DT RB JJR JJ NP NN IN WDT DT JJ NN MD VV	Compare	Compare	Negative
07-1005_1	Carpuat and Wu ( ) integrated the translation predictions from a Chinese WSD system ( ) into a Chinese-English word-based statistical MT system using the ISI ReWrite decoder ( )	NP CC NP ( ) VVN DT NN NNS IN DT JJ NN NN ( ) IN DT NP JJ JJ NP NN VVG DT NP NP NN ( )	BackGround	GRelated	Neutral
07-1005_1	Note that the experiments in ( ) did not use a state-of-the-art MT system , while the experiments in ( ) were not done using a full-fledged MT system and the evaluation was not on how well each source sentence was translated as a whole	NN IN/that DT NNS IN ( ) VVD RB VV DT JJ NP NN , IN DT NNS IN ( ) VBD RB VVN VVG DT JJ NP NN CC DT NN VBD RB IN WRB RB DT NN NN VBD VVN IN DT NN	BackGround	GRelated	Negative
07-1005_2	We obtain accuracy that compares favorably to the best participating system in the task ( )	PP VVP NN WDT VVZ RB TO DT JJS VVG NN IN DT NN ( )	Fundamental	Basis	Neutral
07-1005_3	For our experiments , we use the SVM implementation of ( ) as it is able to work on multi-class problems to output the classification probability for each class	IN PP$ NNS , PP VVP DT NP NN IN ( ) IN PP VBZ JJ TO VV IN JJ NNS TO NN DT NN NN IN DT NN	Fundamental	Basis	Positive
07-1005_4	Capitalizing on the strength of the phrase-based approach , Chiang ( ) introduced a hierarchical phrase-based statistical MT system , Hiero , which achieves significantly better translation performance than Pharaoh ( ) , which is a state-of-the-art phrase-based statistical MT system	VVG IN DT NN IN DT JJ NN , NP ( ) VVD DT JJ JJ JJ NP NN , NP , WDT VVZ RB JJR NN NN IN NN ( ) , WDT VBZ DT JJ JJ JJ NP NN	BackGround	GRelated	Neutral
07-1005_4	In this paper , we successfully integrate a state-of-the-art WSD system into the state-of-the-art hierarchical phrase-based MT system , Hiero ( )	IN DT NN , PP RB VV DT JJ NN NN IN DT JJ JJ JJ NP NN , NP ( )	Fundamental	Basis	Positive
07-1005_4	Hiero ( ) is a hierarchical phrase-based model for statistical machine translation , based on weighted synchronous context-free grammar (CFG) ( )	NP ( ) VBZ DT JJ JJ NN IN JJ NN NN , VVN IN JJ JJ JJ NN NN ( )	BackGround	SRelated	Neutral
07-1005_4	Similar to ( ) , we trained the Hiero system on the FBIS corpus , used the NIST MT 2002 evaluation test set as our development set to tune the feature weights , and the NIST MT 2003 evaluation test set as our test data	JJ TO ( ) , PP VVN DT NP NN IN DT NP NN , VVD DT NP NP CD NN NN VVN IN PP$ NN VVD TO VV DT NN NNS , CC DT NP NP CD NN NN VVN IN PP$ NN NNS	Fundamental	Idea	Neutral
07-1005_4	Following ( ) , we used the version 11a NIST BLEU script with its default settings to calculate the BLEU scores ( ) based on case-insensitive n-gram matching , where n is up to 4	VVG ( ) , PP VVD DT NN NP NP NP NN IN PP$ NN NNS TO VV DT NP NNS ( ) VVN IN JJ NN NN , WRB NN VBZ RB TO CD	Fundamental	Idea	Neutral
07-1005_5	An n-gram language model adds a dependence on (n-1) neighboring target-side words ( ) , making decoding much more difficult but still polynomial; in this paper , we add features that depend on the neighboring source-side words , which does not affect decoding complexity at all because the source string is fixed	DT NN NN NN VVZ DT NN IN JJ JJ NN NNS ( ) , VVG VVG RB RBR JJ CC RB JJ IN DT NN , PP VVP NNS WDT VVP IN DT JJ NN NNS , WDT VVZ RB VV VVG NN IN DT IN DT NN NN VBZ VVN	BackGround	SRelated	Neutral
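For reference, the standard n-gram factorization behind this (n-1)-word dependence (a textbook identity, not taken from the row itself):

    p(w_1 \cdots w_m) \approx \prod_{i=1}^{m} p(w_i \mid w_{i-n+1}, \ldots, w_{i-1})

so each target word is conditioned on its (n-1) predecessors, which is what couples adjacent decoding decisions.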
07-1005_6	The improvement of 0.57 is statistically significant at p  < 0.05 using the sign-test as described by Collins et al. ( ) , with 374 (+1) , 318 (-1) and 227 (0)	DT NN IN CD VBZ RB JJ IN NN SYM CD VVG DT NN IN VVN IN NP NP NP ( ) , IN CD NN , CD NN CC CD NN	Fundamental	Basis	Positive
07-1005_8	To perform translation , state-of-the-art MT systems use a statistical phrase-based approach ( ) by treating phrases as the basic units of translation	TO VV NN , JJ NP NNS VVP DT JJ JJ NN ( ) IN VVG NNS IN DT JJ NNS IN NN	BackGround	GRelated	Neutral
07-1005_8	The word alignments of both directions are then combined into a single set of alignments using the "diag-and" method of Koehn et al. ( )	DT NN NNS IN DT NNS VBP RB VVN IN DT JJ NN IN NNS VVG DT JJ NN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1005_12	Prior research has shown that using Support Vector Machines (SVM) as the learning algorithm for WSD achieves good results ( )	JJ NN VHZ VVN IN/that VVG NP NP NPS NN IN DT VVG NN IN NP VVZ JJ NNS ( )	BackGround	GRelated	Positive
07-1005_12	Our implemented WSD classifier uses the knowledge sources of local collocations , parts-of-speech (POS) , and surrounding words , following the successful approach of ( )	PP$ VVN NN NN VVZ DT NN NNS IN JJ NNS , NN NN , CC VVG NNS , VVG DT JJ NN IN ( )	Fundamental	Idea	Positive
07-1005_15	First , we performed word alignment on the FBIS parallel corpus using GIZA++ ( ) in both directions	RB , PP VVD NN NN IN DT NP JJ NN VVG NP ( ) IN DT NNS	Fundamental	Basis	Neutral
07-1005_16	Hiero uses a general log-linear model ( ) where the weight of a derivation D for a particular source sentence and its translation is w(D) = Π_i φ_i(D)^λ_i , where φ_i is a feature function and λ_i is the weight for feature φ_i	NP VVZ DT JJ JJ NN ( ) WRB DT NN IN DT NN NP IN DT JJ NN NN CC PP$ NN VBZ WRB SYM NP VBZ DT NN NN CC NP NP VBZ DT NN IN NN SYM NP	BackGround	SRelated	Neutral
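The reconstructed derivation weight in LaTeX, following the standard Hiero formulation; the arg-max decoding rule is a standard consequence added for clarity, not stated in the row:

    w(D) = \prod_i \phi_i(D)^{\lambda_i},
    \qquad
    \log w(D) = \sum_i \lambda_i \log \phi_i(D),
    \qquad
    \hat{D} = \arg\max_D w(D)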
07-1005_18	Using the MT 2002 test set , we ran the minimum-error rate training (MERT) ( ) with the decoder to tune the weights for each feature	VVG DT NP CD NN NN , PP VVD DT NN NN NN NN ( ) IN DT NN TO VV DT NNS IN DT NN	Fundamental	Basis	Neutral
07-1005_20	the English portion of the FBIS corpus and the Xinhua portion of the Gigaword corpus , we trained a tri-gram language model using the SRI Language Modelling Toolkit ( )	DT JJ NN IN DT NP NN CC DT NP NN IN DT NP NN , PP VVN DT NN NN NN VVG DT NP NP NP NP ( )	Fundamental	Basis	Neutral
07-1006_0	WSD approaches can be classified as (a) knowledge-based approaches , which make use of linguistic knowledge , manually coded or extracted from lexical resources ( ); (b) corpus-based approaches , which make use of shallow knowledge automatically acquired from corpus and statistical or machine learning algorithms to induce disambiguation models ( ); and (c) hybrid approaches , which mix characteristics from the two other approaches to automatically acquire disambiguation models from corpus supported by linguistic knowledge ( )	NN NNS MD VB VVN IN JJ JJ NNS , WDT VVP NN IN JJ NN , RB VVN CC VVN IN JJ NNS ( JJ JJ JJ NNS , WDT VVP NN IN JJ NN RB VVN IN NN CC JJ CC NN VVG NNS TO VV NN NNS ( JJ CC JJ JJ NNS , WDT NN NNS IN DT CD JJ NNS TO RB VV NN NNS IN NN VVN IN JJ NN ( )	BackGround	GRelated	Neutral
07-1006_1	Although it has been argued that WSD does not yield better translation quality than a machine translation system alone , it has been recently shown that a WSD module that is developed following specific multilingual requirements can significantly improve the performance of a machine translation system ( )	IN PP VHZ VBN VVN IN/that NP VVZ RB VV JJR NN NN IN DT NN NN NN RB , PP VHZ VBN RB VVN IN/that DT JJ NN WDT VBZ VVN VVG JJ JJ NNS MD RB VV DT NN IN DT NN NN NN ( )	BackGround	GRelated	Positive
07-1006_2	Finally , MC-WSD ( ) is a multi-class averaged perceptron classifier using syntactic and narrow context features , with one component trained on the data provided by Senseval and the other trained on WordNet glosses	RB , NP ( ) VBZ DT NN VVD NN NN VVG JJ CC JJ NN NNS , IN CD NN VVN IN DT NNS VVN IN NP CC JJ VVN IN NP VVZ	BackGround	SRelated	Neutral
07-1006_3	It is an interesting approach to learning which has been considered promising for several applications in natural language processing and has been explored for a few of them , namely POS-tagging , grammar acquisition and semantic parsing ( )	PP VBZ DT JJ NN TO NN WDT VHZ VBN VVN VVG IN JJ NNS IN JJ NN NN CC VHZ VBN VVN IN DT JJ IN PP , RB NP , NN NN CC JJ VVG ( )	BackGround	GRelated	Neutral
07-1006_4	For example , Dang and Palmer ( ) also use a rich set of features with a traditional learning algorithm (maximum entropy)	IN NN , NP CC NP ( ) RB VV DT JJ NN IN NNS IN DT JJ VVG NN NN NN	BackGround	GRelated	Neutral
07-1006_5	Linguistic knowledge is available in electronic resources suitable for practical use , such as WordNet ( ) , dictionaries and parsers	JJ NN VBZ JJ IN JJ NNS JJ IN JJ NN , JJ IN NP ( ) , NNS CC NNS	BackGround	GRelated	Neutral
07-1006_6	There is not always a direct relation between the possible senses for a word in a (monolingual) lexicon and its translations to a particular language , so this represents a different task to WSD against a (monolingual) lexicon ( )	EX VBZ RB RB DT JJ NN IN DT JJ NNS IN DT NN IN DT JJ NN CC PP$ NNS TO DT JJ NN , RB DT VVZ DT JJ NN TO NN IN DT JJ NN ( )	BackGround	GRelated	Neutral
07-1006_7	CLaC1 ( ) uses a Naive Bayes algorithm with a dynamically adjusted context window around the target word	JJ ( ) VVZ DT JJ NP NN IN DT RB VVN NN NN IN DT NN NN	BackGround	SRelated	Neutral
07-1006_8	The sense with the highest count of overlapping words in its dictionary definition and in the sentence containing the target verb (excluding stop words) ( ) , represented by has_overlapping(sentence , translation) : has_overlapping(snt1 , voltar)	DT NN IN DT JJS NN IN JJ NNS IN PP$ NN NN CC IN DT NN VVG DT NN NN VVG NN NN ( ) , VVN IN NN , NN : NN CD , NN	BackGround	SRelated	Neutral
07-1006_9	Verbs and possible senses in our corpus. Both corpora were lemmatized and part-of-speech (POS) tagged using Minipar ( ) and Mxpost ( ) , respectively	NNS CC JJ NNS IN PP$ NN CC NNS VBD JJ CC NN NN VVN VVG NP ( ) CC NP ( ) , NP	Fundamental	Basis	Neutral
07-1006_10	These approaches have shown good results , particularly those using supervised learning ( )	DT NNS VHP VVN JJ NN RB DT VVG JJ NN ( )	BackGround	GRelated	Positive
07-1006_10	WSD systems have generally been more successful in the disambiguation of nouns than other grammatical categories ( )	NN NNS VHP RB VBN RBR JJ IN DT NN IN NNS IN JJ JJ NNS ( )	BackGround	GRelated	Positive
07-1006_11	Syntalex-3 ( ) is based on an ensemble of bagged decision trees with narrow context part-of-speech features and bigrams	NP ( ) VBZ VVN IN DT NN IN VVN NN NNS IN JJ NN NN NNS CC NNS	BackGround	SRelated	Neutral
07-1006_13	This is achieved using Inductive Logic Programming (ILP) ( ) , which has not yet been applied to WSD	DT VBZ VVN VVG NP NP NP NN ( ) , WDT VHZ RB RB VBN VVN TO NP	Fundamental	Basis	Neutral
07-1006_13	Inductive Logic Programming ( ) employs techniques from Machine Learning and Logic Programming to build first-order theories from examples and background knowledge , which are also represented by first-order clauses	JJ NP NP ( ) VVZ NNS IN NP NP CC NP NP TO VV NN NNS IN NNS CC NN NN , WDT VBP RB VVN IN NN NNS	BackGround	SRelated	Neutral
07-1006_14	2. A more specific clause (the bottom clause) is built using inverse entailment ( ) , generally consisting of the representation of all the knowledge about that example	NP RBR JJ NN NN NN NN VBZ VVN VVG JJ NN ( ) , RB VVG IN DT NN IN PDT DT NN IN DT NN	Fundamental	Basis	Neutral
07-1006_20	This corpus was automatically annotated with the translation of the verb using a tagging system based on parallel corpus , statistical information and translation dictionaries ( ) , followed by a manual revision	DT NN VBD RB VVN IN DT NN IN DT NN VVG DT VVG NN VVN IN JJ NN , JJ NN CC NN NNS ( ) , VVN IN DT JJ NN	Fundamental	Basis	Neutral
07-1006_21	All the knowledge sources were made available to be used by the inference engine , since previous experiments showed that they are all relevant ( )	PDT DT NN NNS VBD VVN JJ TO VB VVN IN DT NN NN , IN JJ NNS VVD IN/that PP VBP RB JJ ( )	BackGround	GRelated	Neutral
07-1006_22	We use the Aleph ILP system ( ) , which provides a complete inference engine and can be customized in various ways	PP VVP DT NP NP NN ( ) , WDT VVZ DT JJ NN NN CC MD VB VVN IN JJ NNS	Fundamental	Basis	Neutral
07-1006_23	In the hybrid approaches that have been explored so far , deep knowledge , like selectional preferences , is either pre-processed into a vector representation to accommodate machine learning algorithms , or used in previous steps to filter out possible senses , e.g. ( )	IN DT JJ NNS WDT VHP VBN VVN RB RB , JJ NN , IN JJ NNS , VBZ RB VVN IN DT NN NN TO VV NN VVG NNS , CC VVN IN JJ NNS TO NN IN JJ NNS FW ( )	BackGround	GRelated	Neutral
07-1007_0	Roark and Bacchiani ( ) showed that weighted count-merging is a special case of maximum a posteriori (MAP) estimation , and successfully used it for probabilistic context-free grammar domain adaptation ( ) and language model adaptation ( )	NP CC NP ( ) VVD IN/that JJ NN VBZ DT JJ NN IN NN DT NN NN NN , CC RB VVD PP IN JJ JJ NN NN NN ( ) CC NN NN NN ( )	BackGround	GRelated	Positive
07-1007_1	We have recently shown that this algorithm is effective in estimating the sense priors of a set of nouns ( )	PP VHP RB VVN IN/that DT NN VBZ JJ IN VVG DT NN NNS IN DT NN IN NNS ( )	BackGround	SRelated	Positive
07-1007_1	However , in ( ) , we showed that in a supervised setting where one has access to some annotated training data , the EM-based method in section 5 estimates the sense priors more effectively than the method described in ( )	RB , IN ( ) , PP VVD IN/that IN DT JJ NN WRB PP VHZ NN TO DT VVN NN NNS , DT JJ NN IN NN CD NNS DT NN NNS RBR RB IN DT NN VVN IN ( )	Compare	Compare	Neutral
07-1007_2	A similar work is the recent research by Chen et al. ( ) , where active learning was used successfully to reduce the annotation effort for WSD of 5 English verbs using coarse-grained evaluation	DT JJ NN VBZ DT JJ NN IN NP NP NP ( ) , WRB JJ NN VBD VVN RB TO VV DT NN NN IN NP IN CD JJ NNS VVG JJ NN	BackGround	SRelated	Neutral
07-1007_2	This is slightly higher than the 5.8 senses per verb in ( ) , where the experiments were conducted using coarse-grained evaluation	DT VBZ RB JJR IN DT CD NNS IN NN IN ( ) , WRB DT NNS VBD VVN VVG JJ NN	Compare	Compare	Neutral
07-1007_2	For WSD , Fujii et al. ( ) used selective sampling for a Japanese language WSD system , Chen et al. ( ) used active learning for 5 verbs using coarse-grained evaluation , and H	IN NP , NP NP NP ( ) VVN JJ NN IN DT JJ NN NN NN , NP NP NP ( ) VVN JJ NN IN CD NNS VVG JJ NN , CC NP	BackGround	GRelated	Neutral
07-1007_3	Dang ( ) employed active learning for another set of 5 verbs	NP ( ) VVN JJ NN IN DT NN IN CD NNS	BackGround	GRelated	Neutral
07-1007_4	To investigate this , Escudero et al. ( ) and Martinez and Agirre ( ) conducted experiments using the DSO corpus , which contains sentences from two different corpora , namely Brown Corpus (BC) and Wall Street Journal (WSJ)	TO VV DT , NP NP NP ( ) CC NP CC NP ( ) VVN NNS VVG DT NP NN , WDT VVZ NNS IN CD JJ NNS , RB NP NP NP CC NP NP NP NN	BackGround	GRelated	Neutral
07-1007_4	Escudero et al. ( ) pointed out that one of the reasons for the drop in accuracy is the difference in sense priors (i.e. , the proportions of the different senses of a word) between BC and WSJ	NP NP NP ( ) VVD RP IN/that CD IN DT NNS IN DT NN IN NN VBZ DT NN IN NN NNS JJ , DT NNS IN DT JJ NNS IN DT NN IN NP CC NP	BackGround	GRelated	Neutral
07-1007_4	Following the setup of ( ) , we similarly made use of the DSO corpus to perform our experiments on domain adaptation	VVG DT NN IN ( ) , PP RB VVD NN IN DT NP NN TO VV PP$ NNS IN NN NN	Fundamental	Idea	Neutral
07-1007_4	As mentioned in section 1 , research in ( ) noted an improvement in accuracy when they adjusted the BC and WSJ datasets such that the proportions of the different senses of each word were the same between BC and WSJ	IN VVN IN NN CD , NN IN ( ) VVD DT NN IN NN WRB PP VVD DT NP CC NP VVZ JJ IN/that DT NNS IN DT JJ NNS IN DT NN VBD DT JJ IN NP CC NP	BackGround	SRelated	Neutral
07-1007_4	Escudero et al. ( ) used the DSO corpus to highlight the importance of the issue of domain dependence of WSD systems , but did not propose methods such as active learning or count-merging to address the specific problem of how to perform domain adaptation for WSD	NP NP NP ( ) VVN DT NP NN TO VV DT NN IN DT NN IN NN NN IN JJ NNS , CC VVD RB VV NNS JJ IN JJ VVG CC VVG TO VV DT JJ NN IN WRB TO VV NN NN IN NP	BackGround	GRelated	Neutral
07-1007_6	This is similar to the approach taken in ( ) where they focus on determining the predominant sense of words in corpora drawn from finance versus sports domains	DT VBZ JJ TO DT NN VVN IN ( ) WRB PP VVP IN VVG DT JJ NN IN NNS IN NNS VVN IN NN CC NNS NNS	Fundamental	Idea	Neutral
07-1007_6	Research by McCarthy et al. ( ) and Koeling et al. ( ) pointed out that a change of predominant sense is often indicative of a change in domain	NN IN NP NP NP ( ) CC NP NP NP ( ) VVD RP IN/that DT NN IN JJ NN VBZ RB JJ IN DT NN IN NN	BackGround	GRelated	Neutral
07-1007_7	These knowledge sources were effectively used to build a state-of-the-art WSD program in one of our prior works ( )	DT NN NNS VBD RB VVN TO VV DT JJ NN NN IN CD IN PP$ JJ NN ( )	BackGround	SRelated	Positive
07-1007_8	To reduce the effort required to adapt a WSD system to a new domain , we employ an active learning strategy ( ) to select examples to annotate from the new domain of interest	TO VV DT NN VVN TO VV DT JJ NN TO DT JJ NN , PP VVP DT JJ NN NN ( ) TO VV NNS TO VV IN DT JJ NN IN NN	Fundamental	Basis	Neutral
07-1007_8	With active learning ( ) , we use uncertainty sampling as shown in Figure 1 (Figure 1: Active learning. Γ: WSD system trained on D_T ; ŝ: word sense prediction for d using Γ ; p: confidence of prediction ŝ ; if p < p_min then ...)	IN JJ NN ( ) , PP VVP NN VVG IN VVN NN SENT JJ NN VVN IN NP NN SYM SENT NN NN NN IN NN VVG NN NN SENT NN IN NN SYM IN NN SYM NN NN RB NP CD JJ NN IN NP CD	Fundamental	Basis	Neutral
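A minimal Python sketch of the uncertainty-sampling loop the figure describes; the classifier API returning a (sense, confidence) pair is an assumption for illustration, not from the source:

    def uncertainty_sample(classifier, unlabeled_pool, p_min):
        # Select examples whose best-sense confidence falls below p_min;
        # in the loop sketched in Figure 1 these are sent for annotation
        # and added to the training data of the WSD system.
        selected = []
        for d in unlabeled_pool:
            sense, p = classifier.predict(d)  # assumed API: (sense, confidence)
            if p < p_min:
                selected.append(d)
        return selected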
07-1007_9	The WordNet Domains resource ( ) assigns domain labels to synsets in WordNet	DT NP NP NN ( ) VVZ NN NNS TO NNS IN NP	BackGround	SRelated	Neutral
07-1007_11	Among the few currently available manually sense-annotated corpora for WSD , the SEMCOR (SC) corpus ( ) is the most widely used	IN DT JJ RB JJ RB JJ NNS IN NP , DT NP NN NN ( ) VBZ DT RBS RB VVN	BackGround	GRelated	Neutral
07-1007_12	The DSO corpus ( ) contains 192,800 annotated examples for 121 nouns and 70 verbs , drawn from BC and WSJ	DT NP NN ( ) VVZ CD CD VVN NNS IN CD NNS CC CD NNS , VVN IN NP CC NP	BackGround	SRelated	Neutral
07-1007_14	In this section , we describe an EM-based algorithm that was introduced by Saerens et al. ( ) , which can be used to estimate the sense priors , or a priori probabilities of the different senses in a new dataset	IN DT NN , PP VVP DT JJ NN WDT VBD VVN IN NP NP NP ( ) , WDT MD VB VVN TO VV DT NN NNS , CC DT NN NNS IN DT JJ NNS IN DT JJ NN	Fundamental	Basis	Neutral
07-1007_14	Most of this section is based on ( )	JJS IN DT NN VBZ VVN IN ( )	Fundamental	Basis	Neutral
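A hedged sketch of the EM updates in Saerens et al.'s prior-estimation algorithm, which this section follows; the notation (training priors p̂(c_i), classifier posteriors p̂(c_i|x_n), N new-domain examples) is assumed for illustration:

    p^{(s)}(c_i \mid x_n) = \frac{\hat{p}(c_i \mid x_n)\, p^{(s)}(c_i)/\hat{p}(c_i)}
    {\sum_j \hat{p}(c_j \mid x_n)\, p^{(s)}(c_j)/\hat{p}(c_j)},
    \qquad
    p^{(s+1)}(c_i) = \frac{1}{N}\sum_{n=1}^{N} p^{(s)}(c_i \mid x_n)

Iterating these two steps to a fixed point yields the estimated sense priors on the new dataset.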
07-1007_15	In applying active learning for domain adaptation , Zhang et al. ( ) presented work on sentence boundary detection using generalized Winnow , while Tur et al. ( ) performed language model adaptation of automatic speech recognition systems	IN VVG JJ NN IN NN NN , NP NP NP ( ) VVN NN IN NN NN NN VVG VVN NP , IN NP NP NP ( ) VVN NN NN NN IN JJ NN NN NNS	BackGround	GRelated	Neutral
07-1008_0	In most contexts , the similarity between chocolate , say , and a narcotic like heroin will meagerly reflect the simple ontological fact that both are kinds of substances; certainly , taxonomic measures of similarity as discussed in Budanitsky and Hirst ( ) will capture little more than this commonality	IN JJS NNS , DT NN IN NN , VVP , CC DT NN IN NN MD RB VV DT JJ JJ NN IN/that DT VBP NNS IN NN RB , JJ NNS IN NN IN VVN IN NP CC NP ( ) MD VV RB JJR IN DT NN	BackGround	GRelated	Neutral
07-1008_0	The function (%sim arg0 CAT) reflects the perceived similarity between the putative member arg0 and a synset CAT in WordNet , using one of the standard formulations described in Budanitsky and Hirst ( )	DT NN NN NP CD NP VVZ DT VVN NN IN DT JJ NN NP CD CC DT NN NP IN NP , VVG CD IN DT JJ NNS VVN IN NP CC NP ( )	BackGround	SRelated	Neutral
07-1008_2	Whissell ( ) reduces the notion of affect to a single numeric dimension , to produce a dictionary of affect that associates a numeric value in the range 1.0 (most unpleasant) to 3.0 ( )	NP ( ) VVZ DT NN IN NN TO DT JJ JJ NN , TO VV DT NN IN NN IN/that NNS DT JJ NN IN DT NN CD JJ NN TO CD ( )	BackGround	SRelated	Neutral
07-1008_3	We have described an approach that can be seen as a functional equivalent to the CPA (Corpus Pattern Analysis) approach of Pustejovsky et al. ( ) , in which our goal is not that of automated induction of word senses in context (as it is in CPA) but the automated induction of flexible , context-sensitive category structures	PP VHP VVN DT NN WDT MD VB VVN IN DT JJ NN TO DT NP NN NP NP NN IN NP NP NP ( ) , IN WDT PP$ NN VBZ RB IN/that IN JJ NN IN NN NNS IN NN NNS PP VBZ IN NP CC DT JJ NN IN JJ , JJ NN NNS	Fundamental	Basis	Neutral
07-1008_4	Since the line between literal and metaphoric uses of a category is often impossible to draw , the best one can do is to accept metaphor as a gradable phenomenon ( )	IN DT NN IN JJ CC JJ NNS IN DT NN VBZ RB JJ TO VV , DT JJS PP MD VV VBZ TO VV NN IN DT JJ NN ( )	BackGround	SRelated	Neutral
07-1008_5	The most revealing variations are syntagmatic in nature , which is to say , they look beyond individual word forms to larger patterns of contiguous usage ( )	DT RBS JJ NNS VBP JJ IN NN , WDT VBZ TO VV , PP VVP IN JJ NN NNS TO JJR NNS IN JJ NN ( )	BackGround	GRelated	Neutral
07-1008_6	Dice's coefficient ( ) is used to implement this measure	NP NN ( ) VBZ VVN TO VV DT NN	Fundamental	Basis	Neutral
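For reference, Dice's coefficient over two sets of items X and Y (a textbook definition, not spelled out in the row):

    \mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}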
07-1008_7	As noted by De Leenheer and de Moor ( ) , ontologies are lexical representations of concepts , so we can expect the effects of context on language use to closely reflect the effects of context on ontological structure	IN VVN IN NP NP CC NP NP ( ) , NNS VBP JJ NNS IN NNS , RB PP MD VV DT NNS IN NN IN NN NN TO RB VV DT NNS IN NN IN JJ NN IN NNS VBZ RB JJ JJ NN	BackGround	SRelated	Neutral
07-1008_8	While simile is a mechanism for highlighting inter-concept similarity , metaphor is at heart a mechanism of category inclusion ( )	IN NN VBZ DT NN IN VVG JJ NN , NN VBZ IN NN DT NN IN NN NN ( )	BackGround	SRelated	Neutral
07-1008_8	Glucksberg ( ) notes that the same category , used figuratively , can exhibit different qualities in different metaphors	NP ( ) VVZ IN/that DT JJ NN , VVN RB , MD VV JJ NNS IN JJ NNS	BackGround	SRelated	Neutral
07-1009_0	In this section , we describe how we use Markov chain Monte Carlo methods to perform inference in the statistical models described in the previous section; Andrieu et al. ( ) provide an excellent introduction to MCMC techniques	IN DT NN , PP VVP WRB PP VVP NP NN NP NP NNS TO VV NN IN DT JJ NNS VVN IN DT JJ NN NP NP NP ( ) VV DT JJ NN TO NP NNS	BackGround	SRelated	Neutral
07-1009_1	These are short statements that restrict the space of languages in a concrete way (for instance "object-verb ordering implies adjective-noun ordering"); Croft ( ) , Hawkins ( ) and Song ( ) provide excellent introductions to linguistic typology	DT VBP JJ NNS WDT VVP DT NN IN NNS IN DT JJ NN NN NN NN VVG VVZ NP NP NP ( ) , NP ( ) CC NP ( ) VV JJ NNS TO JJ NN	BackGround	GRelated	Positive
07-1009_1	This is a well-documented issue (see , e.g. , ( )) stemming from the fact that any set of languages is not sampled uniformly from the space of all probable languages	DT VBZ DT JJ NN NN , NP , ( NN VVG IN DT NN IN/that DT NN IN NNS VBZ RB VVN RB IN DT NN IN DT JJ NNS	BackGround	SRelated	Neutral
07-1009_2	The closest work is represented by the books Possible and Probable Languages ( ) and Language Classification by Numbers ( ) , but the focus of these books is on automatically discovering phylogenetic trees for languages based on Indo-European cognate sets ( )	DT JJS NN VBZ VVN IN DT NNS JJ CC JJ NNS ( ) CC NP NP IN NP ( ) , CC DT NN IN DT NNS VBZ IN RB VVG JJ NNS IN NNS VVN IN JJ JJ NNS ( )	BackGround	GRelated	Neutral
07-1009_5	Those that reference Hawkins (e.g. , #11) are based on implications described by Hawkins ( ); those that reference Lehmann are references to the principles decided by Lehmann ( ) in Ch 4 & 8	DT WDT NN NP NP , NN VBP VVN IN NNS VVN IN NP ( NN DT IN/that NN NP VBP NNS TO DT NNS VVN IN NP ( ) IN NN CD CC CD	Fundamental	Basis	Neutral
07-1009_9	They have also been used computationally to aid in the learning of unsupervised part of speech taggers ( )	PP VHP RB VBN VVN RB TO VV IN DT NN IN JJ NN IN NN NN ( )	BackGround	GRelated	Neutral
07-1009_10	For instance , our #7 is implication #18 from Greenberg , reproduced by Song ( )	IN NN PP$ NN VBZ NN NN IN NP , VVD IN NN ( )	Fundamental	Basis	Neutral
07-1010_0	We examined sentences using a phrase structure parser ( ) and an HPSG parser ( )	PP VVD NNS VVG DT NN NN NN ( ) CC DT NP NN ( )	Fundamental	Basis	Neutral
07-1010_1	Since the number of parameters in NLM is still large , several smoothing methods are used ( ) to produce more accurate probabilities , and to assign nonzero probabilities to any word string	IN DT NN IN NNS IN NP VBZ RB JJ , JJ VVG NNS VBP VVN ( ) TO VV JJR JJ NNS , CC TO VV NN NNS TO DT NN NN	BackGround	GRelated	Neutral
07-1010_2	We would like to see more refined online learning methods with kernels ( ) that we could apply in these areas	PP MD VV TO VV RBR VVN JJ VVG NNS IN NNS ( ) IN/that PP MD VV IN DT NNS	BackGround	MRelated	Neutral
07-1010_3	Therefore we make use of an online learning algorithm proposed by ( ) , which has a much smaller computational cost	RB PP VVP NN IN DT JJ NN NN VVN IN ( ) , WDT VHZ DT RB JJR JJ NN	Fundamental	Basis	Neutral
07-1010_4	Blei , 2003; Wang et al. , 2005) , our result may encourage the study of the combination of features for language modeling	NP , JJ NP NP NP , JJ , PP$ NN MD VV DT NN NN NN NNS NN NN	NULL	NULL	NULL
07-1010_6	We used Viterbi decoding ( ) for the partition	PP VVD DT NP VVG ( ) IN DT NN	Fundamental	Basis	Neutral
07-1010_7	Discriminative language models (DLMs) have been proposed to classify sentences directly as correct or incorrect ( ) , and these models can handle both non-local and overlapping information	JJ NN NNS JJ VHP VBN VVN TO VV NNS RB IN JJ CC JJ ( ) , CC DT NNS MD VV DT JJ CC JJ NN	BackGround	GRelated	Neutral
07-1010_8	For fast kernel computation , the Polynomial Kernel Inverted method (PKI) is proposed ( ) , which is an extension of Inverted Index in Information Retrieval	IN JJ NN NN , DT NP NP NP NN NN VBZ VVN ( ) , WDT VBZ DT NN IN NP NP IN NP NP	BackGround	SRelated	Neutral
07-1010_9	The class model was originally proposed by ( )	DT NN NN VBD RB VVN IN ( )	BackGround	SRelated	Neutral
07-1010_9	However , by considering only those counts that actually change , the algorithm can be made to scale somewhere between linearly and quadratically to the number of classes ( )	RB , IN VVG RB DT NNS WDT RB VVP , DT NN MD VB VVN TO VV RB IN RB CC RB TO DT NN IN NNS ( )	BackGround	SRelated	Neutral
07-1010_12	Recently , Whole Sentence Maximum Entropy Models ( ) (WSMEs) have been introduced	RB , NP NP NP NP NP ( ) NN VHP VBN VVN	BackGround	GRelated	Neutral
07-1010_12	In our experiments , we did not examine the result of using other sampling methods . For example , it would be possible to sample sentences from a whole sentence maximum entropy model ( ) and this is a topic for future research	IN PP$ NNS , PP VVD RB VV DT NN IN VVG JJ NN NNS , IN NN , PP MD VB JJ TO NN NNS IN DT JJ NN NN NN NN ( ) CC DT VBZ DT NN IN JJ NN	BackGround	MRelated	Neutral
07-1010_13	A contrastive estimation method ( ) is similar to ours with regard to constructing pseudo-negative examples	DT JJ NN NN ( ) VBZ JJ TO PP IN NN TO VVG JJ NNS	Fundamental	Idea	Neutral
07-1010_14	If the kernel-trick ( ) is applied to online margin-based learning , a subset of the observed examples , called the active set , needs to be stored	IN DT NN ( ) VBZ VVN TO JJ JJ NN , DT NN IN DT JJ NNS , VVD DT JJ NN , VVZ TO VB VVN	BackGround	SRelated	Neutral
07-1012_1	It should be noted that models based on finite state transducers have been shown to be adequate for describing fusion as well ( ) , and further work should evaluate these types of models in ASR of languages with higher indexes of fusion	PP MD VB VVN IN/that NNS VVN IN JJ NN NNS VHP VBN VVN TO VB JJ IN VVG NN IN JJ ) , CC JJR NN MD VV DT NNS IN NNS IN NP IN NNS IN JJR NNS IN NN	BackGround	GRelated	Neutral
07-1012_2	The final approach applies a manually constructed rule-based morphological tagger ( )	DT JJ NN VVZ DT RB VVN JJ JJ NN )	Fundamental	Basis	Neutral
07-1012_2	For training the LMs , a subset of 43 million words from the Estonian Segakorpus was used ( ) , preprocessed with a morphological analyzer ( )	IN VVG DT NP , DT NN IN CD CD NNS IN DT JJ NP VBD NN ) , VVN IN DT JJ NN )	Fundamental	Basis	Neutral
07-1012_2	In ( ) a WER of 44.5% was obtained with word-based trigrams and a WER of 37.2% with items similar to ones from "grammar" using the same speech corpus as in this work	IN ( ) DT NP IN CD VBD VVN IN JJ NNS CC DT NP IN CD IN NNS JJ TO NNS IN NN VVG DT JJ NN NN IN IN DT NN	BackGround	SRelated	Neutral
07-1012_3	It should be noted that every OOV causes roughly two errors in recognition , and vocabulary decomposition approaches such as the ones evaluated here give some benefits to word error rate (WER) even in recognizing languages such as English ( )	PP MD VB VVN IN/that DT NP VVZ RB CD NNS IN NN , CC NN NN NNS JJ IN DT NNS VVD RB VV DT NNS TO NN NN NN NN RB IN VVG NNS JJ IN NN )	BackGround	SRelated	Neutral
07-1012_3	This is similar to what was introduced as "flat hybrid model" ( ) , and it tries to model OOV-words as sequences of words and fragments	DT VBZ JJ TO WP VBD VVN IN JJ JJ NN ) , CC PP VVZ TO VV NNS IN NNS IN NNS CC NNS	Fundamental	Idea	Neutral
07-1012_3	The results for "hybrid" are in the range suggested by earlier work ( )	DT NNS IN NN VBP IN IN DT NN VVD IN RBR JJ )	BackGround	SRelated	Neutral
07-1012_3	The morph approach was developed for the needs of Finnish speech recognition , which is a high synthesis , moderate fusion and very low orthographic irregularity language , whereas the hybrid approach in ( ) was developed for English , which has low synthesis , moderate fusion , and very high orthographic irregularity	DT NN NN VBD VVN IN DT NNS IN JJ NN NN , WDT VBZ DT JJ NN , JJ NN CC RB JJ JJ NN NN , IN DT JJ NN IN ( ) VBD VVN IN NP , WDT VHZ JJ NN , JJ NN , CC RB JJ JJ NN	BackGround	GRelated	Neutral
07-1012_5	Varigrams ( ) are used in this work , and to make LMs trained with each approach comparable , the varigrams have been grown to sizes of roughly 5 million counts	NP ) VBP VVN IN DT NN , CC TO VV NP VVN IN DT NN JJ , DT NNS VHP VBN VVN TO RB NNS IN CD CD NNS	Fundamental	Basis	Neutral
07-1012_5	ing approach , growing varigram models ( ) were used with no limits as to the order of n-grams , but limiting the number of counts to 4.8 and 5 million counts	JJ NN , VVG NN NN ) VBD VVN IN DT NNS RB TO DT NN IN NNS , CC VVG DT NN IN NNS TO CD CC CD CD NNS	Fundamental	Basis	Neutral
07-1012_6	For example , in English with language models (LM) of 60k words trained from the Gigaword Corpus V.2 ( ) , and testing on a very similar Voice of America portion of TDT4 speech corpora ( ) , this gives an OOV rate of 1.5%	IN NN , IN NP IN NN NNS JJ IN JJ NNS VVN IN DT NP NP NP ) , CC VVG IN DT RB JJ NP IN NP NN IN NP NN NN ) , DT VVZ DT NP NN IN CD	BackGround	SRelated	Neutral
07-1013_0	Models of this type have previously been shown to yield very good g2p conversion results ( )	NNS IN DT NN VHP RB VBN VVN TO VV RB JJ NN NN NNS ( )	BackGround	GRelated	Positive
07-1013_1	It has been argued that using morphological information is important for languages where morphology has an important influence on pronunciation , syllabification and word stress such as German , Dutch , Swedish or , to a smaller extent , also English ( )	PP VHZ VBN VVN IN/that VVG JJ NN VBZ JJ IN NNS WRB NN VHZ DT JJ NN IN NN , NN CC NN NN JJ IN JJ , JJ , JJ CC , TO DT JJR NN , RB JJ ( )	BackGround	GRelated	Neutral
07-1013_1	Decision trees were one of the first data-based approaches to g2p and are still widely used ( )	NN NNS VBD CD IN DT JJ JJ NNS TO NN CC VBP RB RB JJ ( )	BackGround	GRelated	Neutral
07-1013_2	Best results were obtained when using a variant of Modified Kneser-Ney Smoothing 2 ( )	JJS NNS VBD VVN WRB VVG DT NN IN NP NP NP CD ( )	BackGround	SRelated	Positive
07-1013_3	( ) also used a joint n-gram model	( ) RB VVD DT JJ NN NN	BackGround	SRelated	Neutral
07-1013_4	We compared four different state-of-the-art unsupervised systems for morphological decomposition (cf. ( ))	PP VVD CD JJ JJ JJ NNS IN JJ NN NN ( NN	Compare	Compare	Neutral
07-1013_5	In very recent work , ( ) developed an unsupervised algorithm (f-meas: 68%; an extension of RePortS) whose segmentations improve g2p when using the decision tree (PER: 3.45%)	IN RB JJ NN , ( ) VVN DT JJ NN NN JJ DT NN IN NP WP$ NNS VV NN WRB VVG DT DT NN NN NN JJ	BackGround	GRelated	Positive
07-1013_7	The German corpus used in these experiments is CELEX ( )	DT JJ NN VVN IN DT NNS VBZ NP ( )	Fundamental	Basis	Neutral
07-1013_9	Among the unsupervised systems , best results on the g2p task with morphological annotation were obtained with the RePortS system ( )	IN DT JJ NNS , JJS NNS CD IN DT NN NN IN JJ NN VBD VVN IN DT NP NN ( )	Fundamental	Basis	Positive
07-1013_11	The same algorithms have previously been shown to help a speech recognition task ( )	DT JJ NNS VHP RB VBN VVN TO VV DT NN NN NN ( )	BackGround	GRelated	Positive
07-1013_12	The joint n-gram model performs significantly better than the decision tree (essentially based on ( )) , and achieves scores comparable to the Pronunciation by Analogy (PbA) algorithm ( )	DT JJ NN NN VVZ RB JJR IN DT NN NN RB VVN IN ( NN , CC VVZ NNS JJ TO DT NN IN NP JJ NN ( )	Compare	Compare	Positive
07-1013_13	This is much faster than the times for Pronunciation by Analogy (PbA) ( ) on the same corpus	DT VBZ RB RBR IN DT NNS IN NN IN NP NP ( ) IN DT JJ NN	Compare	Compare	Positive
07-1013_14	Examples of such approaches using Hidden Markov Models are ( ) (who applied the HMM to the related task of phoneme-to-grapheme conversion) , ( ) and ( )	NNS IN JJ NNS VVG NP NP NP VBP ( ) NN VVD DT NP TO DT JJ NN IN NN NN , ( ) CC ( )	BackGround	GRelated	Neutral
07-1013_16	For German , ( ) show that information about stress assignment and the position of a syllable within a word improve g2p conversion	IN NP , ( ) VV DT NN IN NN NN CC DT NN IN DT NN IN DT NN VV NN NN	BackGround	GRelated	Neutral
07-1013_17	Vowel length and quality have been argued to also depend on morphological structure ( )	NN NN CC NN VHZ VBN VVN TO RB VV IN JJ NN ( )	BackGround	GRelated	Neutral
07-1013_19	The two rule-based systems we evaluated , the ETI morphological system and SMOR ( ) , are both high-quality systems with large lexica that have been developed over several years	DT CD JJ NNS PP VVD , DT NP CD JJ NN CC NP CD ( ) , VBP DT JJ NNS IN JJ NNS WDT VHP VBN VVN IN JJ NNS	Fundamental	Basis	Positive
07-1013_20	We used the syllabifier described in ( ) , which works similar to the joint n-gram model used for g2p conversion	PP VVD DT NN VVN IN ( ) , WDT VVZ JJ TO DT JJ NN NN VVN IN NN NN	Fundamental	Basis	Neutral
07-1014_0	A possible reason for the observed dichotomy in the behavior of the vowel and consonant inventories with respect to redundancy can be as follows: while the organization of the vowel inventories is known to be governed by a single force - the maximal perceptual contrast ( ) , consonant inventories are shaped by a complex interplay of several forces ( )	DT JJ NN IN DT JJ NN IN DT NN IN DT NN CC NN NNS IN NN TO NN MD VB RB JJ IN DT NN IN DT NN NNS VBZ VVN TO VB VVN IN DT JJ NN : DT JJ JJ NN ( NP , NN NNS VBP VVN IN DT JJ NN IN JJ NNS ( )	BackGround	GRelated	Neutral
07-1014_0	It has been postulated earlier by functional phonologists that such regularities are the consequences of certain general principles like maximal perceptual contrast ( ) , which is desirable between the phonemes of a language for proper perception of each individual phoneme in a noisy environment , ease of articulation ( ) , which requires that the sound systems of all languages are formed of certain universal (and highly frequent) sounds , and ease of learnability ( ) , which is necessary for a speaker to learn the sounds of a language with minimum effort	PP VHZ VBN VVN RBR IN JJ NNS IN/that JJ NNS VBP DT NNS IN JJ JJ NNS IN JJ JJ NN ( ) , WDT VBZ JJ IN DT NNS IN DT NN IN JJ NN IN DT JJ NN IN DT JJ NN , VV IN NN ( ) , WDT VVZ IN/that DT JJ NNS IN DT NNS VBP VVN IN JJ JJ NN RB JJ NNS , CC VV IN NN ) , WDT VBZ JJ IN DT NN TO VV DT NNS IN DT NN IN JJ NN	BackGround	GRelated	Neutral
07-1014_1	Such an observation is significant since whether or not these principles are similar/different for the two inventories had been a question giving rise to perennial debate among the past researchers ( )	PDT DT NN VBZ JJ IN IN CC RB DT NNS VBP JJ IN DT CD NNS VHD VBN DT NN VVG NN TO JJ NN IN DT JJ NNS ( )	BackGround	GRelated	Neutral
07-1014_1	On the other hand , in spite of several attempts ( ) the organization of the consonant inventories lacks a satisfactory explanation	IN DT JJ NN , IN NN IN JJ NNS ( ) DT NN IN DT NN NNS VVZ DT JJ NN	BackGround	GRelated	Negative
07-1014_1	Various attempts have been made in the past to explain the aforementioned trends through linguistic insights ( ) mainly establishing their statistical significance	JJ NNS VHP VBN VVN IN DT NN TO VV DT JJ NNS IN JJ NNS ( ) RB VVG PP$ JJ NN	BackGround	GRelated	Neutral
07-1014_3	For instance , in biological systems we find redundancy in the codons ( ) , in the genes ( ) , as well as in the proteins ( )	IN NN , IN JJ NNS PP VVP NN IN DT NNS ( ) , IN DT NNS ( ) CC RB RB IN DT NNS ( )	BackGround	GRelated	Neutral
07-1014_4	In fact , the organization of the vowel inventories (especially those with a smaller size) across languages has been satisfactorily explained in terms of the single principle of maximal perceptual contrast ( )	IN NN , DT NN IN DT NN NNS RB DT IN DT JJR NN IN NNS VHZ VBN RB VVN IN NNS IN DT JJ NN IN JJ JJ NN ( )	BackGround	GRelated	Positive
07-1014_5	This redundancy is present mainly to reduce the risk of the complete loss of information that might occur due to accidental errors ( )	DT NN VBZ JJ RB TO VV DT NN IN DT JJ NN IN NN WDT MD VV JJ TO JJ NNS ( )	BackGround	GRelated	Neutral
07-1014_7	Many typological studies ( ) of segmental inventories have been carried out in the past on the UCLA Phonological Segment Inventory Database (UPSID) ( )	JJ JJ NNS ( ) IN JJ NNS VHP VBN VVN RP IN NN IN DT NP NP NP NP NP NN ( )	BackGround	GRelated	Neutral
07-1014_11	In order to explain these trends , feature economy was proposed as the organizing principle of the consonant inventories ( )	IN NN TO VV DT NNS , NN NN VBD VVN IN DT VVG NN IN DT NN NNS ( )	BackGround	GRelated	Neutral
07-1014_13	Inspired by the aforementioned studies and the concepts of information theory ( ) , we try to quantitatively capture the amount of redundancy found across the consonant inventories [Table 1: The table shows four plosives]	VVN IN DT JJ NNS CC DT NNS IN NN NN ( ) PP VVP TO RB VV DT NN IN NN VVN IN DT NN NP CD DT NN VVZ CD NNS	Fundamental	Idea	Neutral
07-1014_14	For this purpose , we present an information theoretic definition of redundancy , which is calculated based on the set of features ( ) that are used to express the consonants	IN DT NN , PP VVP DT NN JJ NN IN NN , WDT VBZ VVN VVN IN DT NN IN NNS CD ( ) WDT VBP VVN TO VV DT NNS	Fundamental	Basis	Neutral
07-1014_14	However , one of the earliest observations about the consonant inventories has been that consonants tend to occur in pairs that exhibit strong correlation in terms of their features ( )	RB , CD IN DT JJS NNS IN DT NN NNS VHZ VBN IN/that NNS VVP TO VV IN NNS IN/that NN JJ NN IN NNS IN PP$ NNS ( )	BackGround	GRelated	Neutral
07-1015_0	[2 Previous Work] Previous work , e.g. ( ) , has mostly assumed that one has a training lexicon of transliteration pairs , from which one can learn a model , often a source-channel or MaxEnt-based model	CD JJ NN JJ NN SENT FW ( ) SENT VHZ RB VVN IN/that PP VHZ DT NN NN IN NN NNS , IN WDT PP MD VV DT NN , RB DT NN CC JJ NN	BackGround	GRelated	Neutral
07-1015_1	A linear classifier is trained using the Winnow algorithm from the SNoW toolkit ( )	DT JJ NN VBZ VVN VVG DT NP NN IN DT NP NN ( )	Fundamental	Basis	Neutral
07-1015_1	Using comparable corpora , the named-entities for persons and locations were extracted from the English text; in this paper , the English named-entities were extracted using the named-entity recognizer described in Li et al. ( ) , based on the SNoW machine learning toolkit ( )	VVG JJ NNS , DT NNS IN NNS CC NNS VBD VVN IN DT JJ NN IN DT NN , DT JJ NNS VBD VVN VVG DT NN NN VVN IN NP NP NP ( ) , VVN IN DT NP NN VVG NN ( )	Fundamental	Basis	Neutral
07-1015_3	This is quite small compared to previous approaches such as Knight and Graehl ( ) or Gao et al. ( )	DT VBZ RB JJ VVN TO JJ NNS JJ IN NP CC NP ( ) CC NP NP NP ( )	BackGround	SRelated	Neutral
07-1015_4	Gildea and Jurafsky ( ) counted the number of features whose values are different , and used them as a substitution cost	NP CC NP ( ) VVN DT NN IN NNS WP$ NNS VBP JJ , CC VVD PP IN DT NN NN	BackGround	GRelated	Neutral
07-1015_5	Halle and Clements ( )'s distinctive features are used in order to model the substitution/ insertion/deletion costs for the string-alignment algorithm and linear classifier	NP CC NP ( JJ JJ NNS VBP VVN IN NN TO VV DT JJ NN NNS IN DT NN NN CC JJ NN	Fundamental	Basis	Neutral
07-1015_6	All pronunciations are based on the WorldBet transliteration system ( ) , an ascii-only version of the IPA	DT NNS VBP VVN IN DT NP NN NN ( ) , DT JJ NN IN DT NP	Fundamental	Basis	Neutral
07-1015_9	a. For all the training data , the pairs of pronunciations are aligned using a standard string alignment algorithm based on Kruskal ( )	VV PDT DT NN NNS , DT NNS IN NNS VBP VVN VVG JJ NN NN NN VVN IN NP ( )	Fundamental	Basis	Neutral
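A minimal sketch of the kind of weighted string alignment assumed above: Kruskal-style dynamic programming with pluggable substitution, insertion, and deletion costs. The unit costs here are illustrative placeholders; the system described derives its costs from distinctive phonetic features.

```python
# Hedged sketch of weighted edit-distance string alignment.
def align_cost(src, tgt,
               sub_cost=lambda a, b: 0.0 if a == b else 1.0,
               ins_cost=1.0, del_cost=1.0):
    n, m = len(src), len(tgt)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + del_cost,       # delete src[i-1]
                          d[i][j - 1] + ins_cost,       # insert tgt[j-1]
                          d[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1]))
    return d[n][m]
```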
07-1015_12	For the set of features X and set of weights W , the linear classifier is defined as in (1) ( ) , where X = {x_1 , x_2 , ...	IN DT NN IN NNS NP CC VVN IN NNS NP , DT JJ NN VBZ VVN IN JJ ( ) NP SYM ( NP SYM , NP , NP	NULL	NULL	NULL
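A hedged reading of the reconstructed definition (1): with features X = {x_1 , ... , x_n} and weights W = {w_1 , ... , w_n}, a linear classifier scores the input and thresholds the score (the exact form in the cited equation is not recoverable from this extract).

```latex
\mathrm{score}(X) = \sum_{i=1}^{n} w_i\, x_i, \qquad \hat{y} = \operatorname{sign}\bigl(\mathrm{score}(X)\bigr)
```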
07-1015_13	In this paper , the phonetic transliteration is performed using the following steps: 1) Generation of the pronunciation for English words and target words: a. Pronunciations for English words are obtained using the Festival text-to-speech system ( )	IN DT NN , DT JJ NN VBZ VVN VVG DT VVG NN JJ NP IN DT NN IN JJ NNS CC NN NN NNS IN JJ NNS VBP VVN VVG DT NP NN NN ( )	Fundamental	Basis	Neutral
07-1015_15	Based on the pronunciation error data of learners of English as a second language as reported in ( ) , we propose the use of what we will term pseudofeatures	VVN IN DT NN NN NNS IN NNS IN NP IN DT JJ NN IN VVN IN ( ) , PP VVP DT NN IN WP PP MD VV NNS	Fundamental	Basis	Neutral
07-1015_16	[Table: Examples of the top-3 candidates in the transliteration of English-Korean] To evaluate the proposed transliteration methods quantitatively , the Mean Reciprocal Rank (MRR) , a measure commonly used in information retrieval when there is precisely one correct answer ( ) , was measured , following Tao and Zhai ( )	NNS IN DT JJ NNS IN DT NN IN NN TO VV DT VVN NN NNS RB , DT NP NP NP NN , DT NN RB VVN IN NN NN WRB EX VBZ RB CD JJ NN ( ) VBD VVN , VVG NP CC NP ( )	Fundamental	Idea	Neutral
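For reference, the standard definition of Mean Reciprocal Rank over a query set Q, where rank_i is the position at which the single correct answer appears for query i:

```latex
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}
```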
07-1015_17	In our work , we adopt the method proposed in ( ) and apply it to the problem of transliteration	IN PP$ NN , PP VVP DT NN VVN IN ( ) CC VV PP TO DT NN IN NN	Fundamental	Basis	Neutral
07-1015_17	The substitution/insertion/deletion cost for the string alignment algorithm is based on the baseline cost from ( )	DT NN NN IN DT NN NN NN VBZ VVN IN DT JJ NN IN ( )	Fundamental	Basis	Neutral
07-1015_17	The pseudo features in this study are the same as in Tao et al. ( )	DT JJ NNS IN DT NN VBP JJ IN IN NP NP NP ( )	Fundamental	Idea	Neutral
07-1015_17	[Table: MRRs of the phonetic transliteration] The baseline was computed using the phonetic transliteration method proposed in Tao et al. ( )	NNS IN DT JJ NN DT NN VBD VVN VVG DT JJ NN NN VVN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1016_0	By treating a letter/character as a word and a group of letters/characters as a phrase or token unit in SMT , one can easily apply the traditional SMT models , such as the IBM generative model ( ) or the phrase-based translation model ( ) to transliteration	IN VVG DT NN IN DT NN CC DT NN IN NNS IN DT NN CC JJ NN IN NP , PP MD RB VV DT JJ NP NNS , JJ IN DT NP JJ NN ( ) CC DT JJ NN NN ( ) TO NN	BackGround	GRelated	Neutral
07-1016_2	In G2P studies , Font Llitjos and Black ( ) showed how knowledge of language of origin may improve conversion accuracy	IN NP NNS , NP NP CC NP ( ) VVD WRB NN IN NN IN NN MD VV NN NN	BackGround	GRelated	Neutral
07-1016_3	Phonetic transliteration can be considered as an extension to the traditional grapheme-to-phoneme (G2P) conversion ( ) , which has been a much-researched topic in the field of speech processing	JJ NN MD VB VVN IN DT NN TO DT JJ NN NN NN ( ) , WDT VHZ VBN DT JJ NN IN DT NN IN NN NN	BackGround	GRelated	Neutral
07-1016_4	Many of the loanwords exist in today's Chinese through semantic transliteration , which has been well received ( ) by the people because of many advantages	JJ IN DT NNS VVP IN NNS JJ IN JJ NN , WDT VHZ VBN RB VVD ( ) IN DT NNS IN IN JJ NNS	BackGround	GRelated	Positive
07-1016_4	Unfortunately semantic transliteration , which is considered as a good tradition in translation practice ( ) , has not been adequately addressed computationally in the literature	RB JJ NN , WDT VBZ VVN IN DT JJ NN IN NN NN ( ) , VHZ RB VBN RB VVN RB IN DT NN	BackGround	GRelated	Negative
07-1016_6	[Semantic Transliteration] The performance was measured using the Mean Reciprocal Rank (MRR) metric ( ) , a measure that is commonly used in information retrieval , assuming there is precisely one correct answer	NP NP NP DT NN VBD VVN VVG DT NP NP NP JJ JJ ( ) , DT NN WDT VBZ RB VVN IN NN NN , VVG EX VBZ RB CD JJ NN	Fundamental	Basis	Neutral
07-1016_7	In computational linguistic literature , much effort has been devoted to phonetic transliteration , such as English-Arabic , English-Chinese ( ) , English-Japanese ( ) and English-Korean	IN JJ JJ NN , JJ NN VHZ VBN VVN TO JJ NN , JJ IN JJ , JJ ( ) , JJ ( ) CC JJ	BackGround	GRelated	Neutral
07-1016_8	In the extraction of transliterations , data-driven methods are adopted to extract actual transliteration pairs from a corpus , in an effort to construct a large , up-to-date transliteration lexicon ( )	IN DT NN IN NNS , JJ NNS VBP VVN TO VV JJ NN NNS IN DT NN , IN DT NN TO VV DT JJ , JJ NN NN ( )	BackGround	GRelated	Neutral
07-1016_9	The Latin-scripted personal names are always assumed to homogeneously follow the English phonic rules in automatic transliteration ( )	DT JJ JJ NNS VBP RB VVN TO RB VV DT JJ JJ NNS IN JJ NN ( )	BackGround	GRelated	Neutral
07-1016_9	This model is conceptually similar to the joint source-channel model ( ) where the target token t_i depends not only on its source token s_i but also on the history t_{i-1} and s_{i-1}	DT NN VBZ RB JJ TO DT JJ NN NN ( ) WRB DT NN JJ NN NP VVZ IN RB RB PP$ NN JJ NN NP CC RB DT NN NN JJ CC JJ NP CD	Fundamental	Idea	Neutral
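A bigram sketch of the joint source-channel idea described here (K, the number of token pairs, and the bigram history length are assumptions for illustration; the cited model's exact conditioning may differ): the source and target sequences are modeled jointly, with each token pair conditioned on the preceding pair.

```latex
P(T, S) \;\approx\; \prod_{i=1}^{K} P\bigl((t_i, s_i) \mid (t_{i-1}, s_{i-1})\bigr)
```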
07-1016_10	Some recent work ( ) has attempted to introduce preference into a probabilistic framework for selection of Chinese characters in phonetic transliteration	DT JJ NN ( ) VHZ VVN TO VV NN IN DT JJ NN IN NN IN JJ NNS IN JJ NN	BackGround	GRelated	Neutral
07-1016_11	In transliteration modeling , transliteration rules are trained from a large , bilingual transliteration lexicon ( ) , with the objective of translating unknown words on the fly in an open , general domain	IN NN NN , NN NNS VBP VVN IN DT JJ , JJ NN NN ( ) , IN DT NN IN VVG JJ NNS IN DT NN IN DT JJ , JJ NN	BackGround	GRelated	Neutral
07-1016_15	As discussed elsewhere ( ) , out of several thousand common Chinese characters , a subset of a few hundred characters tends to be used overwhelmingly for transliterating English names to Chinese , e.g. only 731 Chinese characters are adopted in the E-C corpus	RB VVN RB ( ) , RB IN JJ CD JJ JJ NNS , DT NN IN DT JJ CD NNS VVZ TO VB VVN RB IN VVG JJ NNS TO JJ , RB CD JJ NNS VBP VVN IN DT NP NN	BackGround	SRelated	Neutral
07-1016_18	As a Chinese transliteration can arouse certain connotations , the choice of Chinese characters becomes a topic of interest ( )	IN DT JJ NN MD VV TO JJ NNS , DT NN IN JJ NNS VVZ DT NN IN NN ( )	BackGround	GRelated	Neutral
07-1017_0	[Table 4: Lexicon statistics] For Arabic , as a full-size Arabic lexicon was not available to us , we used the Buckwalter morphological analyzer ( ) to derive a lexicon	NN CD NP NNS IN NP , IN DT JJ NP NN VBD RB JJ TO PP , PP VVD DT NP JJ NN ( ) TO VV DT NN	Fundamental	Basis	Neutral
07-1017_1	For example , ( ) showed that factored language models , which consider morphological features and use an optimized backoff policy , yield lower perplexity	IN NN , ( ) VVD DT VVN NN NNS , WDT VVP JJ NNS CC VV DT VVN NN NN , VV JJR NN	BackGround	GRelated	Neutral
07-1017_2	A recent work ( ) experimented with English-to-Turkish translation with limited success , suggesting that inflection generation given morphological features may give positive results	DT JJ NN ( ) VVN IN JJ NN IN JJ NN , VVG IN/that NN NN VVN JJ NNS MD VV JJ NNS	BackGround	GRelated	Neutral
07-1017_3	More recently , ( ) achieved improvements in Czech-English MT , optimizing a set of possible source transformations incorporating morphology [Table 1: Morphological features used for Russian and Arabic]	RBR RB , ( ) VVN NNS IN JJ NP , VVG DT JJ CD JJ NNS VVN IN NP CC NP VVD IN JJ NN NNS , VVG NN	BackGround	GRelated	Neutral
07-1017_4	Ideally , the best word analysis should be provided as a result of contextual disambiguation (e.g. , ( )); we leave this for future work	RB , DT JJS NN NN MD VB VVN IN DT NN IN JJ NN NN , ( NN PP VVP DT IN JJ NN	BackGround	MRelated	Positive
07-1017_5	Another work ( ) showed improvements by splitting compounds in German	DT NN ( ) VVD NNS IN JJ NNS IN JJ	BackGround	GRelated	Positive
07-1017_6	Translating from a morphology-poor to a morphology-rich language is especially challenging since detailed morphological information needs to be decoded from a language that does not encode this information or does so only implicitly ( )	VVG IN DT NN TO DT JJ NN VBZ RB VVG IN JJ JJ NN VVZ TO VB VVN IN DT NN WDT VVZ RB VV DT NN CC VVZ RB RB RB ( )	BackGround	GRelated	Neutral
07-1017_6	Koehn ( ) includes a survey of statistical MT systems in both directions for the Europarl corpus , and points out the challenges of this task	NP ( ) VVZ DT NN IN JJ NP NNS IN DT NNS IN DT NN NN , CC NNS IN DT NNS IN DT NN	BackGround	GRelated	Neutral
07-1017_7	For example , it has been shown ( ) that determiner segmentation and deletion in Arabic sentences in an Arabic-to-English translation system improves sentence alignment , thus leading to improved overall translation quality	IN NN , PP VHZ VBN VVN ( ) DT NN NN CC NN IN NP NNS IN DT NP NN NN VVZ NN NN , RB VVG TO VVN JJ NN NN	BackGround	GRelated	Positive
07-1017_8	For Arabic , we apply the following heuristic: use the most frequent analysis estimated from the gold standard labels in the Arabic Treebank ( ); if a word does not appear in the treebank , we choose the first analysis returned by the Buckwalter analyzer	IN NP , PP VVP DT VVG NN VV DT RBS JJ NN VVN IN DT JJ JJ NNS IN DT NP NP ( NN IN DT NN VVZ RB VV IN DT NN , PP VVP DT JJ NN VVN IN DT NP NN	Fundamental	Basis	Neutral
07-1017_9	Our learning framework uses a Maximum Entropy Markov model ( )	PP$ VVG NN VVZ DT NP NP NP NN ( )	Fundamental	Basis	Neutral
07-1017_10	( ) demonstrated that a similar level of alignment quality can be achieved with smaller corpora applying morpho-syntactic source restructuring , using hierarchical lexicon models , in translating from German into English	( ) VVN IN/that DT JJ NN IN NN NN MD VB VVN IN JJR NNS VVG JJ NN NN , VVG JJ NN NNS , IN VVG IN JJ IN NP	BackGround	GRelated	Neutral
07-1017_11	The sentence pairs were word-aligned using GIZA++ ( ) and submitted to a treelet-based MT system ( ) , which uses the word dependency structure of the source language and projects word dependency structure to the target language , creating the structure shown in Figure 1 above	DT NN NNS VBD VVN VVG NP ( ) CC VVN TO DT JJ NP NN ( ) , WDT VVZ DT NN NN NN IN DT NN NN CC NNS NN NN NN TO DT NN NN , VVG DT NN VVN IN NP CD RB	Fundamental	Basis	Neutral
07-1017_12	( ) experimented successfully with translating from inflectional languages into English making use of POS tags , word stems and suffixes in the source language	( ) VVN RB IN VVG IN JJ NNS IN JJ VVG NN IN NP NNS , NN VVZ CC VVZ IN DT NN NN	BackGround	GRelated	Positive
07-1017_14	The framework suggested here is most closely related to ( ) , which uses a probabilistic model to generate Japanese case markers for English-to-Japanese MT	DT NN VVD RB VBZ RBS RB VVN TO ( ) , WDT VVZ DT JJ NN TO VV JJ NN NNS IN NP NP	BackGround	SRelated	Neutral
07-1017_15	The algorithm is similar to the one described in ( )	DT NN VBZ JJ TO DT CD VVN IN ( )	Fundamental	Idea	Neutral
07-1018_0	It also differs from now traditional uses of comparable corpora for detecting translation equivalents ( ) or extracting terminology ( ) , which allows a one-to-one correspondence irrespective of the context	PP RB VVZ IN RB JJ NNS IN JJ NNS IN VVG NN NNS ( ) CC VVG NN ( ) , WDT VVZ DT JJ NN RB IN DT NN	Compare	Compare	Neutral
07-1018_1	In the spirit of ( ) , it is intended as a translator's amanuensis "under the tight control of a human translator ..	IN DT NN IN ( ) , PP VBZ VVN IN DT JJ NN NN DT JJ NN IN DT JJ NN NN	Fundamental	Idea	Neutral
07-1018_2	It has been aligned on the sentence level by JAPA ( ) , and further on the word level by GIZA++ ( )	PP VHZ VBN VVN IN DT NN NN IN NP ( ) , CC JJR IN DT NN NN IN NP ( )	Fundamental	Basis	Neutral
07-1018_4	Thus the present system is unlike SMT ( ) , where lexical selection is effected by a translation model based on aligned , parallel corpora , but the novel techniques it has developed are exploitable in the SMT paradigm	RB DT JJ NN VBZ IN NP ( ) , WRB JJ NN VBZ VVN IN DT NN NN VVN IN VVN , JJ NNS , CC DT JJ NNS PP VHZ VVN VBP JJ IN DT NP NN	Compare	Compare	Neutral
07-1018_7	Similarity is measured as the cosine between collocation vectors , whose dimensionality is reduced by SVD using the implementation by Rapp ( )	NN VBZ VVN IN DT NN IN NN NNS , WP$ NN VBZ VVN IN NNS VVG DT NN IN NP ( )	Fundamental	Basis	Neutral
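A minimal sketch of this similarity computation, assuming a words-by-collocates count matrix C and an illustrative rank of 100 (the function name and rank are placeholders, not the cited implementation's values):

```python
import numpy as np

def reduced_cosine(C, i, j, rank=100):
    """Cosine similarity between rows i and j of C after SVD reduction."""
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    X = U[:, :rank] * s[:rank]          # low-rank collocation vectors
    u, v = X[i], X[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```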
07-1018_8	We have generalised the method used in our previous study ( ) for extracting equivalents for continuous multiword expressions (MWEs)	PP VHP VVN DT NN VVN IN PP$ JJ NN ( ) IN VVG NNS IN JJ NN NNS NN	Fundamental	Basis	Neutral
07-1019_0	Recent efforts in statistical machine translation (MT) have seen promising improvements in output quality , especially the phrase-based models ( ) and syntax-based models ( )	JJ NNS IN JJ NN NN NN VHP VVN JJ NNS IN NN NN , RB DT JJ NNS ( ) CC JJ NNS ( )	BackGround	GRelated	Neutral
07-1019_0	By adapting the k-best parsing Algorithm 2 of Huang and Chiang ( ) , it achieves significant speed-up over full-integration on Chiang's Hiero system	IN VVG DT NN VVG NP CD IN NP CC NP ( ) , PP VVZ JJ NNS IN NN IN NP NP NN	BackGround	GRelated	Positive
07-1019_0	We also devise a faster variant of cube pruning , called cube growing , which uses a lazy version of k-best parsing ( ) that tries to reduce k to the minimum needed at each node to obtain the desired number of hypotheses at the root	NP RB VV DT RBR JJ IN NN VVG , VVD NN NN , WDT VVZ DT JJ NN IN NN VVG ( ) WDT VVZ TO VV NN TO DT NN VVN IN DT NN TO VV DT VVN NN IN NNS IN DT NN	Fundamental	Basis	Neutral
07-1019_0	In a nutshell , cube pruning works on the LM forest , keeping at most k +LM items at each node , and uses the k-best parsing Algorithm 2 of Huang and Chiang ( ) to speed up the computation	IN DT NN , NN VVG NNS IN DT NN NN , VVG IN JJS NN NN NNS IN DT NN , CC VVZ DT NN VVG NP CD IN NP CC NP ( ) TO VV RP DT NN	BackGround	SRelated	Neutral
07-1019_0	This situation is very similar to k-best parsing and we can adapt the Algorithm 2 of Huang and Chiang ( ) here to explore this grid in a best-first order	DT NN VBZ RB JJ TO NN VVG CC PP MD VV DT NP CD IN NP CC NP ( ) RB TO VV DT NN IN DT JJ NN	Fundamental	Basis	Neutral
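A hedged sketch of the best-first grid exploration being adapted here, in the spirit of k-best Algorithm 2: pop the cheapest cell from a priority queue, emit it, and push its right and down neighbours. This assumes the cost grid is monotone along both axes; with a language model that monotonicity holds only approximately, which is why the adapted method is a pruning heuristic rather than exact search.

```python
import heapq

def kbest_grid(cost, n, m, k):
    """Enumerate up to k cells of an n-by-m grid in (approximately)
    increasing cost order, starting from the top-left corner."""
    seen = {(0, 0)}
    frontier = [(cost(0, 0), 0, 0)]
    results = []
    while frontier and len(results) < k:
        c, i, j = heapq.heappop(frontier)
        results.append((c, i, j))
        for x, y in ((i + 1, j), (i, j + 1)):   # right and down neighbours
            if x < n and y < m and (x, y) not in seen:
                seen.add((x, y))
                heapq.heappush(frontier, (cost(x, y), x, y))
    return results
```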
07-1019_0	This new method , called cube growing , is a lazy version of cube pruning , just as Algorithm 3 of Huang and Chiang ( ) is a lazy version of Algorithm 2 (see Table 1)	DT JJ NN , VVD NN VVG , VBZ DT JJ NN IN NN VVG RB IN NP CD IN NP CC NP ( ) , VBZ DT JJ NN IN NP CD NN JJ NN	Fundamental	Idea	Neutral
07-1019_1	The different target sides then constitute a third dimension of the grid , forming a cube of possible combinations ( )	DT JJ NN NNS RB VVP DT JJ NN IN DT NN , VVG DT NN IN JJ NNS ( )	BackGround	SRelated	Neutral
07-1019_2	The data set is the same as in Section 5.1 , except that we also parsed the English side using a variant of the Collins ( ) parser , and then extracted 24.7M tree-to-string rules using the algorithm of ( )	DT NNS NN VBZ JJ RB IN NP CD , IN WDT PP RB VVD DT NN VVG DT NN IN DT NP ( ) NN , CC RB VVN JJ NN NNS VVG DT NN IN ( )	Fundamental	Basis	Neutral
07-1019_2	These forest rescoring algorithms have potential applications to other computationally intensive tasks involving combinations of different models , for example , head-lexicalized parsing ( ); joint parsing and semantic role labeling ( ); or tagging and parsing with nonlocal features	DT NN NN NNS VHP JJ NNS TO JJ RB JJ NNS VVG NNS IN JJ NNS , IN NN , VVN VVG ( JJ NN VVG CC JJ NN VVG ( NN CC VVG CC VVG IN JJ NNS	BackGround	MRelated	Neutral
07-1019_3	In tree-to-string (also called syntax-directed) decoding ( ) , the source string is first parsed into a tree , which is then recursively converted into a target string according to transfer rules in a synchronous grammar ( )	IN NN NN VVD NN VVG ( ) , DT NN NN VBZ RB VVN IN DT NN , WDT VBZ RB RB VVN IN DT NN NN VVG TO VV NNS IN DT JJ NN ( )	BackGround	GRelated	Neutral
07-1019_4	We generalize cube pruning and adapt it to two systems very different from Hiero: a phrase-based system similar to Pharaoh ( ) and a tree-to-string system ( )	NP VV NN VVG CC VV PP TO CD NNS RB JJ IN NP DT JJ NN JJ TO NN ( ) CC DT NN NN ( )	Fundamental	Basis	Neutral
07-1019_5	We test our methods on two large-scale English-to-Chinese translation systems: a phrase-based system and our tree-to-string system ( )	PP VVP PP$ NNS IN CD JJ JJ NN NN DT JJ NN CC PP$ NN NN ( )	Fundamental	Basis	Neutral
07-1019_5	Our data preparation follows Huang et al. ( ): the training data is a parallel corpus of 28.3M words on the English side , and a trigram language model is trained on the Chinese side	PP$ NN NN VVZ NP NP NP ( NN DT NN NN VBZ DT JJ NN NN NNS IN DT JJ NN , CC DT NN NN NN VBZ VVN IN DT JJ NN	Fundamental	Idea	Neutral
07-1019_5	For cube growing , we use a non-duplicate k-best method ( ) to get 100-best unique translations according to −LM to estimate the lower-bound heuristics	IN NN VVG , PP VVP DT JJ NN NN ( ) TO VV JJ JJ NNS VVG TO SENT NP TO VV DT JJ NN	Fundamental	Basis	Neutral
07-1019_5	Since our tree-to-string rules may have many variables , we first binarize each hyperedge in the forest on the target projection ( )	IN PP$ NN NNS MD VH JJ NNS , PP RB VVP DT NN IN DT NN IN DT NN NN ( )	Fundamental	Basis	Neutral
07-1019_7	Thus we envision forest rescoring as being of general applicability for reducing complicated search spaces , as an alternative to simulated annealing methods ( )	RB PP VVP NN NN IN VBG IN JJ NN IN VVG JJ NN NNS , IN DT NN TO JJ VVG NNS ( )	BackGround	MRelated	Neutral
07-1019_8	Part of the complexity arises from the expressive power of the translation model: for example , a phrase- or word-based model with full reordering has exponential complexity ( )	NN IN DT NN VVZ IN DT JJ NN IN DT NN NN IN NN , DT NN CC JJ NN IN JJ VVG VHZ JJ NN ( )	BackGround	GRelated	Neutral
07-1019_9	We will use the following example from Chinese to English for both systems described in this section: yu Shalong juxing le huitan (gloss: 'with Sharon hold [past] meeting'; translation: 'held a meeting with Sharon'). A typical phrase-based decoder generates partial target-language outputs in left-to-right order in the form of hypotheses ( )	PP MD VV DT VVG NN IN NP TO NP IN DT NNS VVN IN DT JJ NN NN VVG DT NN IN NP VVP JJ NN NN DT NN IN NP DT JJ JJ NN VVZ JJ NN NNS IN JJ NN IN DT NN IN NNS ( )	BackGround	GRelated	Neutral
07-1019_9	We implemented Cubit , a Python clone of the Pharaoh decoder ( ) , and adapted cube pruning to it as follows	PP VVD NP , DT NP NN IN DT NN NN ( ) , CD CC VVN NN VVG TO PP RB VVZ	Fundamental	Basis	Neutral
07-1019_9	We set the decoder phrase-table limit to 100 as suggested in ( ) and the distortion limit to 4	PP VVD DT NN JJ NN TO CD IN VVN IN ( ) CC DT NN NN TO CD	Fundamental	Idea	Neutral
07-1019_10	An SCFG ( ) is a context-free rewriting system for generating string pairs	DT NP ( ) VBZ DT JJ VVG NN IN VVG NN NNS	BackGround	SRelated	Neutral
07-1019_12	To integrate with a bigram language model , we can use the dynamic-programming algorithms of Och and Ney ( ) and Wu ( ) for phrase-based and SCFG-based systems , respectively , which we may think of as doing a finer-grained version of the deductions above	TO VV IN DT NN NN NN , PP MD VV DT JJ NNS IN NP CC NP ( ) CC NP ( ) IN JJ CC JJ NNS , RB , WDT PP MD VV IN RB VVG DT JJ NN IN DT NNS RB	BackGround	SRelated	Neutral
07-1019_12	The language model also , if fully integrated into the decoder , introduces an expensive overhead for maintaining target-language boundary words for dynamic programming ( )	DT NN NN RB , IN RB VVN IN DT NN , VVZ DT JJ NN IN VVG NN NN NNS IN JJ NN ( )	BackGround	GRelated	Neutral
07-1019_13	Similarly , the decoding problem with SCFGs can also be cast as a deductive (parsing) system ( )	RB , DT VVG NN IN NP MD RB VB VVN IN DT JJ NN NN ( )	BackGround	SRelated	Neutral
07-1019_14	However , the hope is that by choosing the right value of i , these estimates will be accurate enough to affect the search quality only slightly , which is analogous to "almost admissible" heuristics in A* search ( )	RB , DT NN VBZ IN/that IN VVG DT JJ NN NNS , DT NNS MD VB JJ RB TO VV DT NN NN RB RB , WDT VBZ JJ TO JJ JJ NNS IN JJ NN ( )	BackGround	SRelated	Neutral
07-1020_0	A few exceptions are the hierarchical (possibly syntax-based) transduction models ( ) and the string transduction models ( )	DT JJ NNS VBP DT JJ JJ JJ NN NNS ( ) CC DT NN NN NNS ( )	BackGround	GRelated	Neutral
07-1020_1	The SFST approach described here is similar to the one described in ( ) which has subsequently been adopted by ( )	DT NP NN VVD RB VBZ JJ TO DT CD VVN IN ( ) WDT VHZ RB VBN VVN IN ( )	Fundamental	Idea	Neutral
07-1020_2	In preliminary experiments , we have associated the target lexical items with supertag information ( )	IN JJ NNS , PP VHP VVN DT NN JJ NNS IN NN NN ( )	Fundamental	Basis	Neutral
07-1020_4	We separate the most popular classification techniques into two broad categories: also called Maxent as it finds the distribution with maximum entropy that properly estimates the average of each feature over the training data ( )	PP VVP DT RBS JJ NN NNS IN CD JJ NN RB VVD NP IN PP VVZ DT NN IN JJ NN WDT RB VVZ DT NN IN DT NN IN DT NN NNS ( )	NULL	NULL	NULL
07-1020_5	Most of the previous work on statistical machine translation , as exemplified in ( ) , employs a word-alignment algorithm (such as GIZA++ ( )) that provides local associations between source and target words	JJS IN DT JJ NN IN JJ NN NN , RB VVN IN ( ) , VVZ NN NN NN IN NP ( NN WDT VVZ JJ NNS IN NN CC NN NNS	BackGround	GRelated	Neutral
07-1020_7	The BOW approach is different from the parsing based approaches ( ) where the translation model tightly couples the syntactic and lexical items of the two languages	DT NP NN VBZ JJ IN DT VVG VVN NNS ( ) WRB DT NN NN RB VVZ DT JJ CC JJ NNS IN DT CD NNS	Compare	Compare	Neutral
07-1020_8	The excellent results recently obtained with the SEARN algorithm ( ) also suggest that binary classifiers , when properly trained and combined , seem to be capable of matching more complex structured output approaches	DT JJ NNS RB VVN IN DT NP NN ( ) RB VVP IN/that JJ NNS , WRB RB VVN CC VVN , VVP TO VB JJ NN RBR JJ JJ NN NNS	BackGround	GRelated	Positive
07-1020_9	A new L1-regularized Maxent algorithm was proposed for density estimation ( ) and we adapted it to classification	DT JJ JJ NP NNS VBD VVN IN NN NN ( ) CC PP VVD PP TO NN	Fundamental	Basis	Neutral
07-1020_10	From the bilanguage corpus B , we train an n-gram language model using standard tools ( )	IN DT NN NN NN , PP VVP DT NN NN NN VVG JJ NNS ( )	Fundamental	Basis	Neutral
07-1020_11	The use of supertags in phrase-based SMT system has been shown to improve results ( )	DT NN IN NNS IN JJ NP NN VHZ VBN VVN TO VV NNS ( )	BackGround	GRelated	Positive
07-1020_12	here: all state hypotheses of a whole sentence are kept in memory) , it is necessary to either use heuristic forward pruning or constrain permutations to be within a local window of adjustable size (also see ( ))	NN DT NN NNS IN DT JJ NN VBP VVN IN NN , PP VBZ JJ TO RB VV JJ RB VVG CC VV NNS TO VB IN DT JJ NN IN JJ NN NN VV ( NN	BackGround	SRelated	Neutral
07-1020_13	Although Conditional Random Fields (CRF) ( ) train an exponential model at the sequence level , in translation tasks such as ours the computational requirements of training such models are prohibitively expensive	IN NP NP NP NN ( ) VV DT JJ NN IN DT NN NN , IN NN NNS JJ IN PP DT JJ NNS IN VVG JJ NNS VBP RB JJ	BackGround	SRelated	Neutral
07-1020_14	We found this algorithm to converge faster than the current state-of-the-art in Maxent training , which is L2-regularized L-BFGS ( )	PP VVD DT NN TO VV RBR IN DT JJ JJ IN NP NN , WDT VBZ NP NP ( ) CD	Compare	Compare	Positive
07-1020_16	Discriminative training has been used mainly for translation model combination ( ) and with the exception of ( ) , has not been used to directly train parameters of a translation model	JJ NN VHZ VBN VVN RB IN NN NN NN ( ) CC IN DT NN IN ( ) , VHZ RB VBN VVN TO RB VV NNS IN DT NN NN	BackGround	GRelated	Neutral
07-1020_17	For the work reported in this paper , we have used the GIZA++ tool ( ) which implements a string-alignment algorithm	IN DT NN VVD IN DT NN , PP VHP VVN DT NP NN ( ) WDT VVZ DT NN NN	Fundamental	Basis	Neutral
07-1020_18	Each output label t is projected into a bit string with components b_j(t) , where the probability of each component is estimated independently. In practice , despite the approximation , the 1-vs-other scheme has been shown to perform as well as the multiclass scheme ( )	DT NN NN NN VBZ VVN IN DT NN NN IN NNS SYM NN NN WRB NN IN DT NN VBZ VVN NN IN NN , IN DT NN , DT JJ NN VHZ VBN VVN TO VV RB RB IN DT JJ NN ( )	BackGround	SRelated	Neutral
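Written out, the independence approximation described here is (notation follows the sentence above):

```latex
P(t \mid x) \;\approx\; \prod_{j} P\bigl(b_j(t) \mid x\bigr)
```

with each bit component b_j trained as a separate binary (1-vs-other) classifier.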
07-1020_24	For the Hansard corpus we used the same training and test split as in ( ): 1.4 million training sentence pairs and 5432 test sentences	IN DT NN NN PP VVD DT JJ NN CC NN NN IN IN ( NN CD CD NN NN NNS CC CD NN NNS	Fundamental	Idea	Neutral
07-1021_0	In search of a balance between structural flexibility and computational complexity , several authors have proposed constraints to identify classes of non-projective dependency structures that are computationally well-behaved ( )	IN NN IN DT NN IN JJ NN CC JJ NN , JJ NNS VHP VVN NNS TO VV NNS IN JJ NN NNS WDT VBP RB JJ ( )	BackGround	GRelated	Positive
07-1021_0	This result generalizes previous work on the relation between ltag and dependency representations ( )	DT NN VVZ JJ NN IN DT NN IN NN CC NN NNS ( )	BackGround	SRelated	Neutral
07-1021_0	The encoding of dependency structures as order-annotated trees allows us to reformulate two constraints on non-projectivity originally defined on fully specified dependency structures ( ) in terms of syntactic properties of the order annotations that they induce. Gap-degree: the gap-degree of a dependency structure is the maximum over the number of discontinuities in any yield of that structure	SENT DT VVG IN NN NNS IN JJ NNS VVZ PP TO VV CD NNS IN NN RB VVN IN RB VVN NN NNS ( ) IN NNS IN JJ NNS IN DT NN NNS WDT PP JJ NP DT NN IN DT NN NN VBZ DT NN IN DT NN IN NNS IN DT NN IN DT NN	Fundamental	Basis	Neutral
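A short worked example of the definition just given: a yield occupying string positions {1, 2, 4, 5} falls into two intervals, [1, 2] and [4, 5], i.e. it has one discontinuity; if no yield in the structure has more than one discontinuity, the structure's gap-degree is 1 (a projective structure, all of whose yields are intervals, has gap-degree 0).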
07-1021_0	This enables us to generalize a previous result on the class of dependency structures generated by lexicalized tags ( ) to the class of generated dependency languages , LTAL	DT VVZ PP TO VV DT JJ NN IN DT NN IN NN NNS VVN IN JJ NNS ( ) TO DT NN IN VVN NN NNS , NP	Fundamental	Basis	Neutral
07-1021_1	Lately , they have also been used in many computational tasks , such as relation extraction ( ) , parsing ( ) , and machine translation ( )	RB , PP VHP RB VBN VVN IN JJ JJ NNS , JJ IN NN NN ( ) , VVG ( ) , CC NN NN ( )	BackGround	GRelated	Neutral
07-1021_2	Unfortunately , most formal results on non-projectivity are discouraging: While grammar-driven dependency parsers that are restricted to projective structures can be as efficient as parsers for lexicalized context-free grammar ( ) , parsing is prohibitively expensive when unrestricted forms of non-projectivity are permitted ( )	RB , RBS JJ NNS IN NN VBP JJ IN JJ NN NNS WDT VBP VVN TO JJ NNS MD VB RB JJ IN NNS IN JJ JJ NN ( ) , VVG VBZ RB JJ WRB JJ NNS IN NN VBP VVN ( )	BackGround	GRelated	Negative
07-1021_4	We also show that adding the well-nestedness condition corresponds to the restriction of lcfrs to Coupled Context-Free Grammars ( ) , and that regular sets of well-nested structures with a gap-degree of at most 1 are exactly the class of sets of derivations of Lexicalized Tree Adjoining Grammar (ltag)	PP RB VVP IN/that VVG DT NN NN VVZ TO DT NN IN NNS TO NP NP NP ( ) , CC IN/that JJ NNS IN JJ NNS IN DT NN IN IN JJS CD VBP RB DT NN IN NNS IN NNS IN NP NP NP NP NN	Fundamental	Basis	Neutral
07-1021_4	This restriction is central to the formalism of Coupled-Context-Free Grammar (ccfg) ( )	DT NN VBZ JJ TO DT NN IN NP NP NN ( )	BackGround	SRelated	Neutral
07-1021_4	REGD_w(k) = L_CCFL(k + 1). As a special case , Coupled-Context-Free Grammars with fan-out 2 are equivalent to Tree Adjoining Grammars (tags) ( )	JJ NN SYM NP SYM JJ IN DT JJ NN , NP NP IN JJ CD VBP JJ TO NP NP NP NN ( )	BackGround	SRelated	Neutral
07-1021_5	This gives rise to a notion of regular dependency languages , and allows us to establish a formal relation between the structural constraints and mildly context-sensitive grammar formalisms ( ): We show that regular dependency languages correspond to the sets of derivations of lexicalized Linear Context-Free Rewriting Systems (lcfrs) ( ) , and that the gap-degree measure is the structural correspondent of the concept of 'fan-out' in this formalism ( )	DT VVZ NN TO DT NN IN JJ NN NNS , CC VVZ PP TO VV DT JJ NN IN DT JJ NNS CC RB JJ NN NNS ( NN PP VVP IN/that JJ NN NNS VV TO DT NNS IN NNS IN JJ NP NP NP NP NN ( ) , CC IN/that DT NN NN VBZ DT JJ NN IN DT NN IN NP IN DT NN ( )	Fundamental	Basis	Neutral
07-1021_6	Such a comparison may be empirically more adequate than one based on traditional notions of generative capacity ( )	PDT DT NN MD VB RB RBR JJ IN CD VVN IN JJ NNS IN JJ NN ( )	BackGround	MRelated	Positive
07-1021_7	Both constraints have been shown to be in very good fit with data from dependency treebanks ( )	DT NNS VHP VBN VVN TO VB IN RB JJ NN IN NNS IN NN NNS ( )	BackGround	GRelated	Positive
07-1021_7	A dependency structure is projective , if each of its yields forms an interval with respect to the precedence order ( )	DT NN NN VBZ JJ , IN DT IN PP$ NNS VVZ DT NN IN NN TO DT NN NN ( )	BackGround	SRelated	Neutral
07-1021_8	Data-driven dependency parsing with non-projective structures is quadratic when all attachment decisions are assumed to be independent of one another ( ) , but becomes intractable when this assumption is abandoned ( )	JJ NN VVG IN JJ NNS VBZ JJ WRB DT NN NNS VBP VVN TO VB JJ IN CD DT ( ) , CC VVZ JJ WRB DT NN VBZ JJ ( )	BackGround	GRelated	Neutral
07-1021_14	The number of components in the order-annotation , and hence , the gap-degree of the resulting dependency language , corresponds to the fan-out of the function: the highest number of components among the arguments of the function ( )	DT NN IN NNS IN DT NN , CC RB , DT NN IN DT VVG NN NN , VVZ TO DT NN IN DT NN DT JJS NN IN NNS IN DT NNS IN DT NN ( )	BackGround	SRelated	Neutral
07-1021_16	Linear Context-Free Rewriting Systems Gap-restricted dependency languages are closely related to Linear Context-Free Rewriting Systems (lcfrs) ( ) , a class of formal systems that generalizes several mildly context-sensitive grammar formalisms	NP NP NP NPS NP NN NNS VBP RB VVN TO NP NP NP NP NN ( ) , DT NN IN JJ NNS WDT VVZ JJ RB JJ NN NNS	BackGround	SRelated	Neutral
07-1022_0	The Unfold-Fold transformation is a calculus for transforming functional and logic programs into equivalent but (hopefully) faster programs ( )	DT NN NN VBZ DT NN IN VVG JJ CC NN NNS IN JJ CC JJ JJR NNS ( )	BackGround	SRelated	Neutral
07-1022_1	Standard methods for converting weighted CFGs to equivalent PCFGs can be used if required ( )	JJ NNS IN VVG JJ NNS TO JJ NNS MD VB VVN IN JJ ( )	BackGround	SRelated	Neutral
07-1022_2	Second , Eisner-Satta O(n^3) PBDG parsing algorithms are extremely fast ( )	RB , NP NP JJ NN VVG NNS VBP RB JJ ( )	BackGround	GRelated	Positive
07-1022_2	It is straightforward to extend the split-head CFG to encode the additional state information required by the head automata of Eisner and Satta ( ); this corresponds to splitting the non-terminals L_u and uR	PP VBZ JJ TO VV DT NN NN TO VV DT JJ NN NN VVN IN DT NN NN IN NP CC NP ( NN DT VVZ TO NN DT NP NP NN CC NN	BackGround	SRelated	Positive
07-1022_2	The O(n^3) split-head grammar is closely related to the O(n^3) PBDG parsing algorithm given by Eisner and Satta ( )	DT NP JJ NN NN VBZ RB VVN TO DT NP JJ NN VVG NN VVN IN NP CC NP ( )	BackGround	SRelated	Neutral
07-1022_5	Goodman ( ) observed that the Viterbi parse is in general not the optimal parse for evaluation metrics such as f-score that are based on the number of correct constituents in a parse	NP ( ) VVD IN/that DT NP VVP VBZ RB JJ RB DT JJ VVP IN NN NNS JJ IN NNS WDT VBP VVN IN DT NN IN JJ NNS IN DT VVP	BackGround	SRelated	Neutral
07-1022_6	For example , incremental CFG parsing algorithms can be used with the CFGs produced by this transform , as can the Inside-Outside estimation algorithm ( ) and more exotic methods such as estimating adjoined hidden states ( )	IN NN , JJ NP VVG NNS MD VB VVN IN DT NNS VVN IN DT VV , RB MD DT NP NN NN ( ) CC JJR JJ NNS JJ IN VVG VVN JJ NNS ( )	BackGround	GRelated	Neutral
07-1022_8	The closest related work we are aware of is McAllester ( ) , which also describes a reduction of PBDGs to efficiently-parsable CFGs and directly inspired this work	DT JJS JJ NN PP VBP JJ IN VBZ NP ( ) , WDT RB VVZ DT NN IN NNS TO JJ NNS CC RB VVD DT NN	BackGround	SRelated	Positive
07-1022_9	This paper investigates the relationship between Context-Free Grammar (CFG) parsing and the Eisner/Satta PBDG parsing algorithms , including their extension to second-order PBDG parsing ( )	DT NN VVZ DT NN IN NP NP NN VVG CC DT NP NP VVG NNS , VVG PP$ NN TO NN NN VVG ( )	Fundamental	Basis	Neutral
07-1022_10	First , because they capture bilexical head-to-head dependencies they are capable of producing extremely high-quality parses: state-of-the-art discriminatively trained PBDG parsers rival the accuracy of the very best statistical parsers available today ( )	RB , IN PP VV JJ NN NNS PP VBP JJ IN VVG RB JJ JJ JJ NN VVN NP NNS VV DT NN IN DT RB JJS JJ NNS JJ NN ( )	BackGround	GRelated	Positive
07-1022_10	The steps involved in CKY parsing with this grammar correspond closely to those of the McDonald ( ) second-order PBDG parsing algorithm	DT NNS VVN IN NP VVG IN DT NN VV RB TO DT IN DT NP ( ) NN NN VVG NN	Fundamental	Basis	Neutral
07-1022_10	These weights are estimated by an online procedure as in McDonald ( ) , and are not intended to define a probability distribution	DT NNS VBP VVN IN DT JJ NN IN IN NP ( ) , CC VBP RB VVN TO VV DT NN NN	Fundamental	Idea	Neutral
07-1022_10	We provided one grammar which captures horizontal second-order dependencies ( ) , and another which captures vertical second-order head-to-head-to-head dependencies	PP VVD CD NN WDT VVZ JJ NN NNS ( ) , CC DT WDT VVZ JJ NN NN NNS	Fundamental	Basis	Neutral
07-1022_11	Since CFGs can be expressed as Horn-clause logic programs ( ) and the Unfold-Fold transformation is provably correct for such programs ( ) , it follows that its application to CFGs is provably correct as well	IN NP MD VB VVN IN NP NN NNS ( ) CC DT NN NN VBZ RB JJ IN JJ NNS ( ) , PP VVZ IN/that PP$ NN TO NP VBZ RB JJ IN RB	BackGround	GRelated	Neutral
07-1022_15	Specifically , we show how to use an off-line preprocessing step , the Unfold-Fold transformation , to transform a PBDG into an equivalent CFG that can be parsed in O(n^3) time using a version of the CKY algorithm with suitable indexing ( ) , and extend this transformation so that it captures second-order PBDG dependencies as well	RB , PP VVP WRB TO VV DT NN NN NN , DT NN NN , TO VV DT NN IN DT JJ NN WDT MD VB VVN IN NP JJ NN VVG DT NN IN DT NP NN IN JJ NN ( ) , CC VV DT NN RB IN/that PP VVZ NN NN NNS RB RB	Fundamental	Basis	Neutral
07-1023_0	By a slight generalization of a result by Aoto ( ) , this typing Γ ⊢ N′ : α must be negatively non-duplicated in the sense that each atomic type has at most one negative occurrence in it	IN DT JJ NN IN DT NN IN NP ( ) , DT VVG NN NN NP : DT MD VB RB JJ IN DT NN IN/that DT JJ NN VHZ IN JJS CD JJ NN IN PP	Fundamental	Basis	Neutral
07-1023_1	By Aoto and Ono's ( ) generalization of the Coherence Theorem ( ) , it follows that every λ-term P such that Γ′ ⊢ P : α for some Γ′ ⊆ Γ must be βη-equal to N′ (and consequently to N)	IN NP CC NP ( ) NN IN DT NP NP ( ) , PP VVZ IN/that DT NN NN PDT DT NP NN NN : DT IN DT NN SYM NN MD VB JJ TO NP NN RB TO NP	BackGround	SRelated	Neutral
07-1023_2	The reduction to Datalog makes it possible to apply to parsing and generation sophisticated evaluation techniques for Datalog queries; in particular , an application of generalized supplementary magic-sets rewriting ( ) automatically yields Earley-style algorithms for both parsing and generation	DT NN TO NP VVZ PP JJ TO VV TO VVG CC NN JJ NN NNS IN NP NN IN JJ , DT NN IN VVN JJ NNS VVG ( ) RB VVZ JJ NNS IN DT VVG CC NN	BackGround	SRelated	Neutral
07-1023_3	(In the case of an IO macro grammar , the result is an IO context-free tree grammar ( ).) String copying becomes tree copying , and the resulting grammar can be represented by an almost linear CFLG and hence by a Datalog program	NN DT NN IN DT NP NN NN , DT NN VBZ DT NP JJ NN NN ( JJ NP NN VVZ NN NN , CC DT VVG NN MD VB VVN IN DT RB JJ NN CC RB IN DT NP NN	BackGround	SRelated	Neutral
07-1023_4	With regard to parsing and recognition of input strings , polynomial-time algorithms and the LOGCFL upper bound on the computational complexity are already known for the grammar formalisms covered by our results ( ); nevertheless , we believe that our reduction to Datalog offers valuable insights	IN NN TO VVG CC NN IN NN NNS , JJ NNS CC DT NP JJ VVN IN DT JJ NN VBP RB VVN IN DT NN NNS VVN IN PP$ NNS ( NN RB , PP VVP IN/that PP$ NN TO NP VVZ JJ NNS	BackGround	SRelated	Neutral
07-1023_5	In this paper , we show that a similar reduction to Datalog is possible for more powerful grammar formalisms with "context-free" derivations , such as (multi-component) tree-adjoining grammars ( ) , IO macro grammars ( ) , and (parallel) multiple context-free grammars ( )	IN DT NN , PP VVP IN/that DT JJ NN TO NP VBZ JJ IN RBR JJ NN NNS IN JJ NNS , JJ IN JJ NN NNS ( ) , NP NN NNS ( ) , CC JJ JJ JJ NNS ( )	BackGround	GRelated	Positive
07-1023_6	By the main result of Gottlob et al. ( ) , the related search problem of finding one derivation tree for the input λ-term is in functional LOGCFL , i.e. , the class of functions that can be computed by a logspace-bounded Turing machine with a LOGCFL oracle	IN DT JJ NN IN NP NP NP ( ) , DT JJ NN NN IN VVG CD NN NN IN DT NN NN VBZ IN JJ NN , FW , DT NN IN NNS WDT MD VB VVN IN DT JJ NP NN IN DT NP NN	BackGround	SRelated	Neutral
07-1023_7	Our method essentially relies on the encoding of different formalisms in terms of abstract categorial grammars ( )	PP$ NN RB VVZ IN DT VVG IN JJ NNS IN NNS IN JJ JJ NNS ( )	Fundamental	Basis	Neutral
07-1023_7	What we have called a context-free λ-term grammar is nothing but an alternative notation for an abstract categorial grammar ( ) whose abstract vocabulary is second-order , with the restriction to linear λ-terms removed	WP PP VHP VVN DT NN SYM NN NN VBZ NN CC DT JJ NN IN DT JJ JJ NN ( ) WP$ JJ NN VBZ NN , IN DT NN TO JJ NNS VVN	BackGround	SRelated	Neutral
07-1023_7	A string-generating grammar coupled with Montague semantics may be represented by a synchronous CFLG , a pair of CFLGs with matching rule sets ( )	DT VVG NN VVN IN NP NNS MD VB VVN IN DT JJ NP , DT NN IN NP IN VVG NN NNS ( )	BackGround	SRelated	Neutral
07-1023_8	linear ACGs are known to be expressive enough to encode well-known mildly context-sensitive grammar formalisms in a straightforward way , including TAGs and multiple context-free grammars ( )	JJ NNS VBP VVN TO VB JJ RB TO VV JJ RB JJ NN NNS IN DT JJ NN , VVG NNS CC JJ JJ NNS ( )	BackGround	GRelated	Positive
07-1023_8	For example , the linear CFLG in Figure 8 is an encoding of the TAG in Figure 3 , where a(S) = o → o and a(A) = (o → o) → o → o ( )	IN NN , DT JJ NN IN NP CD VBZ DT VVG IN DT NP IN NP CD , WRB NN SYM NN CC NN SYM NN NN NN NN NN SENT SYM NN ( )	BackGround	SRelated	Neutral
07-1023_12	We can eliminate ε-rules from an almost linear CFLG by the same method that Kanazawa and Yoshinaka ( ) used for linear grammars , noting that for any Γ and α , there are only finitely many almost linear λ-terms M such that Γ ⊢ M : α. If a grammar has no ε-rule , any derivation tree for the input λ-term N that has a λ-term P at its root node corresponds to a Datalog derivation tree whose number of leaves is equal to the number of occurrences of constants in P , which cannot exceed the number of occurrences of constants in N	PP MD VV NP IN DT RB JJ NN IN DT JJ NN IN/that NP CC NP ( ) VVN IN JJ NNS , VVG IN/that IN DT NN CC DT , EX VBP RB RB JJ RB JJ NNS NP PDT DT NN NN NP : NN DT NN VHZ DT NP , DT NN NN IN DT NN NN NP WDT VHZ DT JJ NN IN PP$ NN NN VVZ TO DT NP NN NN WP$ NN IN NNS VBZ JJ TO DT NN IN NNS IN NNS IN NN , WDT MD VV DT NN IN NNS IN NNS IN NP	Fundamental	Idea	Neutral
07-1023_13	For such P and D , it is known that {(D , q) | D ∈ D , P ∪ D derives q} is in the complexity class LOGCFL ( )	IN JJ NN CC NP , PP VBZ VVN IN/that NNS , NN SYM NP NNS , NN NP NP VVZ NN ) VBZ IN DT NN NN NP ( )	BackGround	SRelated	Neutral
07-1023_16	In the linear case , Salvati ( ) has shown the recognition/parsing complexity to be PTIME , and exhibited an algorithm similar to Earley parsing for TAGs	CD IN DT JJ NN , NP ( ) VHZ VVN DT VVG NN TO VB JJ , CC VVN DT NN JJ TO NP VVG IN NP	BackGround	GRelated	Neutral
07-1023_19	The result of the generalized supplementary magic-sets rewriting of Beeri and Ramakrish-nan ( ) applied to the Datalog program representing a CFG essentially coincides with the deduction system ( ) or uninstantiated parsing system ( ) for Earley parsing	DT NN IN DT VVN JJ NNS VVG IN NP CC NP ( ) VVN TO DT NP NN VVG DT NP RB VVZ IN DT NN NN ( ) CC JJ VVG NN ( ) IN NP VVG	BackGround	GRelated	Neutral
07-1023_22	By naive (or seminaive) bottom-up evaluation ( ) , the answer to such a query can be computed in polynomial time in the size of the database for any Datalog program	IN JJ NN NN JJ NN ( ) , DT NN TO PDT DT NN MD VB VVN IN JJ NN IN DT NN IN DT NN IN DT NP NN	BackGround	GRelated	Neutral
07-1023_23	We illustrate this approach with the program in Figure 4 , following the presentation of Ullman ( )	PP VVP DT NN IN DT NN IN NP CD , VVG DT NN IN NP ( )	Fundamental	Idea	Neutral
07-1024_0	But there are also other factors involved - for example , the tendency to put "given" discourse elements before "new" ones , which has been shown to play a role independent of length ( )	CC EX VBP RB JJ NNS VVN : IN NN , DT NN TO VV JJ NN NNS IN JJ NNS , WDT VHZ VBN VVN TO VV DT NN JJ IN NN ( )	BackGround	GRelated	Neutral
07-1024_1	First , how close is dependency length in English to that of this optimal DLA? Secondly , how similar is the optimal DLA to English in terms of the actual rules that arise? Finding linear arrangements of graphs that minimize total edge length is a classic problem , NP-complete for general graphs but with an O(n^1.6) algorithm for trees ( )	RB , WRB NN VBZ NN NN IN NP TO DT IN DT JJ NN RB , WRB JJ VBZ DT JJ NP TO NP IN NNS IN DT JJ NNS IN/that NN VVG JJ NNS IN NNS WDT VV JJ NN NN VBZ DT JJ NN , NP IN JJ NNS CC IN DT NP JJ NN IN NNS ( )	BackGround	SRelated	Neutral
07-1024_2	Statistical parsers make use of features that capture dependency length (e.g. an adjacency feature in Collins ( ) , more explicit length features in McDonald et al. ( ) and Eisner and Smith ( )) and thus learn to favor parses with shorter dependencies	JJ NNS VVP NN IN NNS WDT VV NN NN NN NN NN IN NP ( ) , JJR JJ NN NNS IN NP NP NP ( ) CC NP CC NP ( NN CC RB VV TO VV VVZ IN JJR NNS	BackGround	GRelated	Neutral
07-1024_2	We take sentences from the Wall Street Journal section of the Penn Treebank , extract the dependency trees using the head-word rules of Collins ( ) , consider them to be unordered dependency trees , and linearize them to minimize dependency length	PP VVP NNS IN DT NP NP NP NN IN DT NP NP , VV DT NN NNS VVG DT NN NNS IN NP ( ) , VV PP TO VB JJ NN NNS , CC VV PP TO VV NN NN	Fundamental	Basis	Neutral
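A minimal sketch of the quantity being minimized here: given a linearization, total dependency length is the sum of distances between each dependent and its head. The function name and the `heads` mapping (each non-root word's position mapped to its head's position) are illustrative assumptions.

```python
def total_dependency_length(heads):
    """Sum of |dependent position - head position| over all dependencies."""
    return sum(abs(dep - head) for dep, head in heads.items())

# Example: a three-word sentence whose third word (position 2) heads the
# first two words has total length |0 - 2| + |1 - 2| = 3.
assert total_dependency_length({0: 2, 1: 2}) == 3
```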
07-1024_3	Exactly this pattern has been observed by Dryer ( ) in natural languages	RB DT NN VHZ VBN VVN IN NP ( ) IN JJ NNS	BackGround	SRelated	Neutral
07-1024_5	Frazier ( ) suggests that this might serve the function of keeping heads and dependents close together	NP ( ) VVZ IN/that DT MD VV DT NN IN VVG NNS CC NNS RB RB	BackGround	SRelated	Neutral
07-1024_6	This has been offered as an explanation for numerous psycholinguistic phenomena , such as the greater processing difficulty of object relative clauses versus subject relative clauses ( )	DT VHZ VBN VVN IN DT NN IN JJ JJ NNS , JJ IN DT JJR NN NN IN NN JJ NNS CC JJ JJ NNS ( )	BackGround	GRelated	Neutral
07-1024_7	Hawkins ( ) has shown that this principle is reflected in grammatical rules across many languages	NP ( ) VHZ VVN IN/that DT NN VBZ VVN IN JJ NNS IN JJ NNS	BackGround	GRelated	Neutral
07-1024_7	One might suppose that such syntactic choices in English are guided at least partly by dependency length minimization , and indeed there is evidence for this; for example , people tend to put the shorter of two PPs closer to the verb ( )	PP MD VV DT JJ JJ NNS IN NP VBP VVN IN JJS RB IN NN NN NN , CC RB EX VBZ NN IN NN IN NN , NNS VVP TO VV DT JJR IN CD NNS JJR TO DT NN ( )	BackGround	GRelated	Neutral
07-1024_8	The problem of finding the optimum weighted DLA for a set of input trees can be shown to be NP-complete by reducing from the problem of finding a graph's minimum Feedback Arc Set , one of the 21 classic problems of Karp ( )	DT NN IN VVG DT JJ JJ NP IN DT NN IN NN NNS MD VB VVN TO VB JJ IN VVG IN DT NN IN VVG DT NNS JJ NP NN NN , CD IN DT CD JJ NNS IN NP ( )	BackGround	GRelated	Neutral
07-1024_10	This setting is reminiscent of the problem of optimizing feature weights for reranking of candidate machine translation outputs , and we employ an optimization technique similar to that used by Och ( ) for machine translation	DT NN VBZ JJ IN DT NN IN VVG NN NNS IN NN IN NN NN NN NNS , CC PP VVP DT NN NN JJ TO DT VVN IN NP ( ) IN NN NN	Fundamental	Idea	Neutral
07-1025_0	In particular , our approach would be applicable to corpora with frame-specific role labels , e.g. FrameNet ( )	IN JJ , PP$ NN MD VB JJ TO NNS IN JJ NN NNS , NN ( )	BackGround	MRelated	Neutral
07-1025_1	Our work suggests that feature generalization based on verb-similarity may complement approaches to generalization based on role-similarity ( )	PP$ NN VVZ IN/that NN NN VVN IN NN MD NN NNS TO NN VVN IN NN ( )	BackGround	MRelated	Neutral
07-1025_2	For this task we utilized the August 2005 release of the Charniak parser with the default speed/accuracy settings ( ) , which required roughly 360 hours of processor time on a 2.5 GHz PowerPC G5	IN DT NN PP VVD DT NP CD NN IN DT NP NN IN DT NN NN NNS ( ) , WDT VVD RB CD NNS IN NN NN IN DT CD NP NP NP	Fundamental	Basis	Neutral
07-1025_3	To automatically identify all verb inflections , we utilized the English DELA electronic dictionary ( ) , which contained all but 21 of the PropBank verbs (for which we provided the inflections ourselves) , with old-English verb inflections removed	TO RB VV DT NN NNS , PP VVD DT NP NP JJ NN ( ) , WDT VVD DT CC CD IN DT NP NNS NN WDT PP VVD DT NNS NN , IN JJ NN NNS VVN	Fundamental	Basis	Neutral
07-1025_4	Parse tree paths were used for semantic role labeling by Gildea and Jurafsky ( ) as descriptive features of the syntactic relationship between predicates and their arguments in the parse tree of a sentence	JJ NN NNS VBD VVN IN JJ NN VVG IN NP CC NP ( ) IN JJ NNS IN DT JJ NN IN NNS CC PP$ NNS IN DT VVP NN IN DT NN	BackGround	GRelated	Neutral
07-1025_5	In future work , it would be particularly interesting to compare empirically-derived verb clusters to verb classes derived from theoretical considerations ( ) , and to the automated verb classification techniques that use these classes ( )	IN JJ NN , PP MD VB RB JJ TO VV JJ NN NNS TO NN NNS VVN IN JJ NNS ( ) , CC TO DT JJ NN NN NNS WDT VVP DT NNS ( )	BackGround	MRelated	Neutral
07-1025_7	Our approach is analogous to previous work in extracting collocations from large text corpora using syntactic information ( )	PP$ NN VBZ JJ TO JJ NN IN VVG NNS IN JJ NN NNS VVG JJ NN ( )	Fundamental	Idea	Neutral
07-1025_7	This observation further supports the distributional hypothesis of word similarity and corresponding technologies for identifying synonyms by similarity of lexical-syntactic context ( )	DT NN RBR VVZ DT JJ NN IN NN NN CC JJ NNS IN VVG NNS IN NN IN JJ NN ( )	BackGround	SRelated	Neutral
07-1025_8	In our work , we utilized the GigaWord corpus of English newswire text ( ) , consisting of nearly 12 gigabytes of textual data	IN PP$ NN , PP VVD DT NP NN IN JJ NN NN ( ) , VVG IN RB CD NNS IN JJ NNS	Fundamental	Basis	Neutral
07-1025_9	Annotations similar to these have been used to create automated semantic role labeling systems ( ) for use in natural language processing applications that require only shallow semantic parsing	NNS JJ TO DT VHP VBN VVN TO VV JJ JJ NN VVG NNS ( ) IN NN IN JJ NN NN NNS WDT VVP RB JJ JJ VVG	BackGround	GRelated	Neutral
07-1025_9	The overall performance of our semantic role labeling approach is not competitive with leading contemporary systems , which typically employ support vector machine learning algorithms with syntactic features ( ) or syntactic tree kernels ( )	DT JJ NN IN PP$ JJ NN VVG NN VBZ RB JJ IN VVG JJ NNS , WDT RB VVP NN NN NN VVG NNS IN JJ NNS ( ) CC JJ NN NNS ( )	Compare	Compare	Positive
07-1025_10	A recent release of the PropBank ( ) corpus of semantic role annotations of Tree-bank parses contained 112,917 labeled instances of 4,250 rolesets corresponding to 3,257 verbs , as illustrated by this example for the verb buy	DT JJ NN IN DT NP ( ) NN IN JJ NN NNS IN NP VVZ VVN CD CD VVN NNS IN CD CD NNS JJ TO CD CD NNS , RB VVN IN DT NN IN DT NN NN	BackGround	GRelated	Neutral
07-1025_11	An important area for future research will be to explore the correlation between our distance metric for syntactic similarity and various quantitative measures of semantic similarity ( )	DT JJ NN IN JJ NN MD VB TO VV DT NN IN PP$ NN JJ IN JJ NN CC JJ JJ NNS IN JJ NN ( )	BackGround	MRelated	Positive
07-1025_13	To prepare this corpus for analysis , we extracted the body text from each of the 4.1 million entries in the corpus and applied a maximum-entropy algorithm to identify sentence boundaries ( )	TO VV DT NN IN NN , PP VVD DT NN NN IN DT IN DT CD CD NNS IN DT NN CC VVD DT NN NN TO VV NN NNS ( )	Fundamental	Basis	Neutral
07-1026_0	Feature-based Methods for SRL: most features used in prior SRL research are generally extended from Gildea and Jurafsky ( ) , who used a linear interpolation method and extracted basic flat features from a parse tree to identify and classify the constituents in the FrameNet ( )	JJ NNS IN NN JJS NNS VVN IN JJ NP NN VBP RB VVN IN NP CC NP ( ) , WP VVD DT JJ NN NN CC VVD JJ JJ NNS IN DT VVP NN TO VV CC VV DT NNS IN DT NP ( )	BackGround	GRelated	Neutral
07-1026_1	SVM ( ) is selected as our classifier and the one vs	NP ( ) VBZ VVN IN PP$ NN CC DT CD NP	Fundamental	Basis	Neutral
07-1026_2	In this context , more and more kernels for restricted syntaxes or specific domains ( ) are proposed and explored in the NLP domain	IN DT NN IN PP , JJR CC JJR NNS IN JJ NNS CC JJ NNS ( ) VBP VVN CC VVN IN DT NP NN	BackGround	GRelated	Neutral
07-1027_0	In this paper , we apply Alternating Structure Optimization (ASO) ( ) to the semantic role labeling task on NomBank	IN DT NN , PP VVP NP NP NP NN ( ) TO DT JJ NN VVG NN IN NP	Fundamental	Basis	Neutral
07-1027_0	ASO has been shown to be effective on the following natural language processing tasks: text categorization , named entity recognition , part-of-speech tagging , and word sense disambiguation ( )	NP VHZ VBN VVN TO VB JJ IN DT VVG JJ NN VVG JJ NN NN , VVN NN NN , NN VVG , CC NN NN NN ( )	BackGround	GRelated	Positive
07-1027_0	For a more complete description , see ( )	IN DT RBR JJ NN , VVP ( )	BackGround	SRelated	Neutral
07-1027_0	In this work , we use a modification of Huber's robust loss function , similar to that used in ( ): L(p , y) = −4py if py < −1 ; (1 − py)^2 if −1 ≤ py < 1 ; 0 if py ≥ 1 (2). We fix the regularization parameter λ to 10^−4 , similar to that used in ( )	IN DT NN , PP VVP DT NN IN NP JJ NN NN , JJ TO DT VVN IN ( NP NP , JJ NN IN NNS SYM JJ NN SENT NN CD IN SENT CD NN CD NN CD IN NNS SYM LS PP VV DT NN NN NP TO CD CD , JJ TO DT VVN IN ( )	Fundamental	Idea	Neutral
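The reconstructed loss (2), written as code for concreteness (y is the ±1 label, p the raw prediction; the function name is illustrative):

```python
def modified_huber(p, y):
    """Modified Huber loss: linear for py < -1, quadratic on [-1, 1), zero above."""
    py = p * y
    if py < -1.0:
        return -4.0 * py
    if py < 1.0:
        return (1.0 - py) ** 2
    return 0.0
```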
07-1027_0	This relationship is modeled by f_l(x) = w_l^T x + v_l^T Θ x (3). The parameters [{w_l , v_l} , Θ] may then be found by joint empirical risk minimization over all the m problems , i.e. , their values should minimize the combined empirical risk Σ_{l=1}^{m} [ (1/n) Σ_{i=1}^{n} L(f_l(x_i) , y_i) + λ ||w_l||^2 ] (4). An important observation in ( ) is that the binary classification problems used to derive Θ are not necessarily those problems we are aiming to solve	DT NN VBZ VVN RB SYM CD NP NN DT NNS NN NN , JJ , JJ MD RB VB VVN IN JJ JJ NN NN IN PDT DT NN NNS , FW , PP$ NNS MD VV DT JJ JJ NN NN NN NN NP NN DT JJ NN IN ( ) VBZ IN/that DT JJ NN NNS VVN TO VV CD VBP RB RB DT NNS PP VBP VVG TO VV	BackGround	SRelated	Positive
07-1027_0	Assuming there are k target problems and m auxiliary problems , it is shown in ( ) that by performing one round of minimization , an approximate solution of Θ can be obtained from (4) by the following algorithm: 1. For each of the m auxiliary problems , learn $u_l$ as described by (1)	VVG EX VBP NN NN NNS CC NN JJ NNS , PP VBZ VVN IN ( ) IN/that IN VVG CD NN IN NN , DT JJ NN IN CD MD VB VVN IN NN IN DT VVG NN NN DT IN DT NN JJ NNS , VV NN NN IN VVN IN NN	BackGround	SRelated	Neutral
07-1027_0	This is a simplified version of the definition in ( ) , made possible because the same λ is used for all auxiliary problems	DT VBZ DT VVN NN IN DT NN IN ( ) , VVD JJ IN DT JJ NP VBZ VVN IN DT JJ NNS	BackGround	SRelated	Neutral
07-1027_0	ASO has been demonstrated to be an effective semi-supervised learning algorithm ( )	NP VHZ VBN VVN TO VB DT JJ JJ NN NN ( )	BackGround	GRelated	Positive
07-1027_0	A variety of auxiliary problems are tested in ( ) in the semi-supervised settings , i.e. , their auxiliary problems are generated from unlabeled data	DT NN IN JJ NNS VBP VVN IN ( ) IN DT JJ NNS , FW , PP$ JJ NNS VBP VVN IN JJ NNS	BackGround	GRelated	Neutral
07-1027_2	More recently , for the word sense disambiguation (WSD) task , ( ) experimented with both supervised and semi-supervised auxiliary problems , although the auxiliary problems she used are different from ours	RBR RB , IN DT NN NN NN NN NN , ( ) VVN IN DT JJ CC JJ JJ NNS , IN DT JJ NNS PP VVD VBP JJ IN PP	BackGround	GRelated	Neutral
07-1027_3	In recent years , the availability of large human-labeled corpora such as PropBank ( ) and FrameNet ( ) has made possible a statistical approach of identifying and classifying the arguments of verbs in natural language texts	IN JJ NNS , DT NN IN JJ JJ NNS JJ IN NP ( ) CC NP ( ) VHZ VVN JJ DT JJ NN IN VVG CC VVG DT NNS IN NNS IN JJ NN NNS	BackGround	GRelated	Neutral
07-1027_4	This is known as multi-task learning in the machine learning literature ( )	DT VBZ VVN IN NN VVG IN DT NN VVG NN ( )	BackGround	GRelated	Neutral
07-1027_5	A large number of SRL systems have been evaluated and compared on the standard data set in the CoNLL shared tasks ( ) , and many systems have performed reasonably well	DT JJ NN IN NP NNS VHP VBN VVN CC VVN IN DT JJ NNS VVN IN DT NP JJ NNS ( ) , CC JJ NNS VHP VVN RB RB	BackGround	GRelated	Neutral
07-1027_7	 In addition to the target outputs , ( ) discusses configurations where both used inputs and unused inputs (due to excessive noise) are utilized as additional outputs	IN NN TO DT NN NNS , ( ) VVZ NNS WRB DT VVN NNS CC JJ NNS NN TO JJ NN VBP VVN IN JJ NNS	BackGround	GRelated	Neutral
07-1027_8	First , we train the various classifiers on sections 2 to 21 using gold argument labels and automatic parse trees produced by Charniak's re-ranking parser ( ) , and test them on section 23 with automatic parse trees	RB , PP VVP DT JJ NNS IN NNS CD TO CD VVG JJ NN NNS CC JJ VVP NNS VVN IN NP NN NN ( ) , CC VV PP IN NN CD IN JJ VVP NNS	Fundamental	Basis	Neutral
07-1027_10	Noun predicates also appear in FrameNet semantic role labeling ( ) , and many FrameNet SRL systems are evaluated in Senseval-3 ( )	NP VVZ RB VV IN NP JJ NN VVG ( ) , CC JJ NP NP NNS VBP VVN IN NP ( )	BackGround	GRelated	Neutral
07-1027_11	So far we are aware of only one English NomBank-based SRL system ( ) , which uses the maximum entropy classifier , although similar efforts are reported on the Chinese NomBank by ( ) and on FrameNet by ( ) using a small set of hand-selected nominalizations	RB RB PP VBP JJ IN RB CD NNS JJ NP NN ( ) , WDT VVZ DT JJ NN NN , IN JJ NNS VBP VVN IN DT JJ NN IN ( ) CC IN NP IN ( ) VVG DT JJ NN IN JJ NNS	BackGround	GRelated	Neutral
07-1027_11	Second , we achieve accuracy higher than that reported in ( ) and advance the state of the art in SRL research	RB , PP VVP NN JJR IN DT VVN IN ( ) CC VV DT NN IN DT NN IN NP NN	Compare	Compare	Negative
07-1027_11	Eighteen baseline features and six additional features are proposed in ( ) for NomBank argument identification	CD JJ NNS CC CD JJ NNS VBP VVN IN ( ) IN NP NN NN	Fundamental	Basis	Neutral
07-1027_11	Unlike in ( ) , we do not prune arguments dominated by other arguments or those that overlap with the predicate in the training data	IN IN ( ) , PP VVP RB VV NNS VVN IN JJ NNS CC DT WDT VVP IN DT NN IN DT NN NNS	Compare	Compare	Neutral
07-1027_11	The J&N column presents the result reported in ( ) using both baseline and additional features	DT NP NN VVZ DT NN VVN IN ( ) VVG DT NN CC JJ NNS	BackGround	SRelated	Neutral
07-1027_11	A diverse set of 28 features is used in ( ) for argument classification	DT JJ NN IN CD NNS VBZ VVN IN ( ) IN NN NN	Fundamental	Basis	Neutral
07-1027_11	To find a smaller set of effective features , we start with all the features considered in ( ) , in ( ) , and various combinations of them , for a total of 52 features	TO VV DT JJR NN IN JJ NNS , PP VVP IN PDT DT NNS VVN IN ( ) , RB ( ) , CC JJ NNS IN PP , IN DT NN IN CD NNS	Fundamental	Basis	Neutral
07-1027_11	The J&N column presents the result reported in ( )	DT NP NN VVZ DT NN VVN IN ( )	BackGround	SRelated	Neutral
07-1027_11	This is the same configuration as reported in ( )	DT VBZ DT JJ NN IN VVN IN ( )	Fundamental	Idea	Neutral
07-1027_11	[Table 3: F1 scores of various classifiers on NomBank SRL] Our maximum entropy classifier consistently outperforms ( ) , which also uses a maximum entropy classifier	NN CD NP NNS IN JJ NNS IN NP NP PP$ JJ NN NN RB VVZ ( ) , WDT RB VVZ DT JJ NN NN	Compare	Compare	Negative
07-1027_11	Our results outperform those reported in ( )	PP$ NNS VVP DT VVN IN ( )	Compare	Compare	Negative
07-1027_14	With the recent release of NomBank ( ) , it becomes possible to apply machine learning techniques to the task	IN DT JJ NN IN NP ( ) , PP VVZ JJ TO VV NN VVG NNS TO DT NN	BackGround	GRelated	Neutral
07-1027_19	Accordingly , we do not maximize the probability of the entire labeled parse tree as in ( )	RB , PP VVP RB VV DT NN IN DT NN VVD VV NN IN IN ( )	Fundamental	Idea	Neutral
07-1028_0	Some approaches have used WordNet for the generalization step ( ) , others EM-based clustering ( )	DT NNS VHP VVN NP IN DT NN NN ( ) , NNS JJ VVG ( )	BackGround	GRelated	Neutral
07-1028_1	The argument positions for which we compute selectional preferences will be semantic roles in the FrameNet ( ) paradigm , and the predicates we consider will be semantic classes of words rather than individual words (which means that different preferences will be learned for different senses of a predicate word)	DT NN NNS IN WDT PP VV JJ NNS MD VB JJ NNS IN DT NP ( ) NN , CC DT NNS PP VVP MD VB JJ NNS IN NNS RB IN JJ NNS NN VVZ IN/that JJ NNS MD VB VVN IN JJ NNS IN DT JJ NN	Fundamental	Basis	Neutral
07-1028_1	We use FrameNet ( ) , a semantic lexicon for English that groups words in semantic classes called frames and lists semantic roles for each frame	PP VVP NP ( ) , DT JJ NN IN NP IN/that NNS NNS IN JJ NNS VVD NNS CC VVZ JJ NNS IN DT NN	Fundamental	Basis	Neutral
07-1028_2	Brockmann and Lapata ( ) perform a comparison of WordNet-based models	NP CC NP ( ) VV DT NN IN JJ NNS	BackGround	GRelated	Neutral
07-1028_3	The sim function can equally well be instantiated with a WordNet-based metric (for an overview see Budanitsky and Hirst ( )) , but we restrict our experiments to corpus-based metrics (a) in the interest of greatest possible resource-independence and (b) in order to be able to shape the similarity metric by the choice of generalization corpus [Table 1: Similarity measures used: Cosine , the Dice and Jaccard coefficients , and Lin's and Hindle's mutual information-based measures]	DT NN NN MD RB RB VB JJ IN DT JJ JJ NN DT NN VV NP CC NP ( NP , CC PP VVP PP$ NNS TO JJ NNS JJ IN DT NN IN JJS JJ NN NNS , NN SYM NN NN SENT JJ NNS JJ NN SYM JJ NP DT NN POS JJ NN NN NN NN NN ) JJ NN NN SYM NP NN NN ) JJ NP SYM NN POS NP SYM NP SYM SYM SYM NP ) SYM NP NP NN NN NN NN NN : NN SENT NP NP NP NP JJ NN SYM NN ) NN NN DT NN NN JJ NN SYM SYM NN NN NN , NP NP NN WRB NN NNS , NP , NN NN SYM SYM JJ NN NN NN ) CD NP NN NN IN NP NP NP SYM CD CC NP NP NP SYM CD JJ CD NP NNS VVD NN CC NN IN NN TO VB JJ TO VV DT NN JJ IN DT NN IN NN NN	BackGround	SRelated	Neutral
07-1028_4	In SRL , the two most pressing issues today are (1) the development of strong semantic features to complement the current mostly syntactically-based systems , and (2) the problem of domain dependence ( )	IN NP , DT CD RBS JJ NNS NN VBP JJ DT NN IN JJ JJ NNS TO VV DT JJ RB JJ NNS , CC NN DT NN IN DT NN NN ( )	BackGround	GRelated	Neutral
07-1028_5	The preference that $r_p$ has for a given synset $c_0$ , the selectional association between the two , is then defined as the contribution of $c_0$ to $r_p$'s selectional preference strength: $A(r_p , c_0) = P(c_0 \mid r_p) \log \frac{P(c_0 \mid r_p)}{P(c_0)} \, / \, S(r_p)$ . Further WordNet-based approaches to selectional preference induction include Clark and Weir ( ) , and Abe and Li ( )	DT NN IN/that NN NN VHZ IN DT VVN NN NN , DT JJ NN IN DT CD , VBZ RB VVN IN DT NN IN NN CD TO NN NNS JJ NN NN NP JJ NN SYM NN NN NN NN NN NP NP RBR JJ NNS TO JJ NN NN VVP NP CC NP ( ) , CC NP CC NP ( )	BackGround	GRelated	Neutral
07-1028_6	To determine headwords of the semantic roles , the corpus was parsed using the Collins ( ) parser	TO VV NNS IN DT JJ NNS , DT NN VBD VVN VVG DT NP ( ) NN	Fundamental	Basis	Neutral
07-1028_7	5x2cv ( )	NP ( )	NULL	NULL	NULL		
07-1028_8	They have been used for example for syntactic disambiguation ( ) , word sense disambiguation (WSD) ( ) and semantic role labeling (SRL) ( )	PP VHP VBN VVN IN NN IN JJ NN ( ) , NN NN NN NN ( ) CC JJ NN VVG NN ( )	BackGround	GRelated	Neutral
07-1028_8	While EM-based models have been shown to work better in SRL tasks ( ) , this has been attributed to the difference in coverage	IN JJ NNS VHP VBN VVN TO VV JJR IN NP NNS ( ) , DT VHZ VBN VVN TO DT NN IN NN	BackGround	GRelated	Neutral
07-1028_10	We will be using the similarity metrics shown in Table 1: Cosine , the Dice and Jaccard coefficients , and Hindle's ( ) and Lin's ( ) mutual information-based metrics	PP MD VB VVG DT NN NNS VVN IN NP CD NP , DT NP CC NP NNS , CC NP ( ) CC NP ( ) JJ JJ NNS	Fundamental	Basis	Neutral
07-1028_11	Selectional restrictions and selectional preferences that predicates impose on their arguments have long been used in semantic theories (see e.g. ( ))	JJ NNS CC JJ NNS IN/that NNS VV IN PP$ NNS VHP RB VBN VVN IN JJ NNS , NP FW ( JJ	BackGround	GRelated	Neutral
07-1028_12	It was parsed using Minipar ( ) , which is considerably faster than the Collins parser but failed to parse about a third of all sentences	PP VBD VVN VVG NP ( ) , WDT VBZ RB RBR IN DT NP NN CC VVD TO VV IN DT JJ IN DT NNS	Fundamental	Basis	Positive
07-1028_13	In this paper we propose a new , simple model for selectional preference induction that uses corpus-based semantic similarity metrics , such as Cosine or Lin's ( ) mutual information-based metric , for the generalization step	IN DT NN PP VVP DT JJ , JJ NN IN JJ NN NN WDT VVZ JJ JJ NN NNS , JJ IN NP CC NP ( ) JJ JJ NN , IN DT NN NN	Fundamental	Basis	Neutral
07-1028_15	The corpus-based induction of selectional preferences was first proposed by Resnik ( )	DT JJ NN IN JJ NNS VBD RB VVN IN NP ( )	BackGround	GRelated	Neutral
07-1028_15	The induction of selectional preferences from corpus data was pioneered by Resnik ( )	DT NN IN JJ NNS IN NN NNS VBD VVD IN NP ( )	BackGround	GRelated	Positive
07-1028_16	Rooth et al. ( ) generalize over seen headwords using EM-based clustering rather than WordNet	NP NP NP ( ) VV IN VVN NNS VVG JJ VVG RB IN NP	BackGround	GRelated	Neutral
07-1028_16	Experimental design: Like Rooth et al. ( ) , we evaluate selectional preference induction approaches in a pseudo-disambiguation task	JJ NN IN NP NP NP ( ) PP VVP JJ NN NN NNS IN DT NN NN	Fundamental	Idea	Neutral
07-1029_0	The intuition that "hard to learn" examples are suspect corpus errors is not new , and appears also in Abney et al. ( ) , who consider the "heaviest" samples in the final distribution of the AdaBoost algorithm to be the hardest to classify and thus likely corpus errors	DT NN IN/that NN TO JJ NNS VBP JJ NN NNS VBZ RB JJ , CC VVZ RB IN NP NP NP ( ) , WP VVP DT JJ NNS IN DT JJ NN IN DT NP NN TO VB DT RBS TO VV CC RB JJ NN NNS	BackGround	SRelated	Neutral
07-1029_1	The HEB-Err version of the corpus is obtained by projecting the chunk boundaries on the sequence of PoS and morphology tags obtained by the automatic PoS tagger of Adler & Elhadad ( )	DT NP NP NN IN DT NN VBZ VVN IN VVG DT NN NNS IN DT NN IN NP CC NN NNS VVN IN DT JJ NP NN IN NP CC NP ( )	Fundamental	Basis	Neutral
07-1029_2	We tested this hypothesis by training the Error-Driven Pruning (EDP) method of ( ) with an extended set of features	PP VVD DT NN IN VVG DT NP NP JJ NN IN ( ) IN DT JJ NN IN NNS	Fundamental	Basis	Neutral
07-1029_4	In ( ) , we established that the task is not trivially transferable to Hebrew , but reported that SVM based chunking ( ) performs well	IN ( ) , PP VVD IN/that DT NN VBZ RB RB JJ TO NN , CC VVD IN/that NP VVN NN ( ) VVZ RB	BackGround	SRelated	Positive
07-1029_4	In ( ) we argued that it is not applicable to Hebrew , mainly because of the prevalence of Hebrew's construct state (smixut)	IN ( ) PP VVD IN/that PP VBZ RB JJ TO NN , RB IN IN DT NN IN DT NNS VV NN NN	BackGround	GRelated	Neutral
07-1029_4	For the Hebrew experiments , we use the corpora of ( )	IN DT JJ NNS , PP VVP DT NNS IN ( )	Fundamental	Basis	Neutral
07-1029_4	These are the same settings as in ( )	DT VBP DT JJ NNS IN IN ( )	Fundamental	Idea	Neutral
07-1029_4	Refining the SimpleNP Definition: The hard cases analysis identified examples that challenge the SimpleNP definition proposed in Goldberg et al. ( )	VVG DT NP NP DT JJ NNS NN VVD NNS WDT VVP DT NP NN VVN IN NP NP NP ( )	BackGround	SRelated	Neutral
07-1029_5	Kudo and Matsumoto ( ) used SVM as a classification engine and achieved an F-Score of 93.79 on the shared task NPs	NP CC NP ( ) VVN NN IN DT NN NN CC VVD DT NN IN CD IN DT JJ NN NP	BackGround	GRelated	Neutral
07-1029_5	Further details can be found in Kudo and Matsumoto ( )	JJR NNS MD VB VVN IN NP CC NP ( )	BackGround	SRelated	Neutral
07-1029_7	Following Ramshaw and Marcus ( ) , the current dominant approach is formulating chunking as a classification task , in which each word is classified as the (B)eginning , (I)nside or (O)utside of a chunk	VVG NP CC NP ( ) , DT JJ JJ NN VBZ VVG NN IN DT NN NN , IN WDT DT NN VBZ VVN IN DT NN , NN CC NN IN DT NN	Fundamental	Idea	Neutral
07-1029_7	NP chunks in the shared task data are BaseNPs , which are non-recursive NPs , a definition first proposed by Ramshaw and Marcus ( )	NN NNS IN DT VVN NN NNS VBP NP , WDT VBP JJ NP , DT NN RB VVN IN NP CC NP ( )	BackGround	GRelated	Neutral
07-1029_7	For the English experiments , we use the now-standard training and test sets that were introduced in ( )	IN DT JJ NNS , PP VVP DT JJ NN CC NN NNS WDT VBD VVN IN ( ) LS	Fundamental	Basis	Neutral
07-1029_8	This method is similar to the corpus error detection method presented by Nakagawa and Matsumoto ( )	DT NN VBZ JJ TO DT NN NN NN NN VVN IN NP CC NP ( )	Fundamental	Idea	Neutral
07-1029_9	It is a well studied problem in English , and was the focus of CoNLL2000's Shared Task ( )	PP VBZ DT RB VVN NN IN NP , CC VBD DT NN IN NP NP NP ( )	BackGround	GRelated	Neutral
07-1029_10	We applied this definition to the Hebrew Tree Bank ( ) , and constructed a moderate size corpus (about 5,000 sentences) for Hebrew SimpleNP chunking	PP VVD DT NN TO DT NP NP NP ( ) , CC VVN DT JJ NN NN NN CD CD NN IN NP NP NN	Fundamental	Basis	Neutral
07-1029_11	SVM ( ) is a supervised binary classifier	NP ( ) VBZ DT JJ JJ NN	BackGround	SRelated	Neutral
07-1030_0	However , each of these assumes that the relations themselves are known in advance (implicitly or explicitly) so that the method can be provided with seed patterns ( ) , pattern-based rules ( ) , relation keywords ( ) , or word pairs exemplifying relation instances ( )	RB , DT IN DT VVZ IN/that DT NNS PP VBP VVN IN NN NN CC JJ RB IN/that DT NN MD VB VVN IN NN NNS ( ) , JJ NNS ( ) , NN NNS ( ) , CC NN NNS VVG NN NNS ( )	BackGround	GRelated	Neutral
07-1030_2	Most related work deals with discovery of hypernymy ( ) , synonymy ( ) and meronymy ( )	RBS VVN NN NNS IN NN IN NN ( ) , NN ( ) CC JJ ( )	BackGround	GRelated	Neutral
07-1030_3	In addition to these basic types , several studies deal with the discovery and labeling of more specific relation sub-types , including inter-verb relations ( ) and noun-compound relationships ( )	IN NN TO DT JJ NNS , JJ NNS NN IN DT NN CC VVG IN RBR JJ NN NNS , VVG NN NNS ( ) CC JJ NNS ( )	BackGround	GRelated	Neutral
07-1030_4	It should be noted that some of these papers utilize language and domain-dependent preprocessing including syntactic parsing ( ) and named entity tagging ( ) , while others take advantage of handcrafted databases such as WordNet ( ) and Wikipedia ( )	PP MD VB VVN IN/that DT IN DT NNS VV NN CC JJ NN VVG JJ VVG ( ) CC VVN NN VVG ( ) , IN NNS VVP NN IN VVN NNS JJ IN NP ( ) CC NP ( )	BackGround	GRelated	Neutral
07-1030_5	In several studies ( ) it has been shown that relatively unsupervised and language-independent methods could be used to generate many thousands of sets of words whose semantics is similar in some sense	IN JJ NNS ( ) PP VHZ VBN VVN IN/that RB JJ CC JJ NNS MD VB VVN TO VV JJ NNS IN NNS IN NNS WP$ NNS VBZ JJ IN DT NN	BackGround	GRelated	Neutral
07-1030_5	We do this as follows , essentially implementing a simplified version of the method of Davidov and Rappoport ( )	PP VVP DT RB VVZ , RB VVG DT VVN NN IN DT NN IN NP CC NP ( )	Fundamental	Basis	Neutral
07-1030_5	Note that our method differs from that of Davidov and Rappoport ( ) in that here we provide an initial seed pair , representing our target concept , while there the goal is to group as many words as possible into concept classes	NN IN/that PP$ NN VVZ IN DT IN NP CC NP ( ) IN IN/that RB PP VVP DT JJ NN NN , VVG PP$ NN NN , IN RB DT NN VBZ VVG IN IN JJ NNS IN JJ IN NN NNS	Compare	Compare	Neutral
07-1030_5	It was shown in ( ) that pairs of words that often appear together in such symmetric patterns tend to belong to the same class (that is , they share some notable aspect of their semantics)	PP VBD VVN IN ( ) DT NNS IN NNS WDT RB VVP RB IN JJ JJ NNS VVP TO VV TO DT JJ NN NN VBZ , PP VVP DT JJ NN IN PP$ NN	BackGround	SRelated	Neutral
07-1030_7	Studying relationships between tagged named entities , ( ) proposed unsupervised clustering methods that assign given (or semi-automatically extracted) sets of pairs into several clusters , where each cluster corresponds to one of the known relationship types	VVG NNS IN VVN VVN NNS , ( ) VVN JJ VVG NNS WDT VVP VVN NN RB JJ NNS IN NNS IN JJ NNS , WRB DT NN VVZ TO CD IN DT VVN NN NN	BackGround	GRelated	Neutral
07-1030_9	A lot of this research is based on the initial insight ( ) that certain lexical patterns ('X is a country') can be exploited to automatically generate hyponyms of a specified word	DT NN IN DT NN VBZ VVN IN DT JJ NN ( ) DT JJ JJ NNS NN VBZ DT NN MD VB VVN TO RB VV NNS IN DT JJ NN	BackGround	GRelated	Neutral
07-1030_15	In some recent work ( ) , it has been shown that related pairs can be generated without pre-specifying the nature of the relation sought	IN DT JJ NN ( ) , PP VHZ VBN VVN IN/that JJ NNS MD VB VVN IN VVG DT NN IN DT NN VVN	BackGround	GRelated	Neutral
07-1030_17	Finally , ( ) provided a pattern distance measure which allows a fully unsupervised measurement of relational similarity between two pairs of words; however , relationship types were not discovered explicitly	RB , ( ) VVD DT NN NN NN WDT VVZ DT RB JJ NN IN JJ NN IN CD NNS IN NN RB , NN NNS VBD RB VVN RB	BackGround	GRelated	Neutral
07-1031_0	The bracketing guidelines ( ) also mention the considerable difficulty of identifying the correct scope for nominal modifiers	DT NN NNS ( ) RB VV DT JJ NN IN VVG DT JJ NN IN JJ NNS	BackGround	GRelated	Neutral
07-1031_1	We use Bikel's implementation ( ) of Collins' parser ( ) in order to carry out these experiments , using the non-deficient Collins settings	PP VVP NP NN ( ) IN NP NN ( ) IN NN TO VV RP DT NNS , VVG DT JJ NP NNS	Fundamental	Basis	Neutral
07-1031_2	We draw our counts from a corpus of n-gram counts calculated over 1 trillion words from the web ( )	PP VVP PP$ NNS IN DT NN IN NN NNS VVN IN CD CD NNS IN DT NN ( )	Fundamental	Basis	Neutral
07-1031_3	We use the Briscoe and Carroll ( ) version of DepBank , a 560 sentence subset used to evaluate the rasp parser	PP VVP DT NP CC NP ( ) NN IN NP , DT CD NN NN VVN TO VV DT NN NN	Fundamental	Basis	Neutral
07-1031_4	We map the brackets to dependencies by finding the head of the np , using the Collins ( ) head finding rules , and then creating a dependency between each other child's head and this head	PP VVP DT NNS TO NNS IN VVG DT NN IN DT NN , VVG DT NP ( ) NN VVG NNS , CC RB VVG DT NN IN DT JJ NN NN CC DT NN	Fundamental	Basis	Neutral
07-1031_5	We discretised the non-binary features using an implementation of Fayyad and Irani's ( ) algorithm , and classified using MegaM	PP VVD DT JJ NNS VVG DT NN IN NP CC NP ( ) NN , CC VV VVG NP CD	Fundamental	Basis	Neutral
07-1031_7	For instance , CCGbank ( ) was created by semi-automatically converting the Treebank phrase structure to Combinatory Categorial Grammar (ccg) ( ) derivations	IN NN , NP ( ) VBD VVN IN RB VVG DT NP NN NN TO NP NP NP NN ( ) NNS	BackGround	GRelated	Neutral
07-1031_7	An additional grammar rule is needed just to get a parse , but it is still not correct (Hockenmaier , 2003 , p	DT JJ NN NN VBZ VVN RB TO VV DT VVP , CC PP VBZ RB RB JJ NP , CD , NN	BackGround	SRelated	Negative
07-1031_8	We check the correctness of the corpus by measuring inter-annotator agreement , by reannotating the first section , and by comparing against the sub-NP structure in DepBank ( )	PP VVP DT NN IN DT NN IN VVG NN NN , IN VVG DT JJ NN , CC IN VVG IN DT NN NN NN IN NP ( )	Compare	Compare	Neutral
07-1031_8	We used the PARC700 Dependency Bank ( ) which consists of 700 Section 23 sentences annotated with labelled dependencies	PP VVD DT NP NP NP ( ) WDT VVZ IN CD NN CD NNS VVN IN VVN NNS	Fundamental	Basis	Neutral
07-1031_9	Our annotation guidelines are based on those developed for annotating full sub-np structure in the biomedical domain ( )	PP$ NN NNS CD VBP VVN IN DT VVN IN VVG JJ NN NN NN IN DT JJ NN ( )	Fundamental	Basis	Neutral
07-1031_10	Lapata and Keller ( ) derive estimates from web counts , and only compare at a lexical level , achieving 78.7% accuracy	NP CC NP ( ) VV NNS IN NN NNS , CC RB VV IN DT JJ NN , VVG CD NN	BackGround	GRelated	Neutral
07-1031_11	Finally , we test the utility of the extended Treebank for training statistical models on two tasks: NP bracketing ( ) and full parsing ( )	RB , PP VVP DT NN IN DT JJ NN IN VVG JJ NNS IN CD JJ NP NN ( ) CC JJ VVG ( )	Fundamental	Basis	Neutral
07-1031_11	Lauer ( ) has demonstrated superior performance of the dependency model using a test set of 244 (216 unique) noun compounds drawn from Grolier's encyclopedia	NP ( ) VHZ VVN JJ NN IN DT NN NN VVG DT NN VVN IN CD JJ JJ NN NNS VVN IN NP NN	BackGround	GRelated	Neutral
07-1031_11	[Table 4: Comparison of NP bracketing corpora. Table 5: Lexical overlap.] We implement a similar system to Lauer ( ) , described in Section 3 , and report on results from our own data and Lauer's original set	PP VV DT JJ NN TO NP CD NP IN NP NN NNS JJ CD NN VVP NP ( ) , VVN IN NP CD , CC NN IN NNS IN PP$ JJ NNS CC NP JJ NN	Fundamental	Idea	Neutral
07-1031_12	The np bracketing task has often been posed in terms of choosing between the left or right branching structure of three word noun compounds: (a) (world (oil prices)) - Right-branching (b) ((crude oil) prices) - Left-branching Most approaches to the problem use unsupervised methods , based on competing association strength between two of the words in the compound (Marcus , 1980 , p	DT NN NN NN VHZ RB VBN VVN IN NNS IN VVG IN DT JJ CC JJ VVG NN IN CD NN NN NN NN NN NN NN : NP JJ NN NN NN : NP NP VVZ TO DT NN VVP JJ NNS , VVN IN VVG NN NN IN CD IN DT NNS IN DT JJ NP , CD , NN	BackGround	GRelated	Neutral
07-1031_13	The Penn Treebank ( ) is perhaps the most influential resource in Natural Language Processing (NLP)	DT NP NP ( ) VBZ RB DT RBS JJ NN IN NP NP NP NN	BackGround	GRelated	Positive
07-1031_13	According to Marcus et al. ( ) , asking annotators to mark up base-np structure significantly reduced annotation speed , and for this reason base-nps were left flat	VVG TO NP NP NP ( ) , VVG NNS TO NN NN NN NN RB VVN NN NN , CC IN DT NN NN NNS VBD VVN JJ	BackGround	GRelated	Neutral
07-1031_13	For the original bracketing of the Treebank , annotators performed at 375-475 words per hour after a few weeks , and increased to about 1000 words per hour after gaining more experience ( ) [Table 1: Agreement between annotators]	IN DT JJ NN IN DT NP , NNS VVD IN CD NNS IN NN IN DT JJ CD NN IN NNS JJ NNS , CC VVD TO RB CD NNS IN NN IN VVG JJR NN ( )	BackGround	SRelated	Neutral
07-1031_14	Nakov and Hearst ( ) also use web counts , but incorporate additional counts from several variations on simple bigram queries , including queries for the pairs of words concatenated or joined by a hyphen	NP CC NP ( ) RB VV NN NNS , CC VV JJ NNS IN JJ NNS IN JJ NN NNS , VVG NNS IN DT NNS IN NNS VVN CC VVN IN DT NN	BackGround	GRelated	Neutral
07-1031_14	With our new data set , we began running experiments similar to those carried out in the literature ( )	IN PP$ JJ NNS VVN , PP VVD VVG NNS JJ TO DT VVN RP IN DT NN ( )	Fundamental	Idea	Neutral
07-1031_15	Many approaches to identifying base noun phrases have been explored as part of chunking ( ) , but determining sub -np structure is rarely addressed	JJ NNS TO VVG JJ NN NNS VHP VBN VVN IN NN IN NN ( ) , CC VVG NN NN NN VBZ RB VVN	BackGround	GRelated	Neutral
07-1031_17	The bracketing tool often suggests a bracketing using rules based mostly on named entity tags , which are drawn from the bbn corpus ( )	DT NN NN RB VVZ DT NN VVG NNS VVN RB IN VVN NN NNS , WDT VBP VVN IN DT NN NN ( )	Fundamental	Basis	Neutral
07-1032_0	The most common form of parser evaluation is to apply the parseval metrics to phrase-structure parsers based on the penn Treebank , and the highest reported scores are now over 90% ( )	DT RBS JJ NN IN NN NN VBZ TO VV DT JJ NNS TO NN NNS VVN IN DT NP NP , CC DT JJS JJ NNS VBP RB IN CD ( )	BackGround	GRelated	Positive
07-1032_1	In this paper we evaluate a ccg parser ( ) on the Briscoe and Carroll version of DepBank ( )	IN DT NN PP VVP DT NN NN ( ) IN DT NP CC NP NN IN NP ( )	Fundamental	Basis	Neutral
07-1032_1	Briscoe and Carroll ( ) reannotated this resource using their grs scheme , and used it to evaluate the rasp parser	NP CC NP ( ) VVD DT NN VVG PP$ NNS NN , CC VVD PP TO VV DT NN NN	BackGround	GRelated	Neutral
07-1032_1	Parsers have been developed for a variety of grammar formalisms , for example hpsg ( ) , lfg ( ) , tag ( ) , ccg ( ) , and variants of phrase-structure grammar ( ) , including the phrase-structure grammar implicit in the Penn Treebank ( )	NNS VHP VBN VVN IN DT NN IN NN NNS , IN NN NN ( ) , NN ( ) , NN ( ) , NN ( ) , CC NNS IN NN NN ( ) , VVG DT NN NN JJ IN DT NP NP ( )	BackGround	GRelated	Neutral
07-1032_1	And third , we provide the first evaluation of a wide-coverage ccg parser outside of CCGbank , obtaining impressive results on DepBank and outperforming the rasp parser ( ) by over 5% overall and on the majority of dependency types	CC JJ , PP VVP DT JJ NN IN DT NN NN NN NN IN NP , VVG JJ NNS IN NP CC VVG DT NN NN ( ) IN IN CD JJ CC IN DT NN IN NN NNS	Compare	Compare	Negative
07-1032_1	For the gold standard we chose the version of Dep-Bank reannotated by Briscoe and Carroll ( ) , consisting of 700 sentences from Section 23 of the Penn Treebank	IN DT JJ NN PP VVD DT NN IN NP VVD IN NP CC NP ( ) , VVG IN CD NNS IN NP CD IN DT NP NP	Fundamental	Basis	Neutral
07-1032_1	The results in Table 4 were obtained by parsing the sentences from CCGbank corresponding to those in the 560-sentence test set used by Briscoe et al. ( )	DT NNS IN NP CD VBD VVN IN VVG DT NNS IN NP JJ TO DT IN DT NN NN NN VVN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1032_1	The macro-averaged scores are the mean of the individual scores for each relation ( )	DT JJ NNS VBP DT NN IN DT JJ NNS IN DT NN ( )	BackGround	SRelated	Neutral
07-1032_1	Can the ccg parser be compared with parsers other than rasp? Briscoe and Carroll ( ) give a rough comparison of rasp with the Parc lfg parser on the different versions of DepBank , obtaining similar results overall , but they acknowledge that the results are not strictly comparable because of the different annotation schemes used	MD DT NN NN VB VVN IN NNS JJ IN JJ NP CC NP ( ) VV DT JJ NN IN NN IN DT NP NN NN IN DT JJ NNS IN NP , VVG JJ NNS JJ , CC PP VVP IN/that DT NNS VBP RB RB JJ IN IN DT JJ NN NNS VVN	BackGround	GRelated	Neutral
07-1032_2	Briscoe et al. ( ) split the 700 sentences in DepBank into a test and development set , but the latter only consists of 140 sentences which was not enough to reliably create the transformation	NP NP NP ( ) VV DT CD NNS IN NP IN DT NN CC NN NN , CC DT NN RB VVZ IN CD NNS WDT VBD RB JJ TO RB VV DT NN	BackGround	GRelated	Negative
07-1032_2	All the results were obtained using the RASP evaluation scripts , with the results for the rasp parser taken from Briscoe et al. ( )	PDT DT NNS VBD VVN VVG DT NP NN NNS , IN DT NNS IN DT NN NN VVN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1032_3	Preiss ( ) compares the parsers of Collins ( ) and Charniak ( ) , the gr finder of Buchholz et al. ( ) , and the rasp parser , using the Carroll et al. ( ) gold-standard	NP ( ) VVZ DT NNS IN NP ( ) CC NP ( ) , DT NN NN IN NP NP NP ( ) , CC DT NN NN , VVG DT NP NP NP ( ) NN	BackGround	GRelated	Neutral
07-1032_5	It has been argued that the parseval metrics are too forgiving and that phrase structure is not the ideal representation for a gold standard ( )	PP VHZ VBN VVN IN/that DT JJ NNS VBP RB VVG CC DT NN NN VBZ RB DT JJ NN IN DT JJ NN ( )	BackGround	GRelated	Negative
07-1032_5	Carroll et al. ( ) describe such a suite , consisting of sentences taken from the Susanne corpus , annotated with Grammatical Relations (grs) which specify the syntactic relation between a head and dependent	NP NP NP ( ) VV PDT DT NN , VVG IN NNS VVN IN DT NP NN , VVN IN NP NP NN WDT VV DT JJ NN IN DT NN CC JJ	BackGround	GRelated	Neutral
07-1032_5	We chose not to use the corpus based on the Susanne corpus ( ) because the grs are less like the ccg dependencies; the corpus is not based on the Penn Treebank , making comparison more difficult because of tokenisation differences , for example; and the latest results for rasp are on DepBank	PP VVD RB TO VV DT NN VVN IN DT NP NN ( ) IN DT NNS VBP JJR IN DT NN NN DT NN VBZ RB VVN IN DT NP NP , VVG NN RBR JJ IN IN NN NNS , IN NN CC DT JJS NNS IN NN VBP IN NP	BackGround	SRelated	Negative
07-1032_5	Parser evaluation has improved on the original parseval measures ( ) , but the challenge remains to develop a representation and evaluation suite which can be easily applied to a wide variety of parsers and formalisms	NN NN VHZ VVN IN DT JJ JJ NNS ( ) , CC DT NN VVZ TO VV DT NN CC NN NN WDT MD VB RB VVN TO DT JJ NN IN NNS CC NNS	BackGround	SRelated	Neutral
07-1032_8	Clark and Curran ( ) describes the ccg parser used for the evaluation	NP CC NP ( ) VVZ DT NN NN VVN IN DT NN	BackGround	SRelated	Neutral
07-1032_8	Previous evaluations of ccg parsers have used the predicate-argument dependencies from CCGbank as a test set ( ) , with impressive results of over 84% F-score on labelled dependencies	JJ NNS IN NN NNS VHP VVN DT NN NNS IN NP IN DT NN NN ( ) , IN JJ NNS IN RB CD NN IN VVN NNS	BackGround	SRelated	Positive
07-1032_10	Kaplan et al. ( ) compare the Collins ( ) parser with the Parc lfg parser by mapping lfg F-structures and Penn Treebank parses into DepBank dependencies , claiming that the lfg parser is considerably more accurate with only a slight reduction in speed	NP NP NP ( ) VV DT NP ( ) NN IN DT NP NN NN IN NN NN NP CC NP NP VVZ IN NP NNS , VVG IN/that DT NN NN VBZ RB RBR JJ IN RB DT JJ NN IN NN	BackGround	GRelated	Neutral
07-1032_11	The ccg parser results are based on automatically assigned pos tags , using the Curran and Clark ( ) tagger	DT NN NN NNS VBP VVN IN RB VVN NNS NNS , VVG DT NP CC NP ( ) NN	Fundamental	Basis	Neutral
07-1032_13	An example of this is from CCGbank ( ) , where all modifiers in noun-noun compound constructions modify the final noun (because the penn Treebank , from which CCGbank is derived , does not contain the necessary information to obtain the correct bracketing)	DT NN IN DT VBZ IN NP ( ) , WRB DT NNS IN NN NN NNS VV DT JJ NN NN DT NP NP , IN WDT NP VBZ VVN , VVZ RB VV DT JJ NN TO VV DT JJ NN	BackGround	GRelated	Neutral
07-1032_13	The grammar used by the parser is extracted from CCGbank , a ccg version of the Penn Treebank ( )	DT NN VVN IN DT NN VBZ VVN IN NP , DT NN NN IN DT NP NP ( )	BackGround	SRelated	Neutral
07-1032_14	Such conversions have been performed for other parsers , including parsers producing phrase structure output ( )	JJ NNS VHP VBN VVN IN JJ NNS , VVG NNS VVG NN NN NN ( )	BackGround	GRelated	Neutral
07-1032_14	Kaplan et al. ( ) clearly invested considerable time and expertise in mapping the output of the Collins parser into the DepBank dependencies , but they also note that "This conversion was relatively straightforward for LFG structures ..."	NP NP NP ( ) RB VVN JJ NN CC NN IN VVG DT NN IN DT NP NN IN DT NP NNS , CC PP RB VVP IN/that NN NN VBD RB JJ IN NP NNS JJ	BackGround	GRelated	Neutral
07-1032_14	In the case of Kaplan et al. ( ) , the testing procedure would include running their conversion process on Section 23 of the Penn Treebank and evaluating the output against DepBank	IN DT NN IN NP NP NP ( ) , DT NN NN MD VV VVG PP$ NN NN IN NP CD IN DT NP NP CC VVG DT NN IN NP	BackGround	GRelated	Neutral
07-1032_15	A similar resource , the Parc Dependency Bank (DepBank) ( ) , has been created using sentences from the Penn Treebank	DT JJ NN SENT DT NP NP NP NN ( ) SENT VHZ VBN VVN VVG NNS IN DT NP NP	BackGround	GRelated	Neutral
07-1032_15	The b&c scheme is similar to the original DepBank scheme ( ) , but overall contains less grammatical detail; Briscoe and Carroll ( ) describes the differences	DT NN NN VBZ JJ TO DT JJ NP NN ( ) , CC RB VVZ RBR JJ NN NP CC NP ( ) VVZ DT NNS	BackGround	Basis	Neutral
07-1032_17	Different parsers produce different output , for example phrase structure trees ( ) , dependency trees ( ) , grammatical relations ( ) , and formalism-specific dependencies ( )	JJ NNS VVP JJ NN , IN NN NN NN NNS ( ) , NN NNS ( ) , JJ NNS ( ) , CC JJ NNS ( )	BackGround	GRelated	Neutral
07-1032_20	The grammar consists of 425 lexical categories , expressing subcategorisation information , plus a small number of combinatory rules which combine the categories ( )	DT NN VVZ IN CD JJ NNS SENT VVG NN NN SENT CC DT JJ NN IN JJ NNS WDT VVP DT NNS ( )	BackGround	SRelated	Neutral
07-1033_0	A more interesting statement would be that it makes learning easier , along the lines of the result of ( ) ; note , however , that their results are for the "semi-supervised" domain adaptation problem and so do not apply directly	DT RBR JJ NN MD VB IN/that PP VVZ VVG JJR , IN DT NNS IN DT NN IN ( ) SENT NN , RB , IN/that PP$ NNS VBP IN DT JJ NN NN NN CC RB VVP RB VV RB	BackGround	SRelated	Neutral
07-1033_1	A part-of-speech tagging problem on PubMed abstracts introduced by Blitzer et al. ( )	DT NN VVG NN IN JJ NNS VVN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1033_2	The first model , which we shall refer to as the Prior model , was first introduced by Chelba and Acero ( )	DT JJ NN , WDT PP MD VV TO IN DT RB JJ , VBD RB VVN IN NP CC NP ( )	Fundamental	Basis	Neutral
07-1033_2	This is a recapitalization task introduced by Chelba and Acero ( ) and also used by Daume III and Marcu ( )	DT VBZ DT NN NN VVN IN NP CC NP ( ) CC RB VVN IN NP NP CC NP ( )	BackGround	GRelated	Neutral
07-1033_2	For the CNN-Recap task , we use identical features to those used by both Chelba and Acero ( ) and Daume III and Marcu ( ): the current , previous and next word , and 1-3 letter prefixes and suffixes	IN DT NP NN , PP VVP JJ NN TO DT VVN IN DT NP CC NP ( ) CC NP NP CC NP ( NN DT JJ , JJ CC JJ NN , CC CD NN NNS CC NNS	Fundamental	Basis	Neutral
07-1033_3	Many of these are presented and evaluated by Daume III and Marcu ( )	JJ IN DT VBP VVN CC VVN IN NP NP CC NP ( )	BackGround	GRelated	Neutral
07-1033_3	Daume III and Marcu ( ) provide empirical evidence on four datasets that the Prior model outperforms the baseline approaches	NP NP CC NP ( ) VV JJ NN IN CD NNS IN/that DT RB NN VVZ DT NN NNS	BackGround	GRelated	Neutral
07-1033_3	More recently , Daume III and Marcu ( ) presented an algorithm for domain adaptation for maximum entropy classifiers	RBR RB , NP NP CC NP ( ) VVD DT NN IN NN NN IN JJ NN NNS	BackGround	GRelated	Neutral
07-1033_3	We additionally ran the MegaM model ( ) on these data (though not in the multi-conditional case; for this , we considered the single source as the union of all sources)	PP RB VVD DT NP NN ( ) IN DT NNS NN RB IN DT JJ NN IN DT , PP VVD DT JJ NN IN DT NN IN DT NN	Fundamental	Basis	Neutral
07-1033_5	In all cases , we use the Searn algorithm for solving the sequence labeling problem ( ) with an underlying averaged perceptron classifier; implementation due to ( )	IN DT NNS , PP VVP DT NP VVP NN IN VVG DT NN VVG NN ( ) IN DT VVG VVN NN NN NN JJ TO ( )	Fundamental	Basis	Neutral
07-1033_6	Second , it is arguable that a measure like F1 is inappropriate for chunking tasks ( )	RB , PP VBZ JJ IN/that DT NN IN NP CD VBZ JJ IN JJ NNS ( )	BackGround	SRelated	Neutral
07-1034_0	Following ( ) , we call the first the source domain , and the second the target domain	VVG ( ) , PP VVP DT JJ DT NN NN , CC DT NN DT NN NN	Fundamental	Idea	Neutral
07-1034_0	Recently there have been some studies addressing domain adaptation from different perspectives ( )	RB EX VHP VBN DT NNS VVG NN NN IN JJ NNS ( )	BackGround	GRelated	Neutral
07-1034_0	The POS data set and the CTS data set have previously been used for testing other adaptation methods ( ) , though the setup there is different from ours	DT NP NNS VVD CC DT NP NNS VVN VHP RB VBN VVN IN VVG JJ NN NNS ( ) , IN DT NN EX VBZ JJ IN PP	BackGround	GRelated	Neutral
07-1034_0	Blitzer et al. ( ) propose a domain adaptation method that uses the unlabeled target instances to infer a good feature representation , which can be regarded as weighting the features	NP NP NP ( ) VV DT NN NN NN WDT VVZ DT JJ NN NNS TO VV DT JJ NN NN , WDT MD VB VVN IN NN DT NNS	BackGround	GRelated	Neutral
07-1034_1	Chelba and Acero ( ) use the parameters of the maximum entropy model learned from the source domain as the means of a Gaussian prior when training a new model on the target data	NP CC NP ( ) VV DT NNS IN DT JJ NN NN VVN IN DT NN NN IN DT NN IN DT JJ RB WRB VVG DT JJ NN IN DT NN NNS	BackGround	GRelated	Neutral
07-1034_2	The setup is very similar to Daume III and Marcu ( )	DT NN VBZ RB JJ TO NP NP CC NP ( )	Fundamental	Idea	Neutral
07-1034_2	One approach that models the different distributions in the source and the target domains is by Daume III and Marcu ( )	NNS DT JJ NNS IN DT NN CC DT NN NNS VBZ IN NP NP CC NP ( )	Fundamental	Basis	Neutral
07-1034_3	Florian et al. ( ) first train a NE tagger on the source domain , and then use the tagger's predictions as features for training and testing on the target domain	NP NP NP ( ) RB VV DT RB JJ IN DT NN NN , CC RB VV DT JJ NNS IN NNS IN NN CC NN IN DT NN NN	BackGround	GRelated	Neutral
07-1034_4	This way of setting γ corresponds to the entropy minimization semi-supervised learning method ( )	DT NN IN VVG CD VVZ TO DT NN NN VVD VVG NN ( )	BackGround	SRelated	Neutral
07-1034_5	For generative syntactic parsing , Roark and Bacchiani ( ) have used the source domain data to construct a Dirichlet prior for MAP estimation of the PCFG for the target domain	IN JJ JJ VVG , NP CC NP ( ) VHP VVN DT NN NN NNS TO VV DT NN RB IN NP NN IN DT NN IN DT NN NN	BackGround	GRelated	Neutral
07-1035_0	The number of hidden components is not fixed , but emerges naturally from the training data ( )	NNS IN JJ NNS VBZ RB VVN , CC VVZ PP VV IN VVG CD JJ NN NNS , DT RB IN DT NN NNS ( )	Fundamental	Basis	Neutral
07-1035_1	The closely related infinite hidden Markov model is an HMM in which the transitions are modeled using an HDP , enabling unsupervised learning of sequence models when the number of hidden states is unknown ( )	DT RB VVN JJ JJ NP NN VBZ DT NP IN WDT DT NNS VBP VVN VVG DT NP , VVG JJ NN IN NN NNS WRB DT NN IN JJ NNS VBZ JJ ( )	BackGround	GRelated	Neutral
07-1035_1	The infinite hidden Markov model (iHMM) or HDP-HMM ( ) is a model of sequence data with transitions modeled by an HDP	DT NN VVN NP NN NN CC NP ( ) VBZ DT NN IN NN NNS IN NNS VVN IN DT NP	BackGround	GRelated	Neutral
07-1035_2	This is useful , because coarse-grained syntactic categories , such as those used in the Penn Treebank (PTB) , make insufficient distinctions to be the basis of accurate syntactic parsing ( )	DT VBZ JJ , IN JJ JJ NNS , JJ IN DT VVN IN DT NP NP NN , VV JJ NNS TO VB DT NN IN JJ JJ VVG ( )	BackGround	GRelated	Negative
07-1035_4	Hence , state-of-the-art parsers either supplement the part-of-speech (POS) tags with the lexical forms themselves ( ) , manually split the tagset into a finer-grained one ( ) , or learn finer grained tag distinctions using a heuristic learning procedure ( )	RB , JJ NNS CC VV DT NN NN NNS IN DT JJ NNS PP ( ) , RB VVD DT NN IN DT JJ CD ( ) , CC VV JJR VVN NN NNS VVG DT JJ NN NN ( )	BackGround	GRelated	Positive
07-1035_5	But the introduction of nonparametric priors such as the Dirichlet process ( ) enabled development of infinite mixture models , in which the number of hidden components is not fixed. Teh et al. ( ) proposed the hierarchical Dirichlet process (HDP) as a way of applying the Dirichlet process (DP) to more complex model forms , so as to allow multiple , group-specific , infinite mixture models to share their mixture components	CC DT NN IN JJ NNS JJ IN DT NP ( ) VVN NN IN JJ NN NNS , IN WDT DT NP NP NP ( ) VVN DT JJ NP NN NN IN DT NN IN VVG DT NP NN NN TO JJR JJ NN NNS , RB RB TO VV JJ , JJ , JJ NN NNS TO VV PP$ NN NNS	BackGround	GRelated	Neutral
07-1035_6	Additionally , we compute the mutual information of the learned clusters with the gold tags , and we compute the cluster F-score ( )	CD RB , PP VV DT JJ NN IN DT VVN NNS IN DT JJ NNS , CC PP VV DT NN NN ( )	Fundamental	Basis	Neutral
07-1035_7	First , we use the standard approach of greedily assigning each of the learned classes to the POS tag with which it has the greatest overlap , and then computing tagging accuracy ( )	RB , PP VVP DT JJ NN IN RB VVG DT IN DT VVN NNS TO DT NP NN IN WDT PP VHZ DT JJS VVP , CC RB VVG VVG NN ( )	Fundamental	Basis	Neutral
07-1035_7	For comparison , Haghighi and Klein ( ) report an unsupervised baseline of 41.3% , and a best result of 80.5% from using hand-labeled prototypes and distributional similarity	IN NN , NP CC NP ( ) VV DT JJ NN IN CD , CC DT JJS NN IN CD IN VVG JJ NNS CC JJ NN	Compare	Compare	Neutral
07-1035_8	Earlier , Johnson et al. ( ) presented adaptor grammars , a model very similar to the HDP-PCFG	RBR , NP NP NP ( ) VVN NN NNS , WDT VBZ DT RB JJ NN TO DT NP	BackGround	GRelated	Neutral
07-1035_9	We use the generative dependency parser distributed with the Stanford factored parser ( ) for the comparison , since it performs simultaneous tagging and parsing during testing	PP VVP DT JJ NN NN VVN IN DT NP VVD NN ( ) IN DT NN , IN PP VVZ JJ VVG CC VVG IN NN	Fundamental	Basis	Neutral
07-1035_11	The HDP-PCFG ( ) , developed at the same time as this work , aims to learn state splits for a binary-branching PCFG	DT NP ( ) , VVN IN DT JJ NN IN DT NN , VVZ TO VV NN NNS IN DT NN NN	BackGround	GRelated	Neutral
07-1035_11	In contrast , Liang et al. ( ) define a global DP over sequences , with the base measure defined over the global state probabilities , $\beta_0$ ; locally , each state has an HDP , with this global DP as the base measure	IN NN , NP NP NP ( ) VV DT JJ NN IN NNS , IN DT NN NN VVN IN DT JJ NN NNS , JJ RB , DT NN VHZ DT NP , IN DT JJ NN IN DT NN NN	Compare	Compare	Neutral
07-1035_12	For both experiments , we used dependency trees extracted from the Penn Treebank ( ) using the head rules and dependency extractor from Yamada and Matsumoto ( )	IN DT NNS , PP VVD NN NNS VVN IN DT NP NP ( ) VVG DT NN NNS CC NN NN IN NP CC NP ( )	Fundamental	Basis	Neutral
07-1035_14	To generate $\pi$ we first generate an infinite sequence of variables $\pi' = (\pi'_k)$ , each of which is distributed according to the Beta distribution: $\pi'_k \sim \mathrm{Beta}(1 , \alpha_0)$ . Then $\pi = (\pi_k)_{k=1}^{\infty}$ is defined as: $\pi_k = \pi'_k \prod_{l=1}^{k-1} (1 - \pi'_l)$ . Following Pitman ( ) we refer to this process as $\pi \sim \mathrm{GEM}(\alpha_0)$	TO VV NN PP RB VV DT JJ NN IN NNS NP SYM NN NN DT IN WDT VBZ VVN VVG TO DT JJ NN RB NN SYM JJ NN SYM SYM CD VBZ VVN JJ NN VVG NP ( ) PP VVP TO DT NN IN NN SENT NN NN NN	Fundamental	Idea	Neutral
07-1035_16	(Teh , 2006 , p.c.) , to sample each $m_{jk}$ : sampleM(j , k): 1. if $n_{jk} = 0$ 2. then $m_{jk} = 0$ 3. else $m_{jk} = 1$ 4. for $i \leftarrow 2$ to $n_{jk}$ 5. do if $\mathrm{rand}() < \frac{\alpha_0 \beta_k}{\alpha_0 \beta_k + i - 1}$ 6. then $m_{jk} = m_{jk} + 1$ 7. return $m_{jk}$ . Sampling $\beta$	NP , CD , NN , TO NN DT NN NN NN NN , NN JJ IN NN NN SYM CD JJ JJ NN NN SYM CD JJ JJ NN NN SYM CD JJ IN NP SYM CD TO NN NN JJ NN NN JJ RB NN NN SYM NN NN SYM CD JJ NN NN NN NP NN	NULL	NULL	NULL
07-1036_0	In many cases , improving semi-supervised models was done by seeding these models with domain information taken from dictionaries or ontologies ( )	IN JJ NNS , VVG JJ NNS VBD VVN IN VVG DT NNS IN NN NN VVN IN NNS CC NN ( )	BackGround	GRelated	Neutral
07-1036_0	This follows a conceptually similar approach by ( ) that uses a large named-entity dictionary , where the similarity between the candidate named-entity and its matching prototype in the dictionary is encoded as a feature in a supervised classifier	DT VVZ DT RB JJ NN IN ( ) WDT VVZ DT JJ NN NN , WRB DT NN IN DT NN NN CC PP$ VVG NN IN DT NN VBZ VVN IN DT NN IN DT JJ NN	BackGround	GRelated	Neutral
07-1036_1	Therefore , an increasing attention has been recently given to semi-supervised learning , where large amounts of unlabeled data are used to improve the models learned from a small training set ( )	RB , DT VVG NN VHZ VBN RB VVN TO JJ NN , WRB JJ NNS IN JJ NNS VBP VVN TO VV DT NNS VVN IN DT JJ NN NN ( )	BackGround	GRelated	Neutral
07-1036_1	This was used , for example , by ( ) in information extraction , and by ( ) in POS tagging	DT VBD VVN , IN NN , IN ( ) IN NN NN , CC IN ( ) IN NP VVG	BackGround	GRelated	Neutral
07-1036_2	This decomposition applies both to discriminative linear models and to generative models such as HMMs and CRFs , in which case the linear sum corresponds to log likelihood assigned to the input/output pair by the model (for details see ( ) for the classification case and ( ) for the structured case)	DT NN VVZ CC TO JJ JJ NNS CC TO JJ NNS JJ IN NP CC NP , IN WDT NN DT JJ NN VVZ TO VV NN VVN TO DT NN NN IN DT NN NN NNS VVP ( ) IN DT NN NN CC ( ) IN DT JJ NN	BackGround	SRelated	Neutral
07-1036_3	For example , ( ) proposes Diagonal Transition Models for sequential labeling tasks where neighboring words tend to have the same labels	IN NN , ( ) VVZ NP NP NP IN JJ VVG NNS WRB JJ NNS VVP TO VH DT JJ NNS	BackGround	GRelated	Neutral
07-1036_3	The second problem we consider is extracting fields from advertisements ( )	DT JJ NN PP VVP VBZ VVG NNS IN NNS ( )	Fundamental	Basis	Neutral
07-1036_3	( ) and ( ) also report results for semi-supervised learning for these domains	( ) CC ( ) RB VV NNS IN JJ NN IN DT NNS	BackGround	GRelated	Neutral
07-1036_4	( ) extends the dictionary-based approach to sequential labeling tasks by propagating the information given in the seeds with contextual word similarity	( ) VVZ DT JJ NN TO JJ VVG NNS IN VVG DT NN VVN IN DT NNS IN JJ NN NN	BackGround	GRelated	Neutral
07-1036_4	We implement some global constraints and include unary constraints which were largely imported from the list of seed words used in ( )	PP VV DT JJ NNS CC VVP JJ NNS WDT VBD RB VVN IN DT NN IN NN NNS VVN IN ( )	Fundamental	Basis	Neutral
07-1036_4	( ) also worked on one of our data sets	( ) RB VVN IN CD IN PP$ NN NNS	BackGround	SRelated	Neutral
07-1036_5	The first task is to identify fields from citations ( )	LS DT JJ NN VBZ TO VV NNS IN NNS ( )	Fundamental	Basis	Neutral
07-1036_6	Another way to look at the algorithm is from the self-training perspective ( )	DT NN TO VV DT NN VBZ IN DT JJ NN ( )	BackGround	SRelated	Neutral
07-1036_7	However , in the general case , semi-supervised approaches give mixed results , and sometimes even degrade the model performance ( )	RB , IN DT JJ NN , JJ NNS VVP JJ NNS , CC RB RB VV DT NN NN ( )	BackGround	GRelated	Negative
07-1036_7	( ) has suggested balancing the contribution of labeled and unlabeled data to the parameters	( ) VHZ VVN TO VV DT NN IN VVN CC JJ NNS TO DT NNS	BackGround	SRelated	Neutral
07-1036_8	( )	( )	NULL	NULL	NULL
07-1036_8	This confirms results reported for the supervised learning case in ( )	DT VVZ NNS VVN IN DT JJ VVG NN IN ( )	BackGround	SRelated	Neutral
07-1036_9	On the other hand , in the supervised setting , it has been shown that incorporating domain and problem specific structured information can result in substantial improvements ( )	IN DT JJ NN , IN DT JJ NN , PP VHZ VBN VVN IN/that VVG NN CC NN JJ JJ NN MD VV IN JJ NNS ( )	BackGround	GRelated	Neutral
07-1036_9	However , ( ) showed that reasoning with more expressive , non-sequential constraints can improve the performance for the supervised protocol	RB ( ) VVD DT NN IN RBR JJ , JJ NNS MD VV DT NN IN DT JJ NN	BackGround	GRelated	Neutral
07-1036_9	We note that in the presence of constraints , the inference procedure (for finding the output y that maximizes the cost function) is usually done with search techniques rather than Viterbi decoding (see ( ) for a discussion) ; we chose beam-search decoding	PP VVP IN/that IN DT NN IN NNS , DT NN NN NN VVG DT NN NN WDT VVZ DT NN NN VBZ RB VVN IN NN NNS NN IN NP VVG , VV ( ) IN DT NN , PP VVD NN VVG	BackGround	SRelated	Neutral
07-1036_9	While ( ) showed the significance of using hard constraints , our experiments show that using soft constraints is a superior option	IN ( ) VVD DT NN IN VVG JJ NNS , PP$ NNS VVP IN/that VVG JJ NNS VBZ DT JJ NN	BackGround	GRelated	Negative
07-1036_11	Conceptually , although not technically , the most related work to ours is ( ) , which , in a somewhat ad-hoc manner , uses soft constraints to guide an unsupervised model that was crafted for mention tracking	RB , IN RB RB , DT RBS JJ NN TO PP VBZ ( ) IN/that , IN DT RB NN NN VVZ JJ NNS TO VV DT JJ NN WDT VBD VVN IN NN NN	BackGround	SRelated	Neutral
07-1037_0	Crucially , the kind of lexical descriptions that we employ are those that are commonly devised within lexicon-driven approaches to linguistic syntax , e.g. Lexicalized Tree-Adjoining Grammar ( ) and Combinatory Categorial Grammar ( )	RB , DT NN IN JJ NNS IN/that PP VVP VBP DT WDT VBP RB VVN IN JJ NNS TO JJ NN , VVD NP NP ( ) CC NP NP NP ( )	Fundamental	Basis	Neutral
07-1037_0	There are currently two supertagging approaches available: LTAG-based ( ) and CCG-based ( )	EX VBP RB CD NN VVZ JJ JJ ( ) CC JJ ( )	BackGround	GRelated	Neutral
07-1037_0	One important way of portraying such lexical descriptions is via the supertags devised in the LTAG and CCG frameworks ( )	CD JJ NN IN VVG JJ JJ NNS VBZ IN DT NNS VVN IN DT NP CC NP NNS ( )	BackGround	SRelated	Positive
07-1037_0	The term "supertagging" ( ) refers to tagging the words of a sentence , each with a supertag	DT NN NN ( ) VVZ TO VVG DT NNS IN DT NN , DT IN DT NN	BackGround	SRelated	Neutral
07-1037_0	The LTAG-based supertagger of ( ) is a standard HMM tagger and consists of a (second-order) Markov language model over supertags and a lexical model conditioning the probability of every word on its own supertag (just like standard HMM-based POS taggers)	DT JJ NN IN ( ) VBZ DT JJ NP NN CC VVZ IN DT JJ NP NN NN IN NNS CC DT JJ NN NN DT NN IN DT NN IN PP$ JJ NN NN IN JJ JJ NP NN	BackGround	SRelated	Neutral
07-1037_0	For the LTAG supertag experiments , we used the LTAG English supertagger ( ) to tag the English part of the parallel data and the supertag language model data	IN DT NP NNS NNS , PP VVD DT NP NP NN CD ( ) TO VV DT JJ NN IN DT JJ NNS CC DT NN NN NN NNS	Fundamental	Basis	Neutral
07-1037_0	Akin to POS tagging , the process of supertagging an input utterance proceeds with statistics that are based on the probability of a word-supertag pair given their Markovian or local context ( )	JJ TO NP VVG , DT NN IN VVG DT NN NN NNS IN NNS WDT VBP VVN IN DT NN IN DT NN NN VVN PP$ NP CC JJ NN ( )	BackGround	GRelated	Neutral
07-1037_1	Besides the difference in probabilities and statistical estimates , these two supertaggers differ in the way the supertags are extracted from the Penn Treebank , cf. ( )	IN DT NN IN NNS CC JJ NNS , DT CD NNS VVP IN DT NN DT NNS VBP VVN IN DT NP NP , JJ ( )	BackGround	GRelated	Neutral
07-1037_2	Only quite recently have ( ) and ( ) shown that incorporating some form of syntactic structure could show improvements over a baseline PBSMT system	RB RB RB VH ( ) CC ( ) VVN IN/that VVG DT NN IN JJ NN MD VV NNS IN DT JJ NP NN	BackGround	GRelated	Positive
07-1037_2	Among the first to demonstrate improvement when adding recursive structure was ( ) , who allows for hierarchical phrase probabilities that handle a range of reordering phenomena in the correct fashion	IN DT JJ TO VV NN WRB VVG JJ NN VBD ( ) , WP VVZ IN JJ NN NNS WDT VVP DT NN IN VVG NNS IN DT JJ NN	BackGround	GRelated	Neutral
07-1037_3	The CCG supertagger ( ) is based on log-linear probabilities that condition a supertag on features representing its context	DT NP NN ( ) VBZ VVN IN JJ NNS WDT NN DT NN IN NNS VVG PP$ NN	BackGround	SRelated	Neutral
07-1037_3	For the CCG supertag experiments , we used the CCG supertagger of ( ) and the Edinburgh CCG tools to tag the English part of the parallel corpus as well as the CCG supertag language model data	IN DT NP NN NNS , PP VVD DT NP NN IN ( ) CC DT NP NP NNS CD TO VV DT JJ NN IN DT JJ NN RB RB IN DT NP NN NN NN NNS	Fundamental	Basis	Neutral
07-1037_4	Both the LTAG ( ) and the CCG supertag sets ( ) were acquired from the WSJ section of the Penn-II Treebank using hand-built extraction rules	CC DT NP ( ) CC DT NP NN NNS ( ) VBD VVN IN DT NP NN IN DT NP NP VVG JJ NN NNS	Fundamental	Basis	Neutral
07-1037_6	Decoder: The decoder used in this work is Moses , a log-linear decoder similar to Pharaoh ( ) , modified to accommodate supertag phrase probabilities and supertag language models	NN DT NN VVN IN DT NN VBZ NP , DT JJ NN JJ TO NN ( ) , VVN TO VV NN NN NNS CC NN NN NNS	Fundamental	Idea	Neutral
07-1037_7	Within the field of Machine Translation , by far the most dominant paradigm is Phrase-based Statistical Machine Translation (PBSMT) ( )	IN DT NN IN NN NN , IN RB DT RBS JJ NN VBZ JJ NP NN NN NN ( )	BackGround	GRelated	Neutral
07-1037_7	For example , ( ) demonstrated that adding syntax actually harmed the quality of their SMT system	IN NN , ( ) VVD IN/that VVG NN RB VVD DT NN IN PP$ NP NN	BackGround	GRelated	Neutral
07-1037_7	The bidirectional word alignment is used to obtain phrase translation pairs using heuristics presented in ( ) and ( ) , and the Moses decoder was used for phrase extraction and decoding	DT JJ NN NN VBZ VVN TO VV NN NN NNS VVG NNS VVN IN ( ) CC ( ) , CC DT NP NN VBD VVN IN NN NN CC VVG	Fundamental	Basis	Neutral
07-1037_7	The bidirectional word alignment is used to obtain lexical phrase translation pairs using heuristics presented in ( ) and ( )	DT JJ NN NN VBZ VVN TO VV JJ NN NN NNS VVG NNS VVN IN ( ) CC ( )	NULL	NULL	NULL
07-1037_8	Coming right up to date , ( ) demonstrate that 'syntactified' target language phrases can improve translation quality for Chinese-English	VVG RB RB TO NN , ( ) VV DT NP NN NN NNS MD VV NN NN IN NP	BackGround	GRelated	Positive
07-1037_8	While the research of ( ) has much in common with the approach proposed here (such as the syntactified target phrases) , there remain a number of significant differences	IN DT NN IN ( ) VHZ RB RB JJ IN DT NN VVD RB NN IN DT JJ NN NN , RB VV DT NN IN JJ NNS	BackGround	SRelated	Neutral
07-1037_10	The NIST MT03 test set is used for development , particularly for optimizing the interpolation weights using Minimum Error Rate training ( )	DT NP NP NN NN VBZ VVN IN NN , RB IN VVG DT NN NNS VVG NP NP NP NN ( )	Fundamental	Basis	Neutral
07-1037_11	Firstly , rather than induce millions of xRS rules from parallel data , we extract phrase pairs in the standard way ( ) and associate with each phrase-pair a set of target language syntactic structures based on supertag sequences	RB , RB IN VV NNS IN NNS NNS IN JJ NNS , PP VV NN NNS IN DT JJ NN ( ) CC NN IN DT NN DT NN IN NN NN JJ NNS VVN IN NN NNS	Fundamental	Basis	Neutral
07-1037_12	Table 1 presents the BLEU scores ( ) of both systems on the NIST 2005 MT Evaluation test set	NN CD VVZ DT NP NNS ( ) IN DT NNS IN DT NP CD NP NP NN NN	Fundamental	Basis	Neutral
07-1038_0	For less commonly used languages , one might use open source research systems ( )	IN RBR RB VVN NNS , PP MD VV JJ NN NN NNS ( )	BackGround	MRelated	Neutral
07-1038_1	Also relevant is previous work that applied machine learning approaches to MT evaluation , both with human references ( ) and without ( )	RB JJ VBZ JJ NN WDT VVD NN VVG NNS TO NP NN , CC IN JJ NNS ( ) CC IN ( )	BackGround	GRelated	Neutral
07-1038_2	METEOR uses the Porter stemmer and synonym-matching via WordNet to calculate recall and precision more accurately ( )	NN VVZ DT NP NN CC NN IN NP TO VV NN CC NN JJR RB ( )	BackGround	GRelated	Neutral
07-1038_3	As its loss function , support vector regression uses an ε-insensitive error function , which allows for errors within a margin of a small positive value , ε , to be considered as having zero error (cf. Bishop ( ) , pp. 339-344)	IN PP$ NN NN , NN NN NN VVZ DT NP NN NN , WDT VVZ IN NNS IN DT NN IN DT JJ JJ NN , NN , TO VB VVN IN VHG CD NN NN ( ) , NN	BackGround	SRelated	Neutral
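The ε-insensitive loss is simple to state in code. A minimal sketch, assuming scalar scores; the function name and the default margin are illustrative, not taken from the cited work:

def epsilon_insensitive_error(y_true, y_pred, eps=0.1):
    # Residuals inside the margin of width eps count as zero error;
    # beyond the margin the penalty grows linearly with the residual.
    residual = abs(y_true - y_pred)
    return max(0.0, residual - eps)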
07-1038_4	This can be seen as a form of confidence estimation on MT outputs ( )	DT MD VB VVN IN DT NN IN NN NN IN NP NNS ( )	BackGround	GRelated	Neutral
07-1038_4	To remove the bias in the distributions of scores between different judges , we follow the normalization procedure described by Blatz et al. ( )	TO VV DT NN IN DT NNS IN NNS IN JJ NNS , PP VVP DT NN NN VVN IN NP NP NP ( )	Fundamental	Idea	Neutral
07-1038_8	We conducted experiments to determine the feasibility of the proposed approach and to address the following questions: (1) How informative are pseudo references in-and-of themselves? Does varying the number and/or the quality of the references have an impact on the metrics? (2) What are the contributions of the adequacy features versus the fluency features to the learning-based metric? (3) How do the quality and distribution of the training examples , together with the quality of the pseudo references , impact the metric training? (4) Do these factors impact the metric's ability in assessing sentences produced within a single MT system? How does that system's quality affect metric performance? The implementation of support vector regression used for these experiments is SVM-Light ( )	PP VVD NNS TO VV DT NN IN DT VVN NN CC TO VV DT VVG NN NN WRB JJ VBP JJ NNS JJ NN VVZ VVG DT NN NN DT NN IN DT NNS VHP DT NN IN DT JJ NN WP VBP DT NNS IN DT NN VVZ IN DT NN VVZ TO DT JJ NN NN WRB VVP DT NN CC NN IN DT NN NNS , RB IN DT NN IN DT JJ NNS , VV DT JJ NN NN VVP DT NNS NN DT JJ NN IN VVG NNS VVN IN DT JJ NP NN WRB VVZ DT NNS NN VVP JJ NN DT NN IN NN NN NN VVN IN DT NNS VBZ NP ( )	Fundamental	Basis	Neutral
07-1038_9	To compare the relative quality of different metrics , we apply bootstrapping re-sampling on the data , and then use paired t-test to determine the statistical significance of the correlation differences ( )	TO VV DT JJ NN IN JJ NNS , PP VVP VVG NN IN DT NNS , CC RB VV VVN NN TO VV DT JJ NN IN DT NN NNS ( )	Fundamental	Basis	Neutral
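A minimal sketch of that comparison, assuming Pearson's r as the correlation statistic and SciPy's paired t-test; the names, replicate count, and data layout are assumptions:

import random
from statistics import correlation   # Pearson's r (Python 3.10+)
from scipy.stats import ttest_rel    # paired t-test

def compare_metrics(scores_a, scores_b, human, n_boot=1000, seed=0):
    # Draw bootstrap replicates of the test set, compute each metric's
    # correlation with the human scores on every replicate, then test
    # whether the two samples of correlations differ significantly.
    # (Replicates with constant scores would need guarding in real use.)
    rng = random.Random(seed)
    n = len(human)
    corrs_a, corrs_b = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        h = [human[i] for i in idx]
        corrs_a.append(correlation([scores_a[i] for i in idx], h))
        corrs_b.append(correlation([scores_b[i] for i in idx], h))
    return ttest_rel(corrs_a, corrs_b)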
07-1038_11	ROUGE utilizes 'skip n-grams' , which allow for matches of sequences of words that are not necessarily adjacent ( )	NN VVZ NN NP , WDT VVP IN NNS IN NNS IN NNS WDT VBP RB RB JJ ( )	BackGround	GRelated	Neutral
07-1038_11	BLEU is smoothed ( ) , and it considers only matching up to bigrams because this has higher correlations with human judgments than when higher-order n-grams are included	NP VBZ VVN ( ) , CC PP VVZ RB VVG RP TO NNS IN DT VHZ JJR NNS IN JJ NNS IN WRB JJ NNS VBP VVN	BackGround	SRelated	Neutral
07-1038_13	The HWC metrics compare dependency and constituency trees for both reference and machine translations ( )	DT NP NNS VVP NN CC NN NNS IN DT NN CC NN NNS ( )	BackGround	GRelated	Neutral
07-1038_13	In addition to adapting the idea of Head Word Chains ( ) , we also compared the input sentence's argument structures against the treebank for certain syntactic categories	IN NN TO VVG DT NN IN NP NP NP ( ) , PP RB VVD DT NN NN NN NNS IN DT NN IN JJ JJ NNS	Fundamental	Idea	Neutral
07-1038_15	Reference-based metrics such as BLEU ( ) have rephrased this subjective task as a somewhat more objective question: how closely does the translation resemble sentences that are known to be good translations for the same source? This approach requires the participation of human translators , who provide the "gold standard" reference sentences	JJ NNS JJ IN NP ( ) VHP VVN DT JJ NN IN DT RB RBR JJ NN WRB RB VVZ DT NN VVP NNS WDT VBP VVN TO VB JJ NNS IN DT JJ NN DT NN VVZ DT NN IN JJ NNS , WP VVP DT JJ JJ NN NNS	BackGround	GRelated	Neutral
07-1039_0	The relationship between word alignments and their impact on MT is also investigated in ( )	DT NN IN NN NNS CC PP$ NN IN NP VBZ RB VVN IN ( )	BackGround	GRelated	Neutral
07-1039_2	Most current statistical models ( ) treat the aligned sentences in the corpus as sequences of tokens that are meant to be words; the goal of the alignment process is to find links between source and target words	RBS JJ JJ NNS ( ) VV DT VVN NNS IN DT NN IN NNS IN NNS WDT VBP VVN TO VB JJ DT NN IN DT NN NN VBZ TO VV NNS IN NN CC NN NNS	BackGround	GRelated	Neutral
07-1039_2	To quickly (and approximately) evaluate this phenomenon , we trained the statistical IBM word-alignment model 4 ( ) , using the GIZA++ software ( ) for the first two language pairs , and the Europarl corpus ( ) for the last one	TO RB VV NN VV DT NN , PP VVN DT JJ NP NN NN CD ( ) , CD VVG DT NP NN ( ) IN DT JJ CD NN NNS , CC DT NN NN ( ) IN DT JJ CD	Fundamental	Basis	Neutral
07-1039_2	They can be seen as extensions of the simpler IBM models 1 and 2 ( )	PP MD VB VVN IN NNS IN DT JJR NP NNS CD CC CD ( )	BackGround	GRelated	Neutral
07-1039_2	We use a standard log-linear phrase-based statistical machine translation system as a baseline: GIZA++ implementation of IBM word alignment model 4 ( ) , the refinement and phrase-extraction heuristics described in ( ) , minimum-error-rate training ( ) using Phramer ( ) , a 3-gram language model with Kneser-Ney smoothing trained with SRILM ( ) on the English side of the training data and Pharaoh ( ) with default settings to decode	PP VVP DT JJ JJ JJ JJ NN NN NN IN DT JJ NP NN IN NP NN NN NN CD ( ) , CD DT NN CC NN NNS VVN IN ( ) , NN NN NP CD NP NN NNS ( ) VVG NP ( ) , DT JJ NN NN IN NP VVG VVN IN NP ( ) IN DT JJ NN IN DT NN NNS CC NN ( ) IN NN NNS TO VV	Fundamental	Basis	Neutral
07-1039_3	We also want to bootstrap on different word aligners; in particular , one possibility is to use the flexible HMM word-to-phrase model of Deng and Byrne ( ) in place of IBM model 4	PP RB VVP TO NN IN JJ NN NN IN JJ , CD NN VBZ TO VV DT JJ NP NN NN IN NP CC NP ( ) IN NN IN NP NN CD	BackGround	MRelated	Neutral
07-1039_5	We evaluate the reliability of these candidates , using simple metrics based on co-occurrence frequencies , similar to those used in associative approaches to word alignment ( )	PP VVP DT NN IN DT NNS , VVG JJ NNS VVN IN NN NNS , JJ TO DT VVN IN JJ NNS TO NN NN ( )	Fundamental	Idea	Neutral
07-1039_9	Second , an increase in AER does not necessarily imply an improvement in translation quality ( ) and vice-versa ( )	RB , DT NN IN NP VVZ RB RB VV DT NN IN NN NN ( ) CC NP ( )	BackGround	SRelated	Neutral
07-1039_12	This very simple measure is frequently used in associative approaches ( )	DT RB JJ NN VBZ RB VVN IN JJ NNS ( )	BackGround	SRelated	Neutral
07-1039_14	[Figure 2: Examples of entries from the manually developed dictionary] The intrinsic quality of word alignment can be assessed using the Alignment Error Rate (AER) metric ( ) , which compares a system's alignment output to a set of gold-standard alignments	NN RB VBZ VV TO VV RB JJ IN NN IN NN RB RB IN JJ NN IN NN CD NNS IN NNS IN DT RB VVN NN DT JJ NN IN NN NN MD VB VVN VVG DT NP NP NP JJ JJ ( ) , WDT VVZ DT JJ NN NN TO DT NN IN NN NN	BackGround	SRelated	Neutral
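AER has a standard definition over a system's links A and gold-standard sure (S) and possible (P) links: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch, assuming links are represented as (source index, target index) pairs:

def alignment_error_rate(hypothesis, sure, possible):
    # hypothesis, sure, possible: iterables of (i, j) word-alignment links;
    # by convention the sure links are a subset of the possible links.
    a, s, p = set(hypothesis), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))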
07-1039_17	The quality of the translation output is evaluated using BLEU ( )	DT NN IN DT NN NN VBZ VVN VVG NP ( )	Fundamental	Basis	Neutral
07-1039_18	The experiments were carried out using the Chinese-English datasets provided within the IWSLT 2006 evaluation campaign ( ) , extracted from the Basic Travel Expression Corpus (BTEC) ( )	DT NNS VBD VVN IN VVG DT NP NNS VVN IN DT NP CD NN NN ( ) , VVN IN DT JJ NP NN NP NP ( )	Fundamental	Basis	Neutral
07-1039_18	For Chinese , the data provided were tokenized according to the output format of ASR systems , and human-corrected ( )	IN NP , DT NNS VVN VBD VVN VVG TO DT NN NN IN NP NNS , CC JJ ( )	Fundamental	Basis	Neutral
07-1039_22	Note that the need to consider segmentation and alignment at the same time is also mentioned in ( ) , and related issues are reported in ( )	NN IN/that DT NN TO VV NN CC NN IN DT JJ NN VBZ RB VVN IN ( ) , CC JJ NNS VBP VVN IN ( )	BackGround	GRelated	Neutral
07-1039_25	More importantly , however , this segmentation is often performed in a monolingual context , which makes the word alignment task more difficult since different languages may realize the same concept using varying numbers of words (see e.g. ( ))	RBR RB , RB , DT NN VBZ RB VVN IN DT JJ NN , WDT VVZ DT NN NN NN RBR JJ IN JJ NNS MD VV DT JJ NN VVG VVG NNS IN NNS NN FW ( JJ	BackGround	GRelated	Neutral
07-1039_27	The log-linear model is also based on standard features: conditional probabilities and lexical smoothing of phrases in both directions , and phrase penalty ( )	DT JJ NN VBZ RB VVN IN JJ JJ JJ NNS CC JJ VVG NNS IN DT NNS , CC NN NN ( )	Fundamental	Basis	Neutral
07-1039_28	To test the influence of the initial word segmentation on the process of word packing , we considered an additional segmentation configuration , based on an automatic segmenter combining rule-based and statistical techniques ( )	TO VV DT NN IN DT JJ NN NN IN DT NN IN NN NN , PP VVD DT JJ NN NN , VVN IN DT JJ NN VVG JJ CC JJ NNS ( )	Fundamental	Basis	Neutral
07-1039_28	These resources follow more or less the same format as the output of the word segmenter mentioned in Section 5.1.2 ( ) , so the experiments are carried out using this segmentation	DT NNS VVP RBR CC RBR DT JJ NN IN DT NN IN DT NN NN VVN IN NP CD ( ) , RB DT NNS VBP VVN IN VVG DT NN	Fundamental	Idea	Neutral
07-1040_0	It has been argued that METEOR correlates better with human judgment due to higher weight on recall than precision ( )	PP VHZ VBN VVN IN/that NP VVZ RBR IN JJ NN JJ TO JJR NN IN NN IN NN ( )	BackGround	SRelated	Neutral
07-1040_1	Recently , confusion network decoding for MT system combination has been proposed ( )	RB , NN NN VVG IN NP NN NN VHZ VBN VVN ( )	BackGround	GRelated	Neutral
07-1040_2	Powell's method ( ) is used to tune the system and feature weights simultaneously so as to optimize various automatic evaluation metrics on a development set	NP NN ( ) VBZ VVN TO VV DT NN CC NN NNS RB RB RB TO VV JJ JJ NN NNS IN DT NN NN	BackGround	GRelated	Neutral
07-1040_2	In this work , the modified Powell's method proposed by ( ) is used	IN DT NN , VVN NP NN IN VVN IN ( ) VBZ VVN	Fundamental	Basis	Neutral
07-1040_3	Six MT systems were combined: three (A ,C ,E) were phrase-based similar to ( ) , two (B ,D) were hierarchical similar to ( ) and one (F) was syntax-based similar to ( )	CD NP NNS VBD JJ CD NN NN NN VBD JJ JJ TO ( ) , CD NN NN VBD JJ JJ TO ( ) CC CD NN VBD JJ JJ TO ( )	Fundamental	Idea	Neutral
07-1040_4	Combination of speech recognition outputs is an example of this approach ( )	NN IN NN NN NNS VBZ DT NN IN DT NN ( )	BackGround	GRelated	Neutral
07-1040_7	Also , a more heuristic alignment method has been proposed in a different system combination approach ( )	RB , DT RBR JJ NN NN VHZ VBN VVN IN DT JJ NN NN NN ( )	BackGround	GRelated	Neutral
07-1040_9	In speech recognition , confusion network decoding ( ) has become widely used in system combination	IN NN NN , NN NN VVG ( ) VHZ VVN RB VVN IN NN NN	BackGround	GRelated	Neutral
07-1040_10	In ( ) , different word orderings are taken into account by training alignment models by considering all hypothesis pairs as a parallel corpus using GIZA++ ( )	IN ( ) , JJ NN NNS VBP VVN IN NN IN NN NN NNS IN VVG DT NN NNS IN DT JJ NN VVG NP ( )	BackGround	GRelated	Neutral
07-1040_10	Tuning is fully automatic , as opposed to ( ) where global system weights were set manually	VVG VBZ RB JJ , RB VVN TO ( ) WRB JJ NN NNS VBD VVN RB	BackGround	GRelated	Neutral
07-1040_10	Similar combination of multiple confusion networks was presented in ( )	JJ NN IN JJ NN NNS VBD VVN IN ( )	BackGround	SRelated	Neutral
07-1040_12	The same Powell's method has been used to estimate feature weights of a standard feature-based phrasal MT decoder in ( )	DT JJ NP NN VHZ VBN VVN TO VV NN NNS IN DT JJ JJ JJ NP NN IN ( )	BackGround	SRelated	Neutral
07-1040_13	The optimization of the system and feature weights may be carried out using n-best lists as in ( )	DT NN IN DT NN CC NN NNS MD VB VVN IN VVG JJ NNS IN IN ( )	BackGround	SRelated	Neutral
07-1040_14	Currently , the most widely used automatic MT evaluation metric is the NIST BLEU-4 ( )	RB , DT RBS RB VVN JJ NP NN NN VBZ DT NP NP ( )	BackGround	SRelated	Neutral
07-1040_15	This work was extended in ( ) by introducing system weights for word confidences	DT NN VBD VVN IN ( ) IN VVG NN NNS IN NN NNS	BackGround	GRelated	Neutral
07-1040_15	In ( ) , a simple score was assigned to the word coming from the n-th best hypothesis	IN ( ) , JJ NN VBD VVN TO DT NN VVG IN DT JJ NN	BackGround	GRelated	Neutral
07-1040_15	In ( ) , the total confidence of the n-th best confusion network hypothesis E_n , including NULL words , given the i-th source sentence F_i was given by log p(E_n | F_i) = sum_{l=1}^{L_i} log( sum_{s=1}^{S} w_s c(w_{n,l} | s) ) + theta N_null(E_n) , where L_i is the number of nodes in the confusion network for the source sentence F_i , S is the number of translation systems , w_s is the s-th system weight , c(w_{n,l} | s) is the accumulated confidence for word w_{n,l} produced by system s between nodes l and l+1 , and theta is a weight for the number of NULL links along the hypothesis	IN ( ) , DT JJ NN IN DT JJ JJS NN NN NN , VVG NP NNS , VVN DT NN NN NN VBD VVN IN WRB VBZ DT NN IN NNS IN DT NN NN IN DT NN NN , VBZ DT NN IN NN NNS , VBZ DT NN NN NN , LS NN VBZ DT VVN NN IN NN VVN IN NN IN NNS CC , CC VBZ DT NN IN DT NN IN NP NNS IN DT NN	BackGround	SRelated	Neutral
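Under the notation reconstructed above, the score is a sum over confusion network nodes of the log of a weighted sum of per-system word confidences, plus a penalty on NULL links. A minimal sketch; the data layout is an assumption:

import math

def hypothesis_log_confidence(conf, weights, num_null_links, theta):
    # conf[l][s]: accumulated confidence of the hypothesis word between
    # nodes l and l+1 as produced by system s; weights[s]: system weight.
    score = sum(math.log(sum(w * c for w, c in zip(weights, per_node)))
                for per_node in conf)
    return score + theta * num_null_links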
07-1040_15	The improved system combination method was compared to a simple confusion network decoding without system weights and the method proposed in ( ) on the Arabic to English and Chinese to English NIST MT05 tasks	DT VVN NN NN NN VBD VVN TO DT JJ NN NN VVG IN NN NNS CC DT NN VVN IN ( ) IN DT NP TO NP CC NP TO NP NP NP NNS	Compare	Compare	Neutral
07-1040_15	Compared to the baseline from ( ) , the new method improves the BLEU scores significantly	VVN TO DT NN IN ( ) , DT JJ NN VVZ DT NP NNS RB	Compare	Compare	Negative
07-1040_16	In ensemble learning , a collection of simple classifiers is used to yield better performance than any single classifier; for example , boosting ( )	IN NN NN , DT NN IN JJ NNS VBZ VVN TO VV JJR NN IN DT JJ NN IN NN VVG ( )	BackGround	GRelated	Neutral
07-1040_17	A modified Levenshtein alignment allowing shifts as in computation of the translation edit rate (TER) ( ) was used to align hypotheses in ( )	DT JJ NP NN VVG NNS IN IN NN IN DT NN VV NN NN ( ) VBD VVN TO VV NNS IN ( )	BackGround	GRelated	Neutral
07-1040_17	Minimum Bayes risk (MBR) was used to choose the skeleton in ( )	JJ NP NN NN VBD VVN TO VV DT NN IN ( )	BackGround	GRelated	Neutral
07-1040_17	This is equivalent to minimum Bayes risk decoding with uniform posterior probabilities ( )	DT VBZ JJ TO JJ NP NN VVG IN JJ JJ NNS ( )	BackGround	SRelated	Neutral
07-1040_17	It has been found that multiple hypotheses from each system may be used to improve the quality of the combination output ( )	PP VHZ VBN VVN IN/that JJ NNS IN DT NN MD VB VVN TO VV DT NN IN DT NN NN ( )	BackGround	SRelated	Neutral
07-1040_18	Translation edit rate (TER) ( ) has been proposed as a more intuitive evaluation metric since it is based on the rate of edits required to transform the hypothesis into the reference	NN VV NN NN ( ) VHZ VBN VVN RB RBR JJ NN JJ IN PP VBZ VVN IN DT NN IN VVZ VVN TO VV DT NN IN DT NN	BackGround	SRelated	Neutral
07-1040_18	However , this would require time-consuming evaluations such as human-mediated TER post-editing ( )	RB , DT MD VV NN NN NNS JJ IN JJ VVN NN NN ( )	BackGround	MRelated	Neutral
07-1041_1	The TnT tagger ( ) and the TreeTagger ( ) are used for tagging and lemmatization	DT NP NN ( ) CC DT NP ( ) VBP VVN IN VVG CC NN	Fundamental	Basis	Neutral
07-1041_2	Motivated by the theoretical work by Chafe ( ) and Jacobs ( ) , we view the VF as the place for elements which modify the situation described in the sentence , i.e	VVN IN DT JJ NN IN NP ( ) CC NP ( ) , PP VVP DT NP IN DT NN IN NNS WDT VV DT NN VVN IN DT NN , NN	Fundamental	Idea	Neutral
07-1041_4	Finally , the articles are parsed with the CDG dependency parser ( )	RB , DT NNS VBP VVN IN DT NP NN NN ( )	Fundamental	Basis	Neutral
07-1041_5	The preferences summarized below have motivated our choice of features: constituents in the nominative case precede those in other cases , and dative constituents often precede those in the accusative case ( ); the verb arguments' order depends on the verb's subcategorization properties ( ); constituents with a definite article precede those with an indefinite one ( ); pronominalized constituents precede non-pronominalized ones ( ); animate referents precede inanimate ones ( ); short constituents precede longer ones ( ); the preferred topic position is right after the verb ( ); the initial position is usually occupied by scene-setting elements and topics ( )	DT NNS VVD RB VH VVN PP$ NN IN JJ NNS IN DT JJ NN VV DT IN JJ NNS , CC JJ NNS RB VV DT IN DT JJ NN ( JJ NN NN NN NN VVZ IN DT JJ NN NNS ( JJ NNS IN DT JJ NN VV DT IN DT JJ CD ( NN VVD NNS VV JJ NNS ( JJ NN NNS VV JJ NNS ( JJ NN NNS VV JJR NNS ( JJ NN JJ NN NN VBZ RB IN DT NN ( JJ NN JJ NN VBZ RB VVN IN NN NNS CC NNS ( NN	BackGround	GRelated	Neutral
07-1041_6	The sentence-initial position , which in German is the VF , has been shown to be cognitively more prominent than other positions ( )	DT JJ NN , WDT IN JJ VBZ DT NP , VHZ VBN VVN TO VB RB RBR JJ IN JJ NNS ( )	BackGround	SRelated	Neutral
07-1041_7	Inspired by the findings of the Prague School ( ) and Systemic Functional Linguistics ( ) , they focus on the role that information structure plays in constituent ordering	VVN IN DT NNS IN DT NP NP ( ) CC NP NP NP ( ) , PP VVP IN DT NN IN/that NN NN VVZ IN NN VVG	BackGround	GRelated	Neutral
07-1041_8	Harbusch et al. ( ) present a generation workbench , which has the goal of producing not the most appropriate order , but all grammatical ones	NP NP NP ( ) VV DT NN NN , WDT VHZ DT NN IN VVG RB DT RBS JJ NN , CC DT JJ NNS	BackGround	GRelated	Neutral
07-1041_10	We suppose that this difficulty comes from the double function of the initial position which can either introduce the addressation topic , or be the scene- or frame-setting position ( )	PP VVP IN/that DT NN VVZ IN DT JJ NN IN DT JJ NN WDT MD RB VV DT NN NN , CC VB DT NN CC NN NN ( )	BackGround	MRelated	Neutral
07-1041_11	We hypothesize that the reasons which bring a constituent to the VF are different from those which place it , say , to the beginning of the MF , for the order in the MF has been shown to be relatively rigid ( )	PP VVP IN/that DT NNS WDT VVP DT NN TO DT NP VBP JJ IN DT WDT VVP PP , VVP , TO DT NN IN DT NP , IN DT NN IN DT NP VHZ VBN VVN TO VB RB JJ ( )	BackGround	SRelated	Neutral
07-1041_14	Since our learner treats all values as nominal , we discretized the values of dep and len with a C4.5 classifier ( )	IN PP$ NN VVZ DT NNS IN JJ , PP VVD DT NNS IN NNS CC NNS IN DT NP NN ( )	Fundamental	Basis	Neutral
07-1041_15	Kruijff et al. ( ) describe an architecture which supports generating the appropriate word order for different languages	NP NP NP ( ) VV DT NN WDT VVZ VVG DT JJ NN NN IN JJ NNS	BackGround	GRelated	Neutral
07-1041_16	Kruijff-Korbayova et al. ( ) address the task of word order generation in the same vein	NP NP NP ( ) VV DT NN IN NN NN NN IN DT JJ NN	BackGround	GRelated	Neutral
07-1041_18	Similar to Langkilde & Knight ( ) , we utilize statistical methods	JJ TO NP CC NP ( ) PP VV JJ NNS	Fundamental	Idea	Neutral
07-1041_19	Kendall's τ , which has been used for evaluating sentence ordering tasks ( ) , is the second metric we use	NP NN , WDT VHZ VBN VVN IN VVG NN VVG NNS ( ) , VBZ DT JJ NN PP VVP	Fundamental	Basis	Neutral
07-1041_20	E.g. , in text-to-text generation ( ) , new sentences are fused from dependency structures of input sentences	FW , IN NN NN ( ) , JJ NNS VBP VVN IN NN NNS IN NN NNS	BackGround	GRelated	Neutral
07-1041_22	Ringger et al. ( ) aim at regenerating the order of constituents as well as the order within them for German and French technical manuals	NP NP NP ( ) NN IN VVG DT NN IN NNS RB RB IN DT NN IN PP IN JJ CC JJ JJ NNS	BackGround	GRelated	Neutral
07-1041_22	Similar to Ringger et al. ( ) , we find the order with the highest probability conditioned on syntactic and semantic categories	JJ TO NP NP NP ( ) , PP VVP DT NN IN DT JJS NN VVN IN JJ CC JJ NNS	Fundamental	Idea	Neutral
07-1041_22	Apart from acc and τ , we also adopt the metrics used by Uchimoto et al. ( ) and Ringger et al. ( )	RB IN NN CC NN , PP RB VV DT NNS VVN IN NP NP NP ( ) CC NP NP NP ( )	Fundamental	Basis	Neutral
07-1041_22	According to the inv metric , our results are considerably worse than those reported by Ringger et al. ( )	VVG TO DT NN JJ , PP$ NNS VBP RB JJR IN DT VVN IN NP NP NP ( )	Compare	Compare	Positive
07-1041_26	We retrained our system on a corpus of newspaper articles ( ) which is manually annotated but encodes no semantic knowledge	PP VVN PP$ NN IN DT NN IN NN NNS ( ) WDT VBZ RB VVN CC VVZ DT JJ NN	Fundamental	Basis	Neutral
07-1041_27	The work of Uchimoto et al. ( ) is done on the free word order language Japanese	DT NN IN NP NP NP ( ) VBZ VVN IN DT JJ NN NN NN JJ	BackGround	GRelated	Neutral
07-1041_27	For the fourth baseline (UCHIMOTO) , we utilized a maximum entropy learner (OpenNLP) and reimplemented the algorithm of Uchimoto et al. ( )	IN DT JJ NN NN , PP VVD DT JJ NN NN NN JJ CC VVD DT NN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1041_28	Uszkoreit ( ) addresses the problem from a mostly grammar-based perspective and suggests weighted constraints , such as [+nom] ≺ [+dat] , [+pro] ≺ [-pro] , [-focus] ≺ [+focus] , etc	NP ( ) VVZ DT NN IN DT RB JJ NN CC VVZ JJ NNS , JJ IN JJ JJ NN , JJ JJ NN , JJ JJ NN , FW	BackGround	GRelated	Neutral
07-1041_29	Unlike overgeneration approaches ( ) which select the best of all possible outputs ours is more efficient , because we do not need to generate every permutation	IN NN NNS ( ) WDT VVP DT JJS IN DT JJ NNS PP VBZ RBR JJ , IN PP VVP RB VV TO VV DT NN	Compare	Compare	Neutral
07-1042_0	It also compares reasonably with other more recent evaluations ( ) which derive their input data from the Penn Treebank by transforming each sentence tree into a format suitable for the realiser ( )	PP RB VVZ RB IN JJ JJR JJ NNS ( ) WDT VV PP$ NN NNS IN DT NP NP IN VVG DT NN NN IN DT NN JJ IN DT NN ( )	BackGround	SRelated	Neutral
07-1042_0	For instance , ( ) reports that the implementation of such a processor for Surge was the most time-consuming part of the evaluation with the resulting component containing 4000 lines of code and 900 rules	IN NN , ( ) VVZ IN/that DT NN IN PDT DT NN IN NN VBD DT JJS NN NN NN IN DT NN IN DT VVG NN VVG CD NNS IN NN CC CD NNS	BackGround	GRelated	Neutral
07-1042_1	The realiser presented here differs in two main ways from existing reversible realisers such as ( )'s CCG system or the HPSG ERG-based realiser ( )	DT NN VVD RB VVZ IN RB CD NNS IN VVG JJ NNS JJ IN ( NP NP NN CC DT NP NP VVN NN ( )	Compare	Compare	Neutral
07-1042_2	The reason for this is that the grammar is compiled from a higher level description where tree fragments are first encapsulated into so-called classes and then explicitly combined (by inheritance , conjunction and disjunction) to produce the grammar elementary trees (cf. ( ))	DT NN IN DT VBZ IN/that DT NN VBZ VVN IN DT JJR NN NN WRB NN NNS VBP RB VVN IN JJ NNS CC RB RB VVN JJ NN , NN CC NN TO VV DT NN JJ NNS JJ ( NN	BackGround	SRelated	Neutral
07-1042_3	Thus for instance , both REALPRO ( ) and Surge ( ) assume that the input associates semantic literals with low level syntactic and lexical information mostly leaving the realiser to just handle inflection , word order , insertion of grammatical words and agreement	RB IN NN , CC NP ( ) CC NN ( ) VV IN/that DT NN NNS JJ NNS IN JJ NN JJ CC JJ NN RB VVG DT NN TO RB VV NN , NN NN , NN IN JJ NNS CC NN	BackGround	GRelated	Neutral
07-1042_4	To associate semantic representations with natural language expressions , the FTAG is modified as proposed in ( )	TO NN JJ NNS IN JJ NN NNS , DT NP VBZ VVN IN VVN IN ( )	Fundamental	Idea	Neutral
07-1042_5	The proposal draws on ideas from ( ) and aims to determine whether for a given input (a set of TAG elementary trees whose semantics equate the input semantics) , syntactic requirements and resources cancel out	DT NN VVZ IN NNS IN ( ) CC VVZ TO VV IN IN DT VVN NN NN VVN IN NP JJ NNS WP$ NNS VV DT NN NN , JJ NNS CC NNS VV RP	Fundamental	Idea	Neutral
07-1042_5	It could be used for instance , in combination with the parser and the semantic construction module described in ( ) , to support textual entailment recognition or answer detection in question answering	PP MD VB VVN IN NN , IN NN IN DT NN CC DT JJ NN NN VVN IN ( ) , TO VV JJ NN NN CC NN NN IN NN NN	BackGround	GRelated	Neutral
07-1042_7	We rely on these features to associate one and the same semantics with large sets of trees denoting semantically equivalent but syntactically distinct configurations (cf. ( ))	PP VVP IN DT NNS TO NN CD CC DT JJ JJ TO JJ NNS IN NNS VVG RB JJ CC RB JJ NNS JJ ( NN	Fundamental	Basis	Neutral
07-1042_8	The basic surface realisation algorithm used is a bottom-up , tabular realisation algorithm ( ) optimised for TAGs	DT JJ NN NN NN VVN VBZ DT NN RB , JJ NN NN ( ) VVN IN NP	Fundamental	Basis	Positive
07-1042_12	Similarly , KPML ( ) assumes access to ideational , interpersonal and textual information which roughly corresponds to semantic , mood/voice , theme/rheme and focus/ground information	RB , NP ( ) VVZ NN TO JJ , JJ CC JJ NN WDT RB VVZ TO JJ , NNS , NN CC NN NN	BackGround	SRelated	Neutral
07-1042_13	In order to ensure this determinism , NLG geared realisers generally rely on theories of grammar which systematically link form to function such as systemic functional grammar (SFG , ( )) and , to a lesser extent , Meaning Text Theory (MTT , ( ))	IN NN TO VV DT NN , NP VVN NNS RB VVP IN NNS IN NN WDT RB VVP NN TO VV JJ IN JJ JJ NN NN , ( NN CC , TO DT JJR NN , NP NP NP NP , ( NN	BackGround	GRelated	Neutral
07-1042_14	First , the paraphrase figures might seem low with respect to , e.g. , work by ( ) which mentions several thousand outputs for one given input and an average number of realisations per input varying between 85.7 and 102.2	RB , DT VVP NNS MD VV JJ NN TO FW , VV IN ( ) WDT VVZ JJ CD NNS IN CD VVN NN CC DT JJ NN IN NNS IN NN VVG IN CD CC CD	BackGround	SRelated	Neutral
07-1042_14	This does not seem to be the case in ( )'s approach where the count seems to include all sentences associated by the grammar with the input semantics	DT VVZ RB VV TO VB DT NN IN ( NNS VVP WRB DT NN VVZ TO VV DT NNS VVN IN DT NN IN DT NN NNS	BackGround	SRelated	Neutral
07-1042_15	A Feature-based TAG (FTAG , ( )) consists of a set of (auxiliary or initial) elementary trees and of two tree composition operations: substitution and adjunction	DT JJ NP NN , ( NN VVZ IN DT NN IN JJ CC JJ JJ NNS CC IN CD NN NN NN NN CC NN	BackGround	SRelated	Neutral
07-1042_17	A first possibility would be to draw on ( )'s proposal and compute the enriched input based on the traversal of a systemic network	DT JJ NN MD VB TO VV IN ( NNS NN CC VV DT VVN NN VVN IN DT JJ IN DT JJ NN	BackGround	SRelated	Neutral
07-1042_17	Thus for instance , ( ) resorts to ad hoc "mapping tables" to associate substitution nodes with semantic indices and "fr-nodes" to constrain adjunction to the correct nodes	RB IN NN , ( ) VVZ TO FW FW NN NN TO NN NN NNS IN JJ NN CC NN TO VV NN TO DT JJ NNS	BackGround	GRelated	Neutral
07-1043_0	While there have been previous systems that encode generation as planning ( ) , our approach is distinguished from these systems by its focus on the grammatically specified contributions of each individual word (and the TAG tree it anchors) to syntax , semantics , and local pragmatics ( )	IN EX VHP VBN JJ NNS WDT VVP NN IN NN ( ) , PP$ NN VBZ VVN IN DT NNS IN PP$ NN IN DT RB VVN NNS IN DT JJ NN NN DT NP NN PP JJ TO NN , NNS , CC JJ NNS ( )	BackGround	GRelated	Neutral
07-1043_1	It also allows us to benefit from the past and ongoing advances in the performance of off-the-shelf planners ( )	PP RB VVZ PP TO VV IN DT JJ CC JJ NNS IN DT NN IN NN NNS ( )	BackGround	GRelated	Neutral
07-1043_3	Unlike some approaches ( ) , we do not have to distinguish between generating NPs and expressions of other syntactic categories	IN DT NNS ( ) , PP VVP RB VH TO VV IN VVG NNS CC NNS IN JJ JJ NNS	Compare	Compare	Neutral
07-1043_4	The context set of an intended referent is the set of all individuals that the hearer might possibly confuse it with ( )	DT NN VVN IN DT JJ NN VBZ DT NN IN DT NNS IN/that DT NN MD RB VV PP IN ( )	BackGround	SRelated	Neutral
07-1043_5	It is based on the well-known STRIPS language ( )	PP VBZ VVN IN DT JJ NNS NN ( )	BackGround	SRelated	Positive
07-1043_9	The grammar formalism we use here is that of lexicalized tree-adjoining grammars (LTAG; Joshi and Schabes ( ))	DT NN NN PP VVP RB VBZ IN/that IN JJ NN NNS JJ NP CC NP ( NN	Fundamental	Basis	Neutral
07-1043_11	In order to use the planner as a surface realization algorithm for TAG along the lines of Koller and Striegnitz ( ) , we attach semantic content to each elementary tree and require that the sentence achieves a certain communicative goal	IN NN TO VV DT NN IN DT NN NN NN IN NP IN DT NNS IN NP CC NP ( ) , PP VVP JJ NN TO DT JJ NN CC VVP IN/that DT NN VVZ DT JJ JJ NN	Fundamental	Basis	Neutral
07-1043_11	However , this problem is NP-complete , by reduction of Hamiltonian Cycle - unsurprisingly , given that it encompasses realization , and the very similar realization problem in Koller and Striegnitz ( ) is NP-hard	RB , DT NN VBZ JJ , IN NN IN NP NP : RB , VVN IN/that PP VVZ NN , CC DT RB JJ NN NN IN NP CC NP ( ) VBZ NP	BackGround	SRelated	Neutral
07-1043_12	PDDL ( ) is the standard input language for modern planning systems	NP ( ) VBZ DT JJ NN NN IN JJ NN NNS	BackGround	SRelated	Neutral
07-1043_13	In a scenario that involves multiple rabbits , multiple hats , and multiple individuals that are inside other individuals , but only one pair of a rabbit r inside a hat h , the expression "X takes the rabbit from the hat" is sufficient to refer uniquely to r and h ( )	IN DT NN WDT VVZ JJ NNS , JJ NNS , CC JJ NNS WDT VBP RB JJ NNS , CC RB CD NN IN DT NN NN IN DT NN NN , DT NN NN VVZ DT NN IN DT NN VBZ JJ TO VV RB TO NN CC NN ( )	BackGround	SRelated	Neutral
07-1043_14	We share these advantages with systems such as SPUD ( )	PP VVP DT NNS IN NNS JJ IN NP ( )	Fundamental	Basis	Neutral
07-1043_14	This makes our encoding more direct and transparent than those in work like Thomason and Hobbs ( ) and Stone et al. ( )	DT VVZ PP$ VVG RBR JJ CC JJ IN DT IN NN IN NP CC NP ( ) CC NP NP NP ( )	Compare	Compare	Neutral
07-1043_14	We follow Stone et al. ( ) in formalizing the semantic content of a lexicalized elementary tree t as a finite set of atoms; but unlike in earlier approaches , we use the semantic roles in t as the arguments of these atoms	PP VVP NP NP NP ( ) IN VVG DT JJ NN IN DT JJ JJ NN NN IN DT JJ NN IN NN CC IN IN JJR NNS , PP VVP DT JJ NNS IN NN IN DT NNS IN DT NNS	Fundamental	Idea	Neutral
07-1043_14	The three pragmatic predicates that we will use here are hearer-new , indicating that the hearer does not know about the existence of an individual and can't infer it ( ) , hearer-old for the opposite , and context set	DT CD JJ NNS IN/that PP MD VV RB VBP JJ , VVG IN/that DT NN VVZ RB VV IN DT NN IN DT NN CC NN VVP PP ( ) , JJ IN DT NN , CC NN	BackGround	SRelated	Neutral
07-1043_14	In addition to the semantic content , we equip every elementary tree in the grammar with a semantic requirement and a pragmatic condition ( )	IN NN TO DT JJ NN , PP VV DT JJ NN IN DT NN IN DT JJ NN CC DT JJ NN ( )	Fundamental	Basis	Neutral
07-1044_0	Supertag. This is a variant of the approach above , but using supertags ( ) instead of PoS tags	NP NP VBZ DT NN IN DT NN IN , CC VVG NNS ( ) RB IN NP NNS	BackGround	SRelated	Neutral
07-1044_1	For example , the metrics proposed in Bangalore et al. ( ) , such as Simple Accuracy and Generation Accuracy , measure changes with respect to a reference string based on the idea of string-edit distance	IN NN , DT NNS VVN IN NP NP NP ( ) , JJ IN NP NP CC NP NP , NN NNS IN NN TO DT NN NN VVN IN DT NN IN NN NN	BackGround	GRelated	Neutral
07-1044_2	The judges were then presented with the 50 sentences in random order , and asked to score the sentences according to their own scale , as in magnitude estimation ( ); these scores were then normalised in the range [0 ,1]	DT NNS VBD RB VVN IN DT CD NNS IN JJ NN , CC VVD TO NN DT NNS VVG TO PP$ JJ NN , RB IN NN NN ( NN DT NNS VBD RB VVN IN DT NN NN NN	Fundamental	Idea	Neutral
07-1044_3	Regarding the interpretation of the absolute value of (Pearson's) correlation coefficients , both here and in the rest of the paper , we adopt Cohen's scale ( ) for use in human judgements , given in Table 1; we use this as most of this work is to do with human judgements of fluency	VVG DT NN IN DT JJ NN IN JJ NN NNS , CC RB CC IN DT NN IN DT NN , PP VVP NP NN ( ) IN NN IN JJ NNS , VVN IN JJ JJ PP VVP DT RB JJS IN DT NN VBZ TO VV IN JJ NNS IN NN	Fundamental	Basis	Neutral
07-1044_4	Those chosen were the Connexor parser , the Collins parser ( ) , and the Link Grammar parser ( )	DT VVN VBD DT NP NN , LS DT NP NN ( ) , CC DT VVP NP NN ( )	Fundamental	Basis	Neutral
07-1044_6	For example , in statistical MT the translation model and the language model are treated separately , characterised as faithfulness and fluency respectively (as in the treatment in Jurafsky and Martin ( ))	IN NN , IN JJ NP DT NN NN CC DT NN NN VBP VVN RB , VVN IN NN CC NN RB NNS IN DT NN IN NP CC NP ( NN	BackGround	GRelated	Neutral
07-1044_7	A neat solution to poor sentence-level evaluation proposed by Kulesza and Shieber ( ) is to use a Support Vector Machine , using features such as word error rate , to estimate sentence-level translation quality	DT JJ NN TO JJ JJ NN VVN IN NP CC NP ( ) VBZ TO VV DT NP NP NP , VVG NNS JJ IN NN NN NN , TO VV JJ NN NN	BackGround	SRelated	Neutral
07-1044_8	Bleu ( ) is a canonical example: in matching n-grams in a candidate translation text with those in a reference text , the metric measures faithfulness by counting the matches , and fluency by implicitly using the reference n-grams as a language model	NP ( ) VBZ DT JJ NN IN VVG NNS IN DT NN NN NN IN DT IN DT NN NN , DT JJ NNS NN IN VVG DT NNS , CC NN IN RB VVG DT NN NNS IN DT NN NN	BackGround	GRelated	Neutral
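The faithfulness half of that description reduces to clipped n-gram precision; BLEU itself then combines the precisions geometrically over n and applies a brevity penalty. A minimal single-reference sketch, assuming pre-tokenized inputs:

from collections import Counter

def clipped_ngram_precision(candidate, reference, n):
    # Count candidate n-grams, clip each count by its count in the
    # reference, and divide by the total number of candidate n-grams.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / max(1, sum(cand.values()))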
07-1044_9	Quite a different idea was suggested in Wan et al. ( ) , of using the grammatical judgement of a parser to assess fluency , giving a measure independent of the language model used to generate the text	PDT DT JJ NN VBD VVN IN NP NP NP ( ) , IN VVG DT JJ NN IN DT NN TO VV NN , VVG DT NN JJ IN DT NN NN VVN TO VV DT NN	BackGround	GRelated	Neutral
07-1044_9	In terms of automatic evaluation , we are not aware of any technique that measures only fluency or similar characteristics , ignoring content , apart from that of Wan et al. ( )	IN NNS IN JJ NN , PP VBP RB JJ IN DT NN WDT VVZ JJ NN CC JJ NNS , VVG NN , RB IN DT IN NP NP NP ( )	BackGround	SRelated	Neutral
07-1044_9	The consistency and magnitude of the first three parser metrics , however , lends support to the idea of Wan et al. ( ) to use something like these as indicators of generated sentence fluency	DT NN CC NN IN DT JJ CD NN NNS , RB , VVZ NN TO DT NN IN NP NP NP ( ) TO VV NN IN DT IN NNS IN VVN NN NN	BackGround	SRelated	Neutral
07-1044_10	Similarly , the ultrasummarisation model of Witbrock and Mittal ( ) consists of a content model , modelling the probability that a word in the source text will be in the summary , and a language model	RB , DT NN NN IN NP CC NP ( ) VVZ IN DT JJ NN , VVG DT NN IN/that DT NN IN DT NN NN MD VB IN DT NN , CC DT NN NN	BackGround	GRelated	Neutral
07-1044_10	In this model we violate the Markov assumption of independence in much the same way as Witbrock and Mittal ( ) in their combination of content and language model probabilities , by backtracking at every state in order to discourage repeated words and avoid loops	IN DT NN PP VVP DT NP NN IN NN IN RB DT JJ NN IN NP CC NP ( ) IN PP$ NN IN NN CC NN NN NNS , IN VVG IN DT NN IN NN TO VV JJ NNS CC VV NNS	Fundamental	Idea	Neutral
07-1044_11	Zajic et al. ( ) use similar scales for summarisation	NP NP NP ( ) VV JJ NNS IN NN	BackGround	GRelated	Neutral
07-1045_0	Coreference resolution on text datasets is well-studied (e.g. , ( ))	NN NN IN NN NNS VBZ JJ NN , ( NN	BackGround	GRelated	Neutral
07-1045_0	We employ a set of verbal features that is similar to the features used by state-of-the-art coreference resolution systems that operate on text (e.g. , ( ))	PP VVP DT NN IN JJ NNS WDT VBZ JJ TO DT NNS VVN IN JJ NN NN NNS WDT VVP IN NN NN , ( NN	Fundamental	Basis	Positive
07-1045_0	Evaluation metric. Coreference resolution is often performed in two phases: a binary classification phase , in which the likelihood of coreference for each pair of noun phrases is assessed; and a partitioning phase , in which the clusters of mutually-coreferring NPs are formed , maximizing some global criterion ( )	NN JJ NP NN VBZ RB VVN IN CD NN DT JJ NN NN , IN WDT DT NN IN NN IN DT NN IN NN NNS VBZ JJ CC DT VVG NN , IN WDT DT NNS IN VVG NP VBP VVN , VVG DT JJ NN ( )	BackGround	SRelated	Neutral
07-1045_0	The verbal features that we have included are a representative sample from the literature (e.g. , ( ))	DT JJ NNS IN/that PP VHP VVN VBP DT JJ NN IN DT NN NN , ( NN	Fundamental	Basis	Positive
07-1045_1	also consider training separate classifiers and combining their posteriors , either through weighted addition or multiplication; this is sometimes called "late fusion." Late fusion is also employed for gesture-speech combination in ( )	RB VV VVG JJ NNS CC VVG PP$ NNS , CC IN JJ NN CC NN DT VBZ RB VVN JJ JJ JJ NN VBZ RB VVN IN NN NN IN ( )	BackGround	GRelated	Neutral
07-1045_2	All features are computed from hand and body pixel coordinates , which are obtained via computer vision; our vision system is similar to ( )	DT NNS VBP VVN IN NN CC NN NN VVZ , WDT VBP VVN IN NN NN PP$ NN NN VBZ JJ TO ( )	Fundamental	Idea	Neutral
07-1045_3	The continuous-valued features were binned using a supervised technique ( )	DT JJ NNS VBD VVN VVG DT JJ NN ( )	Fundamental	Basis	Neutral
07-1045_4	While people have little difficulty distinguishing between meaningful gestures and irrelevant hand motions (e.g. , self-touching , adjusting glasses) ( ) , NLP systems may be confused by such seemingly random movements	IN NNS VHP JJ NN VVG IN JJ NNS CC JJ NN NNS JJ , NN , VVG NN ( ) , NP NNS MD VB VVN IN JJ RB JJ NNS	BackGround	GRelated	Neutral
07-1045_5	Markable noun phrases - those that are permitted to participate in coreference relations - were annotated by the first author , in accordance with the MUC task definition ( )	JJ NN NNS : DT WDT VBP VVN TO VV IN NN NNS : VBD VVN IN DT JJ NN , IN NN IN DT NP NN NN ( )	Fundamental	Basis	Neutral
07-1045_6	To measure the similarity between gesture trajectories , we use dynamic time warping ( ) , which gives a similarity metric for temporal data that is invariant to speed	TO VV DT NN IN NN NNS , PP VVP JJ NN VVG ( ) , WDT VVZ DT NN JJ IN JJ NNS WDT VBZ JJ TO VV	Fundamental	Basis	Neutral
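Dynamic time warping in its textbook form; the scalar absolute-difference local cost is an assumption (gesture trajectories would use a distance over coordinate vectors):

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    # D[i][j]: cost of the cheapest warping path aligning a[:i] with b[:j].
    # Stretching or compressing either sequence in time is allowed, which
    # is what makes the resulting measure largely invariant to speed.
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]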
07-1045_7	In addition , verbal language is different when used in combination with meaningful non-verbal communication than when it is used unimodally ( )	IN NN , JJ NN VBZ JJ WRB VVN IN NN IN JJ JJ NN IN WRB PP VBZ VVN RB ( )	BackGround	SRelated	Neutral
07-1045_7	Kehler finds that fully-specified noun phrases are less likely to receive multimodal support ( )	NP VVZ IN/that JJ NN NNS VBP RBR JJ TO VV JJ NN ( )	BackGround	GRelated	Neutral
07-1045_7	Last , we note that NPs with adjectival modifiers were assigned negative weights , supporting the finding of ( ) that fully-specified NPs are less likely to receive multimodal support	RB , PP VVP IN/that NP IN JJ NNS VBD VVN JJ NNS , VVG DT NN IN ( ) IN/that JJ NNS VBP RBR JJ TO VV JJ NN	BackGround	SRelated	Neutral
07-1045_8	Experiments in both ( ) and ( ) find no conclusive winner among early fusion , additive late fusion , and multiplicative late fusion	NNS IN DT ( ) CC ( ) VV DT JJ NN IN JJ NN , JJ JJ NN , CC JJ JJ NN	BackGround	GRelated	Neutral
07-1045_9	JS-div reports the Jensen-Shannon divergence , a continuous-valued feature used to measure the similarity in cluster assignment probabilities between the two gestures ( )	NP VVZ DT NP NN , DT JJ NN VVN TO VV DT NN IN NN NN NNS IN DT CD NNS ( )	BackGround	GRelated	Neutral
07-1045_10	The objective function (Equation 1) is optimized using a Java implementation of L-BFGS , a quasi-Newton numerical optimization technique ( )	DT JJ NN NN JJ VBZ VVN VVG DT NP NN IN NP , DT NP JJ NN NN ( )	Fundamental	Basis	Neutral
07-1045_11	However , non-verbal modalities are often noisy , and their interactions with speech are complex ( )	RB , JJ NNS VBP RB JJ , CC PP$ NNS IN NN VBP JJ ( )	BackGround	GRelated	Neutral
07-1045_11	Our non-verbal features attempt to capture similarity between the speaker's hand gestures; similar gestures are thought to suggest semantic similarity ( )	PP$ JJ NNS VVP TO VV NN IN DT JJ NN NN JJ NNS VBP VVN TO VV JJ NN ( )	Fundamental	Idea	Neutral
07-1045_11	Euclidean distance captures cases in which the speaker is performing a gestural "hold" in roughly the same location ( )	JJ NN VVZ NNS IN WDT DT NN VBZ VVG DT JJ NN IN RB DT JJ NN ( )	BackGround	SRelated	Neutral
07-1045_11	Non-verbal meta features. Research on gesture has shown that semantically meaningful hand motions usually take place away from "rest position ," which is located at the speaker's lap or sides ( )	JJ NN VVZ NP IN NN VHZ VVN IN/that RB JJ NN VVZ RB VV NN RB IN NN NN NN WDT VBZ VVN IN DT NNS VVP CC NNS ( )	BackGround	GRelated	Neutral
07-1045_11	Indeed , the psychology literature describes a finite-state model of gesture , proceeding from "preparation ," to "stroke ," "hold ," and then "retraction" ( )	RB , DT NN NN VVZ DT JJ NN IN NN , VVG IN NN NN TO JJ JJ NN NN CC RB JJ ( )	BackGround	GRelated	Neutral
07-1045_12	Verbal meta features. Meaningful gesture has been shown to be more frequent when the associated speech is ambiguous ( )	JJ NN VVZ NN NN VHZ VBN VVN TO VB RBR JJ WRB DT VVN NN VBZ JJ ( )	BackGround	GRelated	Neutral
07-1045_13	The use of hidden variables in a conditionally-trained model follows ( )	DT NN IN JJ NNS IN DT JJ NN VVZ ( )	Fundamental	Idea	Neutral
07-1045_14	For example , Shriberg et al. ( ) explore the use of prosodic features for sentence and topic segmentation	IN NN , NP NP NP ( ) VV DT NN IN JJ NNS IN NN CC NN NN	BackGround	GRelated	Neutral
07-1045_14	While more flexible than the interpolation techniques described in ( ) , training modality-specific classifiers separately is still suboptimal compared to training them jointly , because independent training of the modality-specific classifiers forces them to account for data that they cannot possibly explain	IN RBR JJ IN DT NN NNS VVN IN ( ) , VVG JJ NNS RB VBZ RB JJ VVN TO VVG PP RB , IN JJ NN IN DT JJ NNS VVZ PP TO VV IN NNS IN/that PP MD RB VV	BackGround	GRelated	Neutral
07-1045_15	Toyama and Horvitz ( ) introduce a Bayesian network approach to modality combination for speaker identification	NP CC NP ( ) VV DT NP NN NN TO NN NN IN NN NN	BackGround	GRelated	Neutral
07-1046_0	Introduction. With recent advances in spoken dialogue system technologies , researchers have turned their attention to more complex domains (e.g. tutoring ( ) , technical support ( ) , medication assistance ( ))	NN IN JJ NNS IN VVN NN NN NNS , NNS VHP VVN PP$ NN TO JJR JJ NNS NN ( ) , JJ NN ( ) , NN NN ( NN	BackGround	GRelated	Neutral
07-1046_2	[Table: Average (standard deviation) for objective metrics in the first problem] Related work. Discourse structure has been successfully used in non-interactive settings (e.g. understanding specific lexical and prosodic phenomena ( ) , natural language generation ( ) , essay scoring ( )) as well as in interactive settings (e.g. predictive/generative models of postural shifts ( ) , generation/interpretation of anaphoric expressions ( ) , performance modeling ( ))	JJ NN NN IN JJ NNS IN DT JJ NN JJ NN NP NN VHZ VBN RB VVN IN JJ NNS VVG JJ JJ CC JJ NNS ( ) , JJ NN NN ( ) , NN VVG ( ) RB RB IN IN JJ NNS JJ NNS IN JJ NNS ( ) , NN IN JJ NNS ( ) , NN NN ( NN	BackGround	GRelated	Positive
07-1046_4	Other visual improvements for dialogue-based computer tutors have been explored in the past (e.g. talking heads ( ))	JJ JJ NNS IN JJ NN NNS VHP VBN VVN IN DT JJ NN VVZ ( NN	BackGround	GRelated	Neutral
07-1046_5	This information is implicitly encoded in the intentional structure of a discourse as proposed in the Grosz & Sidner theory of discourse ( )	DT NN VBZ RB VVN IN DT JJ NN IN DT NN IN VVN IN DT NP CC NP NN IN NN ( )	Fundamental	Basis	Neutral
07-1046_5	The Navigation Map (NM). We use the Grosz & Sidner theory of discourse ( ) to inform our NM design	LS DT NP NP NN PP VVP DT NP CC NP NN IN NN ( ) TO VV PP$ NP NN	Fundamental	Basis	Neutral
07-1046_9	ITSPOKE. ITSPOKE ( ) is a state-of-the-art tutoring spoken dialogue system for conceptual physics	CD JJ NNS ( ) VBZ DT JJ VVG VVN NN NN IN JJ NNS	BackGround	SRelated	Neutral
07-1046_10	Thus , interacting with such systems can be characterized by an increased user cognitive load associated with listening to often lengthy system turns and the need to integrate the current information to the discussion overall ( )	RB , VVG IN JJ NNS MD VB VVN IN DT VVN NN JJ NN VVN IN VVG TO RB JJ NN NNS CC DT NN TO VV DT JJ NN TO DT NN NN ( )	BackGround	GRelated	Neutral
07-1046_11	However , implementing the NM in a new domain requires little expertise as previous work has shown that naïve users can reliably annotate the information needed for the NM ( )	RB , VVG DT NP IN DT JJ NN VVZ JJ NN IN JJ NN VHZ VVN IN/that NN NNS MD RB VV DT NN VVN IN DT NP ( )	BackGround	GRelated	Neutral
07-1046_13	While a somewhat similar graphical representation of the discourse structure has been explored in one previous study ( ) , to our knowledge we are the first to test its benefits (see Section 6)	IN DT RB JJ JJ NN IN DT NN NN VHZ VBN VVN IN CD JJ NN ( ) , TO PP$ NN PP VBP DT JJ TO VV PP$ NNS NN NN JJ	BackGround	SRelated	Neutral
07-1046_13	This theory has inspired several generic dialogue managers for spoken dialogue systems (e.g. ( ))	DT NN VHZ VVN JJ JJ NN NNS IN VVN NN NNS JJ ( NN	BackGround	GRelated	Neutral
07-1046_13	One related study is that of ( )	CD JJ NN VBZ IN/that IN ( )	BackGround	SRelated	Neutral
07-1046_15	Results for Q1-6. Questions Q1-6 were inspired by previous work on spoken dialogue system evaluation (e.g. ( )) and measure the user's overall perception of the system	NNS IN NP NP NP VBD VVN IN JJ NN IN VVN NN NN NN NN ( NN CC NN NNS JJ NN IN DT NN	Fundamental	Idea	Neutral
07-1047_0	This situation is very similar to the training process of translation models in statistical machine translation ( ) , where parallel corpus is used to find the mappings between words from different languages by exploiting their co-occurrence patterns	DT NN VBZ RB JJ TO DT NN NN IN NN NNS IN JJ NN NN ( ) , WRB JJ NN VBZ VVN TO VV DT NNS IN NNS IN JJ NNS IN VVG PP$ NN NNS	Fundamental	Idea	Neutral
07-1047_0	sum_{j=1}^{J} Pr(w_j | o_k) = 1 , for all k . This optimization problem can be solved by the EM algorithm ( )	NP NP NP SYM CD , NP NN NN DT NN NN MD VB VVN IN DT JJ NN ( )	BackGround	SRelated	Neutral
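The reconstructed constraint above says that, for every object o_k, the translation probabilities over words must form a distribution. In EM for such models the M-step enforces this by renormalizing expected counts; a minimal sketch, with the nested-dict layout as an assumption:

def m_step(expected_counts):
    # expected_counts[o_k][w_j]: expected co-occurrence count from the
    # E-step; renormalize per object so that sum_j Pr(w_j | o_k) = 1.
    table = {}
    for o_k, word_counts in expected_counts.items():
        total = sum(word_counts.values())
        table[o_k] = {w: c / total for w, c in word_counts.items()}
    return table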
07-1047_1	Studies have also shown that eye gaze has a potential to improve resolution of underspecified referring expressions in spoken dialog systems ( ) and to disambiguate speech input ( )	NNS VHP RB VVN DT NN VVP VHZ DT NN TO VV NN IN JJ VVG NNS IN VVN NN NNS ( ) CC TO VV NN NN ( )	BackGround	GRelated	Positive
07-1047_2	Given the recent advances in eye tracking technology ( ) , integrating non-intrusive and high performance eye trackers with conversational interfaces becomes feasible	VVN DT JJ NNS IN NN VVG NN ( ) , VVG JJ CC JJ NN NN NNS IN JJ NNS VVZ JJ	BackGround	GRelated	Neutral
07-1047_3	Motivated by psycholinguistic studies ( ) and recent investigations on computational models for language acquisition and grounding ( ) , we are particularly interested in two unique questions related to multimodal conversational systems: (1) In a multimodal conversation that involves more complex tasks (e.g. , both user initiated tasks and system initiated tasks) , is there a reliable temporal alignment between eye gaze and spoken references so that the coupled inputs can be used for automated vocabulary acquisition and interpretation? (2) If such an alignment exists , how can we model this alignment and automatically acquire and interpret the vocabularies? To address the first question , we conducted an empirical study to examine the temporal relationships between eye fixations and their corresponding spoken references	VVN IN JJ NNS ( ) CC JJ NNS IN JJ NNS IN NN NN CC NN ( ) , PP VBP RB JJ IN CD JJ NNS VVN TO JJ JJ NN NN IN DT JJ NN WDT VVZ JJR JJ NNS JJ , DT NN VVD NNS CC NN VVN NN , VBZ RB DT JJ JJ NN IN NN VVP CC VVN NNS RB IN/that DT VVN NNS MD VB VVN IN JJ NN NN CC JJ NN IN PDT DT NN VVZ , WRB MD PP VV DT NN CC RB VV CC VV DT NN TO VV DT JJ NN , PP VVD DT JJ NN TO VV DT JJ NNS IN NN NNS CC PP$ JJ VVN NNS	Fundamental	Idea	Neutral
07-1047_3	Additionally , before speaking a word , the eyes usually move to the objects to be mentioned ( )	RB , IN VVG DT NN , DT NNS RB VVP TO DT NNS TO VB VVN ( )	BackGround	GRelated	Neutral
07-1047_4	Previous psycholinguistics studies have shown that the direction of gaze carries information about the focus of the user's attention ( )	JJ NN NNS VHP VVN IN/that DT NN IN NN VVZ NN IN DT NN IN DT NNS NN ( )	BackGround	GRelated	Neutral
07-1047_5	In research on multimodal interactive systems , recent work indicates that the speech and gaze integration patterns can be modeled reliably for individual users and therefore be used to improve multimodal system performances ( )	IN NN IN JJ JJ NNS , JJ NN VVZ IN/that DT NN CC VVP NN NNS MD VB VVN RB IN JJ NNS CC RB VB VVN TO VV JJ NN NNS ( )	BackGround	GRelated	Positive
07-1047_6	In addition , visual properties of the interface also affect user gaze behavior and thus influence the prediction of attention ( ) based on eye gaze	IN NN , JJ NNS IN DT NN RB VV NN VVP NN CC RB VV DT NN IN NN ( ) VVN IN NN VVP	BackGround	SRelated	Neutral
07-1047_7	Recent work has shown that the effect of eye gaze in facilitating spoken language processing varies among different users ( )	JJ NN VHZ VVN IN/that DT NN IN NN VVP IN VVG VVN NN NN VVZ IN JJ NNS ( )	BackGround	GRelated	Neutral
07-1047_8	Recent studies have shown that multisensory information (e.g. , through vision and language processing) can be combined to effectively acquire words to their perceptually grounded objects in the environment ( )	JJ NNS VHP VVN IN/that JJ NN NN , IN NN CC NN NN MD VB VVN TO RB VV NNS TO PP$ RB VVN NNS IN DT NN ( )	BackGround	GRelated	Positive
07-1047_11	The perceived visual context influences spoken word recognition and mediates syntactic processing ( )	DT VVN JJ NN VVZ VVN NN NN CC VVZ JJ NN ( )	BackGround	GRelated	Neutral
07-1048_0	[Figure 1: Multimodal interface on tablet] In this paper we explore the application of multimodal interface technologies (see André ( ) for an overview) to the creation of more effective systems used to search and browse for entertainment content in the home	NN CD NP NN IN NN IN DT NN PP VVP DT NN IN JJ NN NNS NN NP ( ) IN DT NN TO DT NN IN JJR JJ NNS VVN TO VV CC VV IN NN NN IN DT NN	Fundamental	Basis	Neutral
07-1048_1	These interfaces are cumbersome and do not scale well as the range of content available increases ( )	DT NNS VBP JJ CC VVP RB VV RB IN DT NN IN JJ JJ NNS ( )	BackGround	GRelated	Negative
07-1048_2	An important advantage of speech is that it makes it easy to combine multiple constraints over multiple dimensions within a single query ( )	DT JJ NN IN NN VBZ IN/that PP VVZ PP JJ TO VV JJ NNS IN JJ NNS IN DT JJ NN ( )	BackGround	SRelated	Positive
07-1048_3	A number of previous systems have investigated the addition of unimodal spoken search queries to a graphical electronic program guide (( ) ; Goto et al. , 2003 ; Wittenburg et al. , 2006)	DT NN IN JJ NNS VHP VVN DT NN IN JJ VVN NN NNS TO DT JJ JJ NN NN ( NP NP CC NP , JJ NP NP NP , JJ	BackGround	GRelated	Neutral
07-1048_5	Others have gone beyond unimodal speech input and added multimodal commands combining speech with pointing ( )	NNS VHP VVN IN JJ NN NN CC VVD JJ NNS VVG NN IN VVG ( )	BackGround	GRelated	Neutral
07-1048_6	This develops and extends upon the multimodal architecture underlying the MATCH system ( )	DT VVZ CC VVZ IN DT JJ NN VVG DT NP NN ( )	Fundamental	Basis	Neutral
07-1048_7	Speech recognition results , pointing gestures made on the display , and handwritten inputs , are all passed to a multimodal understanding server which uses finite-state multimodal language processing techniques ( ) to interpret and integrate the speech and gesture	NN NN NNS , VVG NNS VVN IN DT NN , CC JJ NNS , VBP RB VVN TO DT JJ NN NN WDT VVZ JJ JJ NN NN NNS ( ) TO VV CC VV DT NN CC NN	Fundamental	Basis	Neutral
07-1048_10	However , as also reported in previous work ( ) , recognition accuracy remains a serious problem	RB , IN RB VVN IN JJ NN ( ) , NN NN VVZ DT JJ NN	BackGround	GRelated	Neutral
07-1049_0	The past few years have seen considerable improvement in the performance of unsupervised parsers ( ) and , for the first time , unsupervised parsers have been able to improve on the right-branching heuristic for parsing English	DT JJ JJ NNS VHP VVN JJ NN IN DT NN IN JJ NNS ( ) CC , IN DT JJ NN , JJ NNS VHP VBN JJ TO VV IN DT NN JJ IN VVG NP	BackGround	GRelated	Neutral
07-1049_0	Some of these subsets were used for scoring in ( )	DT IN DT NNS VBD VVN IN VVG IN ( )	BackGround	SRelated	Neutral
07-1049_0	Table 1 gives two baselines and the parsing results for WSJ10 , WSJ40 , Negra10 and Negra40 for recent unsupervised parsing algorithms: CCM and DMV+CCM ( ) , U-DOP ( ) and UML-DOP ( )	NN CD VVZ CD NNS CC DT VVG NNS IN NP , NP , NP CC NP IN JJ JJ VVG NN NP CC NP ( ) , NP ( ) CC NP ( )	Fundamental	Basis	Neutral
07-1049_2	There are several algorithms for doing so ( ) , which cluster words into classes based on the most frequent neighbors of each word	EX VBP JJ NNS IN VVG IN ( ) , WDT NN NNS IN NNS VVN IN DT RBS JJ NNS IN DT NN	BackGround	GRelated	Neutral
07-1049_3	This restriction is inspired by psycholinguistic research which suggests that humans process language incrementally ( )	DT NN VBZ VVN IN JJ NN WDT VVZ DT NNS NN NN RB ( )	Fundamental	Idea	Neutral
07-1049_4	When Klein and Manning induce the parts-of-speech , they do so from a much larger corpus containing the full WSJ treebank together with additional WSJ newswire ( )	WRB NP CC NP VV DT NN , PP VVP RB IN DT RB JJR NN VVG DT JJ NP NN RB IN JJ NP NN ( )	BackGround	SRelated	Neutral
07-1049_6	This can either be semi-supervised parsing , using both annotated and unannotated data ( ) , or unsupervised parsing , training entirely on unannotated text	DT MD RB VB VVN VVG , VVG DT VVN CC JJ NNS ( ) CC JJ VVG , VVG RB IN JJ NN	BackGround	GRelated	Neutral
07-1049_8	This problem is known in psycholinguistics as the problem of reanalysis ( )	DT NN VBZ VVN IN NNS IN DT NN IN NN ( )	BackGround	GRelated	Neutral
07-1050_0	For large datasets , we use an ensemble technique inspired by Bagging ( )	IN JJ NNS , PP VVP DT NN NN VVN IN NP ( )	Fundamental	Idea	Neutral
07-1050_1	In particular , we consider an algorithm proposed by Camerini et al. ( ) which has a worst-case complexity of O(km log(n)) , where k is the number of parses we want , n is the number of words in the input sentence , and m is the number of edges in the hypothesis graph	IN JJ , PP VVP DT NN VVN IN NP NP NP ( ) WDT VHZ DT JJ NN IN NP NP , WRB NN VBZ DT NN IN VVZ PP VVP , NN VBZ DT NN IN NNS IN DT NN NN , CC NN VBZ DT NN IN NNS IN DT NN NN	Fundamental	Basis	Neutral
07-1050_1	The k-best MST algorithm we introduce in this paper is the algorithm described in Camerini et al. ( )	DT JJ NP NN PP VV IN DT NN VBZ DT NN VVN IN NP NP NP ( )	Fundamental	Basis	Neutral
07-1050_1	Algorithm 1 is a version of the MST algorithm as presented by Camerini et al. ( ); subtleties of the algorithm have been omitted	NN CD VBZ DT NN IN DT NP NN IN VVN IN NP NP NP ( JJ NNS IN DT NN VHP VBN VVN	Fundamental	Idea	Neutral
07-1050_1	We have introduced the Camerini et al. ( ) k-best MST algorithm and have shown how to efficiently train MaxEnt models for dependency parsing	PP VHP VVN DT NP NP NP ( ) JJ NP NN CC VHP VVN WRB TO RB VV JJ NNS IN NN VVG	Fundamental	Basis	Positive
07-1050_2	Many of the model features have been inspired by the constituency-based features presented in Charniak and Johnson ( )	JJ IN DT NN NNS VHP VBN VVN IN DT JJ NNS VVN IN NP CC NP ( )	Fundamental	Idea	Neutral
07-1050_3	Other DP solutions use constituency-based parsers to produce phrase-structure trees , from which dependency structures are extracted ( )	JJ JJ NNS VVP JJ NNS TO VV NN NNS , IN WDT NN NNS VBP VVN ( )	BackGround	GRelated	Neutral
07-1050_4	An efficient algorithm for generating the k-best parse trees for a constituency-based parser was presented in Huang and Chiang ( ); a variation of that algorithm was used for generating projective dependency trees for parsing in Dreyer et al. ( ) and for training in McDonald et al. ( )	DT JJ NN IN VVG DT NN VVP NNS IN DT JJ NN VBD VVN IN NP CC NP ( NN DT NN IN DT NN VBD VVN IN VVG JJ NN NNS IN VVG IN NP NP NP ( ) CC IN NN IN NP NP NP ( )	BackGround	GRelated	Neutral
07-1050_5	The DP algorithms are generally variants of the CKY bottom-up chart parsing algorithm such as that proposed by Eisner ( )	DT JJ NNS VBP RB NNS IN DT NP JJ NN VVG NN JJ IN DT VVN IN NP ( )	BackGround	SRelated	Neutral
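For concreteness, here is a score-only sketch of the Eisner-style bottom-up DP this line refers to (first-order, projective). Backpointer bookkeeping is omitted, and the chart layout and names are my own illustrative choices, not taken from the cited work.

```python
import numpy as np

def eisner(scores):
    """Best projective dependency tree score via Eisner's O(n^3) DP.

    scores[h, m] = score of arc h -> m; token 0 is the root.
    Only the best total score is returned (no backpointers).
    """
    n = scores.shape[0]
    # chart[i, j, d, c]: d=1 head at i, d=0 head at j;
    # c=0 incomplete (arc just added), c=1 complete (subtree finished)
    chart = np.full((n, n, 2, 2), -np.inf)
    for i in range(n):
        chart[i, i, :, :] = 0.0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Incomplete: join two complete halves and add one arc.
            best = max(chart[i, q, 1, 1] + chart[q + 1, j, 0, 1]
                       for q in range(i, j))
            chart[i, j, 0, 0] = best + scores[j, i]   # arc j -> i
            chart[i, j, 1, 0] = best + scores[i, j]   # arc i -> j
            # Complete: absorb a finished subtree on one side.
            chart[i, j, 0, 1] = max(chart[i, q, 0, 1] + chart[q, j, 0, 0]
                                    for q in range(i, j))
            chart[i, j, 1, 1] = max(chart[i, q, 1, 0] + chart[q, j, 1, 1]
                                    for q in range(i + 1, j + 1))
    return chart[0, n - 1, 1, 1]   # root 0 heads the whole sentence
```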
07-1050_8	In order to explore a rich set of syntactic features in the MST framework , we can either approximate the optimal non-projective solution as in McDonald and Pereira ( ) , or we can use the constrained MST model to select a subset of the set of dependency parses to which we then apply less-constrained models	IN NN TO VV DT JJ NN IN JJ NNS IN DT NP NN , PP MD RB JJ DT JJ JJ NN IN IN NP CC NP ( ) , CC PP MD VV DT VVN NP NN TO VV DT NN IN DT NN IN NN VVZ TO WDT PP RB VVP JJ NNS	BackGround	GRelated	Neutral
07-1050_8	Unlike the training procedure employed by McDonald et al. ( ) and McDonald and Pereira ( ) , we provide positive and negative examples in the training data	IN DT NN NN VVN IN NP NP NP ( ) CC NP CC NP ( ) , PP VVP JJ CC JJ NNS IN DT NN NNS	Compare	Compare	Neutral
07-1050_8	A second labeling stage can be applied to get labeled dependency structures as described in ( )	DT JJ VVG NN MD VB VVN TO VV VVN NN NNS RB VVN IN ( )	BackGround	SRelated	Neutral
07-1050_9	The Maximum Spanning Tree algorithm was recently introduced as a viable solution for non-projective dependency parsing ( )	DT NP NP NP NN VBD RB VVN IN DT JJ NN IN JJ NN VVG ( )	BackGround	GRelated	Neutral
07-1050_9	McDonald et al. ( ) introduced a model for dependency parsing based on the Edmonds/Chu-Liu algorithm	NP NP NP ( ) VVD DT NN IN NN VVG VVN IN DT NP NN	BackGround	GRelated	Neutral
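The Camerini et al. k-best procedure is too involved to reproduce here, but the 1-best Edmonds/Chu-Liu core that these models decode with can be sketched. The simplified recursive version below runs in roughly O(n^3) on a dense numpy score matrix; the names and the contraction bookkeeping are illustrative choices, not code from the cited papers.

```python
import numpy as np

def find_cycle(heads):
    """Return the set of nodes on a cycle in the head graph, or None."""
    n = len(heads)
    color = [0] * n                      # 0 unseen, 1 on current path, 2 done
    for start in range(1, n):
        if color[start]:
            continue
        path, v = [], start
        while v != 0 and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = heads[v]
        if v != 0 and color[v] == 1:     # walked back onto the current path
            return set(path[path.index(v):])
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(scores):
    """1-best maximum spanning arborescence rooted at node 0.

    scores: numpy (n, n) matrix, scores[h, m] = score of arc h -> m.
    Returns a list heads with heads[m] = chosen head of m (heads[0] unused).
    """
    n = scores.shape[0]
    heads = [0] * n
    for m in range(1, n):                # greedy best head per non-root node
        heads[m] = max((h for h in range(n) if h != m),
                       key=lambda h: scores[h, m])
    cycle = find_cycle(heads)
    if cycle is None:
        return heads
    # Contract the cycle into one super-node and recurse.
    outside = [v for v in range(n) if v not in cycle]   # outside[0] == root
    idx = {v: i for i, v in enumerate(outside)}
    c = len(outside)                     # index of the contracted super-node
    cyc_score = sum(scores[heads[v], v] for v in cycle)
    sub = np.full((c + 1, c + 1), -np.inf)
    enter, leave = {}, {}
    for h in outside:
        for m in outside:
            if h != m:
                sub[idx[h], idx[m]] = scores[h, m]
        # Entering the cycle at v replaces the cycle arc heads[v] -> v.
        v = max(cycle, key=lambda v: scores[h, v] - scores[heads[v], v])
        enter[h] = v
        sub[idx[h], c] = cyc_score + scores[h, v] - scores[heads[v], v]
    for m in outside[1:]:
        v = max(cycle, key=lambda v: scores[v, m])
        leave[m] = v
        sub[c, idx[m]] = scores[v, m]
    sub_heads = chu_liu_edmonds(sub)
    new_heads = [0] * n
    for v in cycle:
        new_heads[v] = heads[v]          # keep cycle arcs for now
    for m in outside[1:]:
        h = sub_heads[idx[m]]
        new_heads[m] = leave[m] if h == c else outside[h]
    entry = enter[outside[sub_heads[c]]] # this arc breaks the cycle
    new_heads[entry] = outside[sub_heads[c]]
    return new_heads
```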
07-1050_9	Many of the features above were introduced in McDonald et al. ( ); specifically , the node-type , inside , and edge features	JJ IN DT NNS RB VBD VVN IN NP NP NP ( NN RB , DT NN , RB , CC NN NNS	BackGround	SRelated	Neutral
07-1050_12	We have adopted the conditional Maximum Entropy (MaxEnt) modeling paradigm as outlined in Charniak and Johnson ( ) and Riezler et al. ( )	PP VHP VVN DT JJ NP NP JJ NN NN IN VVN IN NP CC NP ( ) CC NP NP NP ( )	Fundamental	Basis	Neutral
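A minimal sketch of conditional MaxEnt training over k-best candidate lists, in the spirit of the modeling paradigm described here. Feature extraction and candidate generation are assumed to exist elsewhere, and the plain SGD loop stands in for whatever regularized optimizer is actually used; all names are illustrative.

```python
import numpy as np

def log_probs(w, feats):
    """Conditional log-probabilities over one k-best list.
    feats: (k, d) matrix, one feature row per candidate parse."""
    s = feats @ w
    s = s - s.max()                      # stabilize the log-sum-exp
    return s - np.log(np.exp(s).sum())

def gradient(w, feats, gold):
    """Gradient of the conditional log-likelihood of the gold candidate:
    observed features minus model-expected features."""
    p = np.exp(log_probs(w, feats))
    return feats[gold] - p @ feats

def train(examples, dim, lr=0.1, epochs=20):
    """examples: iterable of (feats, gold_index) pairs. Plain SGD,
    no regularizer, shown for illustration only."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for feats, gold in examples:
            w = w + lr * gradient(w, feats, gold)
    return w
```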
07-1050_13	Work on statistical dependency parsing has utilized either dynamic-programming (DP) algorithms or variants of the Edmonds/Chu-Liu MST algorithm (see Tarjan ( ))	NN IN JJ NN VVG VHZ VVN DT JJ NN NNS CC NNS IN DT NP NP NN NN NP ( NN	BackGround	SRelated	Neutral
07-1050_13	This can be reduced to O(kn^2) in dense graphs by choosing appropriate data structures ( )	DT MD VB VVN TO NP IN JJ NNS IN VVG JJ NN NNS ( )	BackGround	SRelated	Neutral