We present a two-stage multilingual dependency parsing system submitted to the Multilingual Track of CoNLL-2007.
The parser first identifies dependencies using a deterministic parsing method and then labels those dependencies as a sequence labeling problem.
We describe the features used in each stage.
For four languages with different values of ROOT, we design some special features for the ROOT labeler.
Then we present evaluation results and error analyses focusing on Chinese.
1 Introduction
The CoNLL-2007 shared task includes two tracks: the Multilingual Track and the Domain Adaptation Track (Nivre et al., 2007).
We took part in the Multilingual Track for all ten languages provided by the CoNLL-2007 shared task organizers (Hajic et al., 2004; Aduriz et al., 2003; Marti et al., 2007; Chen et al., 2003; Bohmova et al., 2003; Marcus et al., 1993; Johansson and Nugues, 2007; Prokopidis et al., 2005; Csendes et al., 2005; Montemagni et al., 2003; Oflazer et al., 2003).
In this paper, we describe a two-stage parsing system consisting of an unlabeled parser and a sequence labeler, which was submitted to the Multilingual Track.
At the first stage, we use the parsing model proposed by Nivre (2003) to assign the arcs between the words.
Then we obtain a dependency parsing tree based on the arcs.
At the second stage, we use an SVM-based approach (Kudo and Matsumoto, 2001) to tag the dependency label for each arc.
The labeling is treated as a sequence labeling problem.
We design some special features for tagging the labels of ROOT for Arabic, Basque, Czech, and Greek, which have different labels for ROOT.
The experimental results show that our approach achieves scores above the average of all participating systems.
2 Two-Stage Parsing
The unlabeled parser predicts unlabeled directed dependencies.
This parser is primarily based on the parsing model described by Nivre (2003).
The algorithm makes a dependency parsing tree in one left-to-right pass over the input, and uses a stack to store the processed tokens.
The behaviors of the parser are defined by four elementary actions (where TOP is the token on top of the stack and NEXT is the next token in the original input string):
• Left-Arc(LA): Add an arc from NEXT to TOP; pop the stack.
• Right-Arc(RA): Add an arc from TOP to NEXT; push NEXT onto the stack.
• Reduce(RE): Pop the stack.
• Shift(SH): Push NEXT onto the stack.
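The four actions can be sketched as operations on a stack and an input buffer. The Python sketch below is illustrative only and is not MaltParser's implementation: the names `apply_action` and `run`, the integer token indices, and the `heads` dictionary are assumptions introduced here, and Right-Arc is taken to add an arc from TOP to NEXT as in Nivre (2003).

```python
from typing import Dict, List


def apply_action(action: str, stack: List[int], buffer: List[int],
                 heads: Dict[int, int]) -> None:
    """Apply one elementary action in place.

    stack[-1] is TOP, buffer[0] is NEXT, and heads maps dependent -> head.
    Preconditions (e.g. TOP must not already have a head before LA) are
    left to the caller in this sketch.
    """
    if action == "LA":    # arc from NEXT to TOP, then pop the stack
        heads[stack.pop()] = buffer[0]
    elif action == "RA":  # arc from TOP to NEXT, then push NEXT
        heads[buffer[0]] = stack[-1]
        stack.append(buffer.pop(0))
    elif action == "RE":  # pop the stack
        stack.pop()
    elif action == "SH":  # push NEXT onto the stack
        stack.append(buffer.pop(0))
    else:
        raise ValueError(f"unknown action: {action}")


def run(n_tokens: int, actions: List[str]) -> Dict[int, int]:
    """Run an action sequence over tokens 1..n_tokens and return the arcs."""
    stack: List[int] = []
    buffer = list(range(1, n_tokens + 1))
    heads: Dict[int, int] = {}
    for action in actions:
        apply_action(action, stack, buffer, heads)
    return heads


# "He eats fish": token 2 heads tokens 1 and 3 and is itself left headless.
arcs = run(3, ["SH", "LA", "SH", "RA"])
```

In the trained parser, the action at each step would be predicted by the classifier from the current configuration rather than supplied as a fixed list.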
Although Nivre et al. (2006) used the pseudo-projective approach to process non-projective dependencies, here we derive only projective dependency trees.
We use MaltParser (Nivre et al., 2006) V0.4¹ to implement the unlabeled parser, with an SVM model as the classifier.
More specifically, MaltParser uses LIBSVM (Chang and Lin, 2001) with a quadratic kernel and the built-in one-versus-all strategy for multi-class classification.
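To make the kernel choice concrete, the sketch below evaluates a degree-2 polynomial kernel of the form (gamma * u·v + coef0)², which is the shape LIBSVM's polynomial kernel takes at degree 2. The function name and the values of `gamma` and `coef0` are placeholder assumptions, not the settings used in the experiments.

```python
from typing import Sequence


def quadratic_kernel(u: Sequence[float], v: Sequence[float],
                     gamma: float = 1.0, coef0: float = 1.0) -> float:
    """Degree-2 polynomial kernel: (gamma * <u, v> + coef0) ** 2."""
    dot = sum(a * b for a, b in zip(u, v))
    return (gamma * dot + coef0) ** 2


# Two binary feature vectors that share two active features.
k = quadratic_kernel([1, 0, 1, 1], [1, 1, 1, 0])
```

With binary indicator features, squaring the dot product lets the classifier weight pairs of co-occurring features without enumerating them explicitly.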
The MaltParser is a history-based parsing model, which relies on features of the derivation history to predict the next parser action.
The features are extracted from the fields of the data representation, including FORM, LEMMA, CPOSTAG, POSTAG, and FEATS.
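For reference, these fields are columns of the tab-separated shared-task data format, in which each token occupies one line and an underscore marks an unavailable value. The reader below is a minimal sketch; the function name `read_sentence` and the sample sentence are illustrative assumptions.

```python
from typing import Dict, List

CONLL_FIELDS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG", "FEATS",
                "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def read_sentence(block: str) -> List[Dict[str, str]]:
    """Parse one sentence (one token per line, tab-separated columns)."""
    tokens = []
    for line in block.strip().splitlines():
        columns = line.split("\t")
        tokens.append(dict(zip(CONLL_FIELDS, columns)))
    return tokens


# Made-up example sentence; "_" marks a value that is not available.
rows = [
    ["1", "He", "he", "PRON", "PRP", "_", "2", "SBJ", "_", "_"],
    ["2", "eats", "eat", "V", "VBZ", "_", "0", "ROOT", "_", "_"],
    ["3", "fish", "fish", "N", "NN", "_", "2", "OBJ", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows)
tokens = read_sentence(sample)
```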
We use the following features for all languages:
• The FORM features: the FORM of TOP and NEXT, the FORM of the token immediately before NEXT in the original input string, and the FORM of the head of TOP.
• The LEMMA features: the LEMMA of TOP and NEXT, the LEMMA of the token immediately before NEXT in the original input string, and the LEMMA of the head of TOP.
• The CPOS features: the CPOSTAG of TOP and NEXT, and the CPOSTAG of the next left token of the head of TOP.
• The POS features: the POSTAG of TOP and NEXT, the POSTAG of the next three tokens after NEXT, the POSTAG of the token immediately before NEXT in the original input string, the POSTAG of the token immediately below TOP, and the POSTAG of the token immediately after the rightmost dependent of TOP.
• The FEATS features: the FEATS of TOP and NEXT.
Note that the LEMMA and FEATS fields are not available for all languages.
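As an illustration of how such features are read off a parser configuration, the sketch below extracts a subset of the FORM and POSTAG features listed above. It is not MaltParser's feature specification syntax: the function names, the feature keys, the 0-based indexing, and the "NONE" placeholder for missing positions are all assumptions made for this example.

```python
from typing import Dict, List, Optional

Token = Dict[str, str]


def value(tokens: List[Token], index: Optional[int], name: str) -> str:
    """Return a field value, or "NONE" when the position does not exist."""
    if index is None or not 0 <= index < len(tokens):
        return "NONE"
    return tokens[index].get(name, "_")


def parser_features(tokens: List[Token], stack: List[int], next_index: int,
                    heads: Dict[int, int]) -> Dict[str, str]:
    """Extract some FORM and POSTAG features from a parser configuration.

    Indices are 0-based positions in tokens; heads maps dependent -> head.
    """
    top = stack[-1] if stack else None
    below_top = stack[-2] if len(stack) > 1 else None
    head_of_top = heads.get(top) if top is not None else None
    features = {
        "FORM(TOP)": value(tokens, top, "FORM"),
        "FORM(NEXT)": value(tokens, next_index, "FORM"),
        "FORM(NEXT-1)": value(tokens, next_index - 1, "FORM"),
        "FORM(head(TOP))": value(tokens, head_of_top, "FORM"),
        "POSTAG(TOP)": value(tokens, top, "POSTAG"),
        "POSTAG(NEXT)": value(tokens, next_index, "POSTAG"),
        "POSTAG(below TOP)": value(tokens, below_top, "POSTAG"),
    }
    for offset in (1, 2, 3):  # the three tokens after NEXT
        features[f"POSTAG(NEXT+{offset})"] = value(
            tokens, next_index + offset, "POSTAG")
    return features


tokens = [
    {"FORM": "He", "POSTAG": "PRP"},
    {"FORM": "eats", "POSTAG": "VBZ"},
    {"FORM": "fresh", "POSTAG": "JJ"},
    {"FORM": "fish", "POSTAG": "NN"},
]
# Configuration after "He" was attached to "eats": TOP = eats, NEXT = fresh.
features = parser_features(tokens, stack=[1], next_index=2, heads={0: 1})
```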
¹ The tool is available at http://w3.msi.vxu.se/~nivre/research/MaltParser.html
We denote by $x = x_1, \ldots, x_n$ a sentence with $n$ words and by $y$ a corresponding dependency tree.
A dependency tree is represented from ROOT to leaves with a set of ordered pairs $(i, j) \in y$ in which $x_j$ is a dependent and $x_i$ is the head.
We have produced the dependency tree $y$ at the first stage.
In this stage, we assign a label $l_{(i,j)}$ to each pair.
We model the labels as a first-order Markov chain.
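One simple reading of a first-order Markov chain over labels is a left-to-right pass in which each decision sees the label assigned just before it. The sketch below shows only that control flow; the order in which arcs are sequenced, the `<START>` symbol, the label names, and the hand-written `toy_classify` stand-in for a trained SVM are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int]  # (i, j): x_i is the head, x_j is the dependent


def label_arcs(arcs: List[Arc],
               classify: Callable[[Arc, str], str]) -> Dict[Arc, str]:
    """Label arcs one by one, conditioning each label on the previous one."""
    labels: Dict[Arc, str] = {}
    previous = "<START>"
    # Assumption: arcs are sequenced by the position of the dependent.
    for arc in sorted(arcs, key=lambda a: a[1]):
        previous = classify(arc, previous)
        labels[arc] = previous
    return labels


def toy_classify(arc: Arc, previous: str) -> str:
    """Hand-written stand-in for a trained classifier (illustration only)."""
    head, dependent = arc
    if head == 0:
        return "ROOT"
    if dependent < head:
        return "SBJ"
    return "OBJ" if previous == "ROOT" else "MOD"


# "He eats fish" with index 0 standing for the artificial ROOT.
labels = label_arcs([(2, 1), (0, 2), (2, 3)], toy_classify)
```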
We use the package YamCha (V0.33)² to implement the SVM model for labeling.
YamCha is a powerful tool for sequence labeling (Kudo and Matsumoto, 2001).
After the first stage, we know the unlabeled dependency parsing tree for the input sentence.
This information forms the basis for part of the features of the second stage.
For the sequence labeler, we define the individual features, the pair features, the verb features, the neighbor features, and the position features.
All the features are listed as follows:
• The individual features: the FORM, the LEMMA, the CPOSTAG, the POSTAG, and the FEATS of the parent and child node.
• The pair features: the direction of dependency, the combination of lemmata of the parent and child node, the combination of parent's LEMMA and child's CPOSTAG, the combination of parent's CPOSTAG and child's LEMMA, and the combination of FEATS of parent and child.
• The verb features: whether the parent or child is the first or last verb in the sentence.
• The position features: whether the child is the first or last word in the sentence and whether the child is the first word to the left or right of the parent.
² YamCha is available at http://chasen.org/~taku/software/yamcha/
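The pair, verb, and sentence-position features above can be computed directly from the unlabeled tree. The following sketch covers only those three groups for a single arc; the feature keys, the 0-based indices, and in particular the rule that a verb is any token whose CPOSTAG starts with "V" are assumptions for this example, since tag sets differ across treebanks.

```python
from typing import Dict, List, Union

Token = Dict[str, str]


def is_verb(token: Token) -> bool:
    # Assumption: verbs have a CPOSTAG starting with "V".
    return token["CPOSTAG"].startswith("V")


def labeler_features(tokens: List[Token], parent: int,
                     child: int) -> Dict[str, Union[str, bool]]:
    """Build pair, verb, and position features for one (parent, child) arc.

    Both indices must refer to real tokens; the artificial ROOT is not
    handled in this sketch.
    """
    p, c = tokens[parent], tokens[child]
    verbs = [i for i, token in enumerate(tokens) if is_verb(token)]
    first_verb = verbs[0] if verbs else None
    last_verb = verbs[-1] if verbs else None
    return {
        # pair features
        "DIRECTION": "LEFT" if child < parent else "RIGHT",
        "P_LEMMA+C_LEMMA": p["LEMMA"] + "|" + c["LEMMA"],
        "P_LEMMA+C_CPOSTAG": p["LEMMA"] + "|" + c["CPOSTAG"],
        "P_CPOSTAG+C_LEMMA": p["CPOSTAG"] + "|" + c["LEMMA"],
        # verb features
        "PARENT_FIRST_VERB": parent == first_verb,
        "PARENT_LAST_VERB": parent == last_verb,
        "CHILD_FIRST_VERB": child == first_verb,
        "CHILD_LAST_VERB": child == last_verb,
        # position features
        "CHILD_FIRST_WORD": child == 0,
        "CHILD_LAST_WORD": child == len(tokens) - 1,
    }


tokens = [
    {"LEMMA": "he", "CPOSTAG": "PRON"},
    {"LEMMA": "eat", "CPOSTAG": "V"},
    {"LEMMA": "fish", "CPOSTAG": "N"},
]
features = labeler_features(tokens, parent=1, child=0)
```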
Because four languages have different labels for ROOT, we define separate features for the ROOT labeler.
The features are listed as follows:
• The individual features: the FORM, the LEMMA, the CPOSTAG, the POSTAG, and the FEATS of the parent and child node.
• The verb features: whether the child is the first or last verb in the sentence.
• The neighbor features: the combination of CPOSTAG and LEMMA of the left and right neighbors of the parent and child, number of children, CPOSTAG sequence of children.
• The position features: whether the child is the first or last word in the sentence and whether the child is the first word to the left or right of the parent.
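The neighbor features depend on both the surface order and the unlabeled tree. A minimal sketch of how they might be computed for one node is given below; the function names, the feature keys, the "/" and "+" separators, and the "NONE" placeholder at sentence boundaries are assumptions for illustration.

```python
from typing import Dict, List, Union

Token = Dict[str, str]


def tag_and_lemma(tokens: List[Token], index: int) -> str:
    """Combine CPOSTAG and LEMMA, or return "NONE" outside the sentence."""
    if not 0 <= index < len(tokens):
        return "NONE"
    return tokens[index]["CPOSTAG"] + "/" + tokens[index]["LEMMA"]


def neighbor_features(tokens: List[Token], heads: Dict[int, int],
                      node: int) -> Dict[str, Union[str, int]]:
    """Neighbor features for one node (0-based index) of an unlabeled tree.

    heads maps dependent -> head; the root token has no entry.
    """
    children = sorted(d for d, h in heads.items() if h == node)
    return {
        "LEFT_NEIGHBOR": tag_and_lemma(tokens, node - 1),
        "RIGHT_NEIGHBOR": tag_and_lemma(tokens, node + 1),
        "NUM_CHILDREN": len(children),
        "CHILD_CPOSTAGS": "+".join(tokens[d]["CPOSTAG"] for d in children),
    }


tokens = [
    {"LEMMA": "he", "CPOSTAG": "PRON"},
    {"LEMMA": "eat", "CPOSTAG": "V"},
    {"LEMMA": "fish", "CPOSTAG": "N"},
]
root_features = neighbor_features(tokens, heads={0: 1, 2: 1}, node=1)
```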
3 Evaluation Results
We evaluated our system in the Multilingual Track for all languages.
For the unlabeled parser, we chose the parameters for MaltParser based on performance on a held-out section of the training data.
We also chose the parameters for YamCha based on performance on the training data.
Our official results are shown in Table 1.
Performance is measured by labeled accuracy and unlabeled accuracy.
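Both measures are token-level percentages: a token counts toward unlabeled accuracy if its predicted head is correct, and toward labeled accuracy if both its head and its dependency label are correct. The sketch below is not the official evaluation script; the function name, the data layout, and the example labels are assumptions.

```python
from typing import List, Tuple

Analysis = List[Tuple[int, str]]  # one (head index, dependency label) per token


def attachment_scores(gold: Analysis,
                      predicted: Analysis) -> Tuple[float, float]:
    """Return (labeled, unlabeled) attachment scores as percentages."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned")
    unlabeled = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    labeled = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * labeled / total, 100.0 * unlabeled / total


gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ"), (3, "MOD")]
predicted = [(2, "SBJ"), (0, "ROOT"), (2, "MOD"), (2, "MOD")]
la, ua = attachment_scores(gold, predicted)  # la = 50.0, ua = 75.0
```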
These results showed that our two-stage system can achieve good performance.
For all languages, our system provided better results than the average performance of all the systems (Nivre et al., 2007).
Compared with the top three scores, our system performed slightly worse.
One reason may be that we used only projective parsing algorithms, while all languages except Chinese have non-projective structures.
Another reason is that we did not tune the system parameters well due to lack of time.
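Whether a gold tree lies outside the reach of a projective parser can be checked by looking for crossing arcs. The sketch below uses the common definition in which an artificial ROOT sits at position 0 and a tree is projective when no two arcs cross; the function name and the example trees are assumptions.

```python
from typing import Dict


def is_projective(heads: Dict[int, int]) -> bool:
    """Check that no two arcs cross.

    heads maps each token (1..n) to its head; 0 is the artificial ROOT.
    """
    spans = [(min(d, h), max(d, h)) for d, h in heads.items()]
    for left_a, right_a in spans:
        for left_b, right_b in spans:
            if left_a < left_b < right_a < right_b:
                return False  # the two arcs cross
    return True


projective_tree = {1: 2, 2: 0, 3: 2}
crossing_tree = {1: 3, 2: 0, 3: 2}  # arc 3 -> 1 crosses arc ROOT -> 2
```

A projective parser cannot recover every arc of a tree for which this check fails, which bounds its attainable accuracy on such sentences.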
Table 1: The results of the proposed approach: LABELED ATTACHMENT SCORE (LA) and UNLABELED ATTACHMENT SCORE (UA) for each data set.
For Chinese, the system achieved 81.24% on labeled accuracy and 85.91% on unlabeled accuracy.
We also ran the MaltParser to provide the labels.
In addition to the features described above, we added the DEPREL features: the dependency type of TOP, the dependency type of the token leftmost of TOP, the dependency type of the token rightmost of TOP, and the dependency type of the token leftmost of NEXT.
The labeled accuracy of MaltParser was 80.84%, 0.4 percentage points lower than that of our system.
Some conjunctions, prepositions, and DE³ were attached to their head words with much lower accuracy: 74% for DE, 76% for conjunctions, and 71% for prepositions.
These words made up 19.7% of the test data.
For Chinese parsing, coordination and prepositional phrase attachment were hard problems.
Chen et al. (2006) defined special features for coordinations in chunking.
In the future, we plan to define some special features for these words.
including "M/ff/itk/i.".
Table 2: The words on which most errors occur in the Chinese data.
museum)" was to be tagged as "predication" instead of "property".
It was very hard to tell apart the labels of the words around "($".
Humans can make the distinction between property and predication for "($", because we have background knowledge of the words.
If we can incorporate such additional knowledge into the system, it may assign the correct label.
For '\ /C", it was hard to assign the head: 36 of the 38 errors were wrong heads.
It often appeared in coordination expressions.
For example, the head of \ "at"ft/gt/fg/7A /^/^/T/^^/(Besides extreme cool and too amazing)" was and the head of '\ "at "J|«/#tI/#/^J?
A /W/iR l^/fftl/^nffKGive the visitors solid and methodical knowledge)" was "Mi".
5 Conclusion
In this paper, we presented our two-stage dependency parsing system submitted to the Multilingual Track of the CoNLL-2007 shared task.
We used Nivre's method to produce the dependency arcs and the sequence labeler to produce the dependency labels.
The experimental results showed that our system can provide good performance for all languages.
