Shallow semantic parsing, the automatic identification and labeling of sentential constituents, has recently received much attention.
Our work examines whether semantic role information is beneficial to question answering.
We introduce a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm.
We view semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching.
Experimental results on the TREC datasets demonstrate improvements over state-of-the-art models.
1 Introduction
Recent years have witnessed significant progress in developing methods for the automatic identification and labeling of semantic roles conveyed by sentential constituents.1 The success of these methods, often referred to collectively as shallow semantic parsing (Gildea and Jurafsky, 2002), is largely due to the availability of resources like FrameNet (Fillmore et al., 2003) and PropBank (Palmer et al., 2005), which document the surface realization of semantic roles in real world corpora.
More concretely, in the FrameNet paradigm, the meaning of predicates (usually verbs, nouns, or adjectives) is conveyed by frames, schematic representations of situations.
Semantic roles (or frame
1The approaches are too numerous to list; we refer the interested reader to Carreras and Marquez (2005) for an overview.
elements) are defined for each frame and correspond to salient entities present in the evoked situation.
Predicates with similar semantics instantiate the same frame and are attested with the same roles.
The FrameNet database lists the surface syntactic realizations of semantic roles, and provides annotated example sentences from the British National Corpus.
For example, the frame CommerceSell has three core semantic roles, namely Buyer, Goods, and Seller — each expressed by an indirect object, a direct object, and a subject (see sentences (1a)-(1c)).
It can also be attested with non-core (peripheral) roles (e.g., Means, Manner, see (1d) and (1e)) that are more generic and can be instantiated in several frames, besides CommerceSell.
The verbs sell, vend, and retail can evoke this frame, but also the nouns sale and vendor.
b. [Kim]Seller sold [the sweater]Goods.
c. [My company]Seller has sold [more than three million copies]Goods.
cash]Means.
e. [He]Seller [reluctanctly]Manner sold
[his rock]Goods.
By abstracting over surface syntactic configurations, semantic roles offer an important first step towards deeper text understanding and hold promise for a range of applications requiring broad coverage semantic processing.
Question answering (QA) is often cited as an obvious beneficiary of semantic
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 12-21, Prague, June 2007.
©2007 Association for Computational Linguistics
role labeling (Gildea and Jurafsky, 2002; Palmer et al., 2005; Narayanan and Harabagiu, 2004).
Faced with the question Q: What year did the U.S. buy Alaska? and the retrieved sentence S:.. .before Russia sold Alaska to the United States in 1867, a hypothetical QA system must identify that United States is the Buyer despite the fact that it is attested in one instance as a subject and in another as an object.
Once this information is known, isolating the correct answer (i.e., 1867) can be relatively straightforward.
Although conventional wisdom has it that semantic role labeling ought to improve answer extraction, surprising little work has been done to this effect (see Section 2 for details) and initial results have been mostly inconclusive or negative (Sun et al., 2005; Kaisser, 2006).
There are at least two good reasons for these findings.
First, shallow semantic parsers trained on declarative sentences will typically have poor performance on questions and generally on out-of-domain data.
Second, existing resources do not have exhaustive coverage and recall will be compromised, especially if the question answering system is expected to retrieve answers from unrestricted text.
Since FrameNet is still under development, its coverage tends to be more of a problem in comparison to other semantic role resources such as PropBank.
In this paper we propose an answer extraction model which effectively incorporates FrameNet-style semantic role information.
We present an automatic method for semantic role assignment which is conceptually simple and does not require extensive feature engineering.
A key feature of our approach is the comparison of dependency relation paths attested in the FrameNet annotations and raw text.
We formalize the search for an optimal role assignment as an optimization problem in a bipartite graph.
This formalization allows us to find an exact, globally optimal solution.
The graph-theoretic framework goes some way towards addressing coverage problems related with FrameNet and allows us to formulate answer extraction as a graph matching problem.
As a byproduct of our main investigation we also examine the issue of FrameNet coverage and show how much it impacts performance in a TREC-style question answering setting.
In the following section we provide an overview of existing work on question answering systems that
exploit semantic role-based lexical resources.
Then we define our learning task and introduce our approach to semantic role assignment and answer extraction in the context of QA.
Next, we present our experimental framework and data.
We conclude the paper by presenting and discussing our results.
2 Related Work
Question answering systems have traditionally depended on a variety of lexical resources to bridge surface differences between questions and potential answers.
WordNet (Fellbaum, 1998) is perhaps the most popular resource and has been employed in a variety of QA-related tasks ranging from query expansion, to axiom-based reasoning (Moldovan et al., 2003), passage scoring (Paranjpe et al., 2003), and answer filtering (Leidner et al., 2004).
Besides WordNet, recent QA systems increasingly rely on syntactic information as a means of abstracting over word order differences and structural alternations (e.g., passive vs. active voice).
Most syntax-based QA systems (Wu et al., 2005) incorporate some means of comparison between the tree representing the question with the subtree surrounding the answer candidate.
The assumption here is that appropriate answers are more likely to have syntactic relations in common with their corresponding question.
Syntactic structure matching has been applied to passage retrieval (Cui et al., 2005) and answer extraction (Shen and Klakow, 2006).
Narayanan and Harabagiu (2004) were the first to stress the importance of semantic roles in answering complex questions.
Their system identifies predicate argument structures by merging semantic role information from PropBank and FrameNet.
Expected answers are extracted by performing probabilistic inference over the predicate argument structures in conjunction with a domain specific topic model.
Sun et al. (2005) incorporate semantic analysis in their TREC05 QA system.
They use ASSERT (Pradhan et al., 2004), a publicly available shallow semantic parser trained on PropBank, to generate predicate-argument structures which subsequently form the basis of comparison between question and answer sentences.
They find that semantic analysis does not boost performance due to the low recall of the semantic parser.
Kaisser (2006) proposes a
Sent.
q .
Model II
Figure 1: Architecture of answer extraction
question paraphrasing method based on FrameNet.
Questions are assigned semantic roles by matching their dependency relations with those attested in the FrameNet annotations.
The assignments are used to create question reformulations which are submitted to Google for answer extraction.
The semantic role assignment module is not probabilistic, it relies on strict matching, and runs into severe coverage problems.
In line with previous work, our method exploits syntactic information in the form of dependency relation paths together with FrameNet-like semantic roles to smooth lexical and syntactic divergences between question and answer sentences.
Our approach is less domain dependent and resource intensive than Narayanan and Harabagiu (2004), it solely employs a dependency parser and the FrameNet database.
In contrast to Kaisser (2006), we model the semantic role assignment and answer extraction tasks numerically, thereby alleviating the coverage problems encountered previously.
3 Problem Formulation
We briefly summarize the architecture of the QA system we are working with before formalizing the mechanics of our FrameNet-based answer extraction module.
In common with previous work, our overall approach consists of three stages: (a) determining the expected answer type of the question, (b) retrieving passages likely to contain answers to the question, and (c) performing a match between the question words and retrieved passages in order to extract the answer.
In this paper we focus on the last stage: question and answer sentences are normalized to a FrameNet-style representation and answers are retrieved by selecting the candidate whose semantic structure is most similar to the question.
The architecture of our answer extraction mod-
ule is shown in Figure 1.
Semantic structures for questions and sentences are automatically derived using the model described in Section 4 (Model I).
A semantic structure SemStruc = (p,Set(SRA)) consists of a predicate p and a set of semantic role assignments Set (SRA). p is a word or phrase evoking a frame F of FrameNet.
A semantic role assignment SRA is a ternary structure (w, SR, s), consisting of frame element w, its semantic role SR, and score s indicating to what degree SR qualifies as a label for w.
For a question q, we generate a semantic structure SemStrucq.
Question words, such as what, who, when, etc., are considered expected answer phrases (EAPs).
We require that EAPs are frame elements of SemStrucq.
Likely answer candidates are extracted from answer sentences following some preprocessing steps detailed in Section 6.
For each candidate ac, we derive its semantic structure SemStrucac and assume that ac is a frame element of SemStrucac.
Question and answer semantic structures are compared using a model based on graph matching detailed in Section 5 (Model II).
We calculate the similarity of all derived pairs ( SemStrucq, SemStrucac) and select the candidate with the highest value as an answer for the question.
Semantic Structure Generation
Our method crucially exploits the annotated sentences in the FrameNet database together with the output of a dependency parser.
Our guiding assumption is that sentences that share dependency relations will also share semantic roles as long as they evoke the same or related frames.
This is motivated by much research in lexical semantics (e.g., Levin (1993)) hypothesizing that the behavior of words, particularly with respect to the expression and interpretation of their arguments, is to a large extent determined by their meaning.
We first describe how predicates are identified and then introduce our model for semantic role labeling.
Predicate Identification Predicate candidates are identified using a simple look-up procedure which compares POS-tagged tokens against FrameNet entries.
For efficiency reasons, we make the simplifying assumption that questions have only one predicate which we select heuristically: (1) verbs are pre-
ferred to other parts of speech, (2) if there is more than one verb in the question, preference is given to the verb with the highest level of embedding in the dependency tree, (3) if no verbs are present, a noun is chosen.
For example, in Q: Who beat Floyd Patterson to take the title away?, beat, take away, and title are identified as predicate candidates and beat is selected the main predicate of the question.
For answer sentences, we require that the predicate is either identical or semantically related to the question predicate (see Section 5).
In the example given above, the predicate beat evoques a single frame (i.e., Cause harm).
However, predicates often have multiple meanings thus evo-quing more than one frame.
Knowing which is the appropriate frame for a given predicate impacts the semantic role assignment task; selecting the wrong frame will unavoidably result in erroneous semantic roles.
Rather than disambiguiting polysemous predicates prior to semantic role assignment, we perform the assignment for each frame evoqued by the predicate.
Semantic Role Assignment Before describing our approach to semantic role labeling we define dependency relation paths.
A relation path R is a relation sequence (r1; r2,..., rL), in which rl (l = 1,2,...,L) is one of predefined dependency relations with suffix of traverse direction.
An example of a relation path is R = (subju, objD), where the subscripts U and D indicate upward and downward movement in trees, respectively.
Given an unanno-tated sentence whose roles we wish to label, we assume that words or phrases w with a dependency path connecting them to p are frame elements.
Each frame element is represented by an unlabeled dependency path Rw which we extract by traversing the dependency tree from w to p. Analogously, we extract from the FrameNet annotations all dependency paths RSR that are labeled with semantic role information and correspond to p. We next measure the compatibility of labeled and unlabeled paths as follows:
where M is the set of dependency relation paths for SR in FrameNet, sim (Rw,RSR) the similarity between paths Rw and RSR weighted by the relative
Figure 2: Sample original bipartite graph (a) and its subgraph with edge covers (b).
In each graph, the left partition represents frame elements and the right partition semantic roles.
frequency of RSR in FrameNet (P(RSR)).
We consider both core and non-core semantic roles instantiated by frames with at least one annotation in FrameNet.
Core roles tend to have more annotations in Framenet and consequently are considered more probable.
We measure sim (Rw, RSR), by adapting a string kernel to our task.
Our hypothesis is that the more common substrings two dependency paths have, the more similar they are.
The string kernel we used is similar to Leslie (2002) and defined as the sum of weighted common dependency relation subsequences between Rw and RSR.
For efficiency, we consider only unigram and bigram subsequences.
Subsequences are weighted by a metric akin to tf • idf which measures the degree of association between a candidate SR and the dependency relation r present in the subsequence:
where fr is the frequency of r occurring in SR; N is the total number of SRs evoked by a given frame; and nr is the number of SRs containing r.
For each frame element we thus generate a set of semantic role assignments Set(SRA).
This initial assignment can be usefully represented as a complete bipartite graph in which each frame element (word or phrase) is connected to the semantic roles licensed by the predicate and vice versa.
(see Figure 2a).
Edges are weighted and represent how compatible the frame elements and semantic roles are (see equation (2)).
Now, for each frame element w
p: discover
Original SR assignments:
ooe^^Cognizer
Optimized SR assignments:
Phenomenon
/Evidence
Figure 3: Semantic structures induced by our model for a question and answer sentence
we could simply select the semantic role with the highest score.
However, this decision procedure is local, i.e., it yields a semantic role assignment for each frame element independently of all other elements.
We therefore may end up with the same role being assigned to two frame elements or with frame elements having no role at all.
We remedy this shortcoming by treating the semantic role assignment as a global optimization problem.
Specifically, we model the interaction between all pairwise labeling decisions as a minimum weight bipartite edge cover problem (Eiter and Mannila, 1997; Cormen et al., 1990).
An edge cover is a subgraph of a bipartite graph so that each node is linked to at least one node of the other partition.
This yields a semantic role assignment for all frame elements (see Figure 2b where frame elements and roles are adjacent to an edge).
Edge covers have been successfully applied in several natural language processing tasks, including machine translation (Taskar et al.,
2006) .
Formally, optimal edge cover assignments are solutions of following optimization problem:
E is edge cover
tween the frame element node ndw and semantic role node ndSR.
Edge covers can be computed efficiently in cubic time using algorithms for the equivalent linear assignment problem.
Our experiments used Jonker and Volgenant's (1987) solver.2
Figure 3 shows the semantic role assignments generated by our model for the question Q: Who discovered prions? and the candidate answer sentence S: 1997: Stanley B. Prusiner, United States, discovery of prions...
Here we identify two predicates, namely discover and discovery.
The expected answer phrase (EAP) who and the answer candidate StanleyB.
Prusiner are assigned the Cognizer role.
Note that frame elements can bear multiple semantic roles.
By inducing a soft labeling we hope to render the matching of questions and answers more robust, thereby addressing to some extent the coverage problems associated with FrameNet.
5 Semantic Structure Matching
We measure the similarity between a question and its candidate answer by matching their predicates and semantic role assignments.
Since SRs are frame-specific, we prioritize frame matching to SR matching.
Two predicates match if they evoke the same frame or one of its hypernyms (or hyponyms).
The latter are expressed by the Inherits From and Is Inherited By relations in the frame definitions.
If the predicates match, we examine whether the assigned semantic roles match.
Since we represent SR assignments as graphs with edge covers, we can also formalize SR matching as a graph matching problem.
The similarity between two graphs is measured as the sum of similarities between their subgraphs.
We first decompose a graph into subgraphs consisting of one frame element node w and a set of SR nodes connected to it.
The similarity between two subgraphs SubG1, and SubG2 is then formalized as:
where, nd\R and nd2[R are semantic role nodes connected to a frame element node ndw in SubG\ and
where, s(ndw, ndSR) is the compatibility score be-
2The software is available from http://www.magiclogic. com/assignment.html .
SemStruc
Figure 4: Distribution of Numbers of Predicates and annotated sentences; each sub-pie, lists the number of predicates (above) with their corresponding range of annotated sentences (below)
SubG2, respectively. s(ndw,ndf) and s(ndw,nS^) are edge weights between two nodes in corresponding subgraphs (see (2)).
Our intuition here is that the more semantic roles two subgraphs share for a given frame element, the more similar they are and the closer their corresponding edge weights should be.
Edge weights are normalized by dividing by the sum of all edges in a subgraph.
6 Experimental Setup
Data All our experiments were performed on the TREC02-05 factoid questions.
We excluded NIL questions since TREC doesn't supply an answer for them.
We used the FrameNet V1.3 lexical database.
It contains 10,195 predicates grouped into 795 semantic frames and 141,238 annotated sentences.
Figure 4 shows the number of annotated sentences available for different predicates.
As can be seen, there are 3,380 predicates with no annotated sentences and 1,175 predicates with less than 5 annotated sentences.
All FrameNet sentences, questions, and answer sentences were parsed using MiniPar (Lin, 1994), a robust dependency parser.
As mentioned in Section 4 we extract dependency relation paths by traversing the dependency tree from the frame element node to the predicate node.
We used all dependency relations provided by MiniPar (42 in total).
In order to increase coverage, we combine all relation paths for predicates
that evoke the same frame and are labeled with the same POS tag.
For example, found and establish are both instances of the frame Intentionally-create but the database does not have any annotated sentences for found.v.
In default of not assigning any role labels for found.v, our model employs the relation paths for the semantically related establish.v.
Preprocessing Here we summarize the steps of our QA system preceding the assignment of semantic structure and answer extraction.
For each question, we recognize its expected answer type (e.g., in Q: Which record company is Fred Durst with? we would expect the answer to be an ORGANIZATION).
Answer types are determined using classification rules similar to Li and Roth (2002).
We also reformulate questions into declarative sentences following the strategy proposed in Brill et al. (2002).
The reformulated sentences are submitted as queries to an IR engine for retrieving sentences with relevant answers.
Specifically, we use the Lemur Toolkit3, a state-of-the-art language model-driven search engine.
We work only with the 50 top-ranked sentences as this setting performed best in previous experiments of our QA system.
We also add to Lemur's output gold standard sentences, which contain and support an answer for each question.
Specifically, documents relevant for each question are retrieved from the AQUAINT Corpus4 according to TREC supplied judgments.
Next, sentences which match both the TREC provided answer pattern and at least one question key word are extracted and their suitability is manually judged by humans.
The set of relevant sentences thus includes at least one sentence with an appropriate answer as well as sentences that do not contain any answer specific information.
This setup is somewhat idealized, however it allows us to evaluate in more detail our answer extraction module (since when an answer is not found, we know it is the fault of our system).
Relevant sentences are annotated with their named entities using Lingpipe5, a MUC-based named entity recognizer.
When we successfully classify a question with an expected answer type
3See http://www.lemurproject.org/ for details.
4This corpus consists of English newswire texts and is used as the main document collection in official TREC evaluations.
5The software is available from www.alias-i.com/ lingpipe/
(e.g., ORGANIZATION in the example above), we assume that all NPs attested in the set of relevant sentences with the same answer type are candidate answers; in cases where no answer type is found (e.g., as in Q: What are prions made of?), all NPs in the relevant answers set are considered candidate answers.
Baseline We compared our answer extraction method to a QA system that exploits solely syntactic information without making use of FrameNet or any other type of role semantic annotations.
For each question, the baseline identifies key phrases deemed important for answer identification.
These are verbs, noun phrases, and expected answer phrases (EAPs, see Section 3).
All dependency relation paths connecting a key phrase and an EAP are compared to those connecting the same key phrases and an answer candidate.
The similarity of question and answer paths is computed using a simplified version of the similarity measure6 proposed in Shen and Klakow (2006).
Our second baseline employs Shalmaneser (Erk and Pado, 2006), a publicly available shallow semantic parser7, for the role labeling task instead of the graph-based model presented in Section 4.
The software is trained on the FrameNet annotated sentences using a standard feature set (see Carreras and Marquez (2005) for details).
We use Shalmaneser to parse questions and answer sentences.
The parser makes hard decisions about the presence or absence of a semantic role.
Unfortunately, this prevents us from using our method for semantic structure matching (see Section 5) which assumes a soft labeling.
We therefore came up with a simple matching strategy suitable for the parser's output.
For question and answer sentences matching in their frame assignment, phrases bearing the same semantic role as the EAP are considered answer candidates.
The latter are ranked according to word overlap (i.e., identical phrases are ranked higher than phrases with no
6Shen and Klakow (2006) use a dynamic time warping algorithm to calculate the degree to which dependency relation paths are correlated.
Correlations for individual relations are estimated from training data whereas we assume a binary value (1 for identical relations and 0 otherwise).
The modification was necessary to render the baseline system comparable to our answer extraction model which is unsupervised.
7The software is available from http://www.coli. uni-saarland.de/projects/salsa/shal/ .
Our evaluation was motivated by the following questions: (1) How does the incompleteness of FrameNet impact QA performance on the TREC data sets?
In particular, we wanted to examine whether there are questions for which in principle no answer can be found due to missing frame entries or missing annotated sentences.
(2) Are all questions and their corresponding answers amenable to a FrameNet-style analysis?
In other words, we wanted to assess whether questions and answers often evoke the same or related frames (with similar roles).
This is a prerequisite for semantic structure matching and ultimately answer extraction.
(3) Do the graph-based models introduced in this paper bring any performance gains over state-of-the-art shallow semantic parsers or more conventional syntax-based QA systems?
Recall that our graph-based models were designed especially for the QA answer extraction task.
Our results are summarized in Tables 1-3.
Table 1 records the number of questions to be answered for the TREC02-05 datasets (Total).
We also give information regarding the number of questions which are in principle unanswerable with a FrameNet-style semantic role analysis.
Column NoFrame shows the number of questions which don't have an appropriate frame or predicate in the database.
For example, there is currently no predicate entry for sponsor or sink (e.g., Q: Who is the sponsor of the International Criminal Court? and Q: What date did the Lusitania sink?).
Column NoAnnot refers to questions for which no semantic role labeling is possible because annotated sentences for the relevant predicates are missing.
For instance, there are no annotations for win (e.g., Q: What division did Floyd Patterson win? ) or for hit (e.g., Q: What was the Beatles' first number one hit?).
This problem is not specific to our method which admittedly relies on FrameNet annotations for performing the semantic role assignment (see Section 4).
Shallow semantic parsers trained on FrameNet would also have trouble assigning roles to predicates for which no data is available.
Finally, column NoMatch reports the number of questions which cannot be answered due to frame
Table 1: Number of questions which cannot be answered using a FrameNet style semantic analysis; numbers in parentheses are percentages of Total (NoFrame: frames or predicates are missing; NoAnnot: annotated sentences are missing, NoMatch: questions and candidate answers evoke different frames.
mismatches.
Consider Q: What does AARP stand for? whose answer is found in S: The American Association of Retired Persons (AARP) qualify for discounts....
The answer and the question evoke different frames; in fact here a semantic role analysis is not relevant for locating the right answer.
As can be seen NoMatch cases are by far the most frequent.
The number of questions remaining after excluding NoFrame, NoAnnot, and NoMatch are shown under the Rest heading in Table 1.
These results indicate that FrameNet-based semantic role analysis applies to approximately 35% of the TREC data.
This means that an extraction module relying solely on FrameNet will have poor performance, since it will be unable to find answers for more than half of the questions beeing asked.
We nevertheless examine whether our model brings any performance improvements on this limited dataset which is admittedly favorable towards a FrameNet style analysis.
Table 2 shows the results of our answer extraction module (SemMatch) together with two baseline systems.
The first baseline uses only dependency relation path information (SynMatch), whereas the second baseline (SemParse) uses Shal-maneser, a state-of-the-art shallow semantic parser for the role labeling task.
We consider an answer correct if it is returned with rank 1.
As can be seen, SemMatch is significantly better than both Syn-Match and SemParse, whereas the latter is significantly worse than SynMatch.
Although promising, the results in Table 2 are not very informative, since they show performance gains on partial data.
Instead of using our answer extraction model on its own, we next combined it with the syntax-based system mentioned above (SynMatch, see also Section 6 for details).
If FrameNet is indeed helpful for QA, we would expect an ensemble sys-
Table 2: System Performance on subset of TREC datasets (see Rest column in Table 1); *: significantly better than SemParse; ^: significantly better than SynMatch (p < 0.01, using a X2 test).
SynMatch
+SemParse
+SemMatch
Table 3: System Performance on TREC datasets (see Total column in Table 1); *: significantly better than +SemParse; 1: significantly better than SynMatch (p < 0.01, using a X2 test).
tem to yield better performance over a purely syntactic answer extraction module.
The two systems were combined as follows.
Given a question, we first pass it to our FrameNet model; if an answer is found, our job is done; if no answer is returned, the question is passed on to SynMatch.
Our results are given in Table 3.
+SemMatch and +SemParse are ensemble systems using SynMatch together with the QA specific role labeling method proposed in this paper and Shalmaneser, respectively.
We also compare these systems against SynMatch on its own.
We can now attempt to answer our third question concerning our model's performance on the TREC data.
Our experiments show that a FrameNet-enhanced answer extraction module significantly outperforms a similar module that uses only syntactic information (compare SynMatch and +Sem-Match in Table 3).
Another interesting finding is that
the shallow semantic parser performs considerably worse in comparison to our graph-based models and the syntax-based system.
Inspection of the parser's output highlights two explanations for this.
First, the shallow semantic parser has difficulty assigning accurate semantic roles to questions (even when they are reformulated as declarative sentences).
And secondly, it tends to favor precision over recall, thus reducing the number of questions for which answers can be found.
A similar finding is reported in Sun et al. (2005) for a PropBank trained parser.
8 Conclusion
In this paper we assess the contribution of semantic role labeling to open-domain factoid question answering.
We present a graph-based answer extraction model which effectively incorporates FrameNet style role semantic information and show that it achieves promising results.
Our experiments show that the proposed model can be effectively combined with a syntax-based system to obtain performance superior to the latter when used on its own.
Furthermore, we demonstrate performance gains over a shallow semantic parser trained on the FrameNet annotated corpus.
We argue that performance gains are due to the adopted graph-theoretic framework which is robust to coverage and recall problems.
We also provide a detailed analysis of the appropriateness of FrameNet for QA.
We show that performance can be compromised due to incomplete coverage (i.e., missing frame or predicate entries as well as annotated sentences) but also because of mismatching question-answer representations.
The question and the answer may evoke different frames or the answer simply falls outside the scope of a given frame (i.e., in a non predicate-argument structure).
Our study shows that mismatches are relatively frequent and motivates the use ofsemantically informed methods in conjunction with syntax-based methods.
Important future directions lie in evaluating the contribution of alternative semantic role frameworks (e.g., PropBank) to the answer extraction task and developing models that learn semantic roles directly from unannotated text without the support of FrameNet annotations (Grenager and Manning, 2006).
Beyond question answering, we also plan to
investigate the potential of our model for shallow semantic parsing since our experience so far has shown that it achieves good recall.
Acknowledgements We are grateful to Sebastian Pado for running Shalmaneser on our data.
Thanks to Frank Keller and Amit Dubey for insightful comments and suggestions.
The authors acknowledge the support of DFG (Shen; PhD studentship within the International Postgraduate College "Language Technology and Cognitive Systems") and EPSRC (Lap-ata; grant EP/C538447/1).
