Abstract
Modal verbs, such as can, may, and must, are commonly used in daily communication to convey the speaker’s perspective related to the likelihood and/or mode of the proposition. They can differ greatly in meaning depending on how they’re used and the context of a sentence (e.g. “They must help each other out.” vs. “They must have helped each other out.”). Despite their practical importance in natural language understanding, linguists have yet to agree on a single, prominent framework for the categorization of modal verb senses. This lack of agreement stems from high degrees of flexibility and polysemy from the modal verbs, making it more difficult for researchers to incorporate insights from this family of words into their work. As a tool to help navigate this issue, this work presents MoVerb, a dataset consisting of 27,240 annotations of modal verb senses over 4,540 utterances containing one or more sentences from social conversations. Each utterance is annotated by three annotators using two different theoretical frameworks (i.e., Quirk and Palmer) of modal verb senses. We observe that both frameworks have similar inter-annotator agreements, despite having a different number of sense labels (eight for Quirk and three for Palmer). With RoBERTa-based classifiers fine-tuned on MoVerb, we achieve F1 scores of 82.2 and 78.3 on Quirk and Palmer, respectively, showing that modal verb sense disambiguation is not a trivial task.