Linguistic Issues in Language Technology (2015)
This paper focuses on the question of what kind of data needs to be recorded about figurative language, in order to capture the essential meaning of the text and to enable us to re-create a synonymous text, based on that data. A short review of the best known systems of semantic annotation will be presented and their suitability for the task will be analyzed. Also, a method that could be used for representing the meaning of the idioms, metaphors and metonymy in the data model will be considered.
Type theory has played an important role in specifying the formal connection between syntactic structure and semantic interpretation within the history of formal semantics. In recent years rich type theories developed for the semantics of programming languages have become influential in the semantics of natural language. The use of probabilistic reasoning to model human learning and cognition has become an increasingly important part of cognitive science. In this paper we offer a probabilistic formulation of a rich type theory, Type Theory with Records (TTR), and we illustrate how this framework can be used to approach the problem of semantic learning. Our probabilistic version of TTR is intended to provide an interface between the cognitive process of classifying situations according to the types that they instantiate, and the compositional semantics of natural language.
Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Lifts up Computational Linguistics
T. S. Eliot’s poem The Waste Land is a notoriously challenging example of modernist poetry, mixing the independent viewpoints of over ten distinct characters without any clear demarcation of which voice is speaking when. In this work, we apply unsupervised techniques in computational stylistics to distinguish the particular styles of these voices, offering a computer’s perspective on longstanding debates in literary analysis. Our work includes a model for stylistic segmentation that looks for points of maximum stylistic variation, a k-means clustering model for detecting non-contiguous speech from the same voice, and a stylistic profiling approach which makes use of lexical resources built from a much larger collection of literary texts. Evaluating using an expert interpretation, we show clear progress in distinguishing the voices of The Waste Land as compared to appropriate baselines, and we also offer quantitative evidence both for and against that particular interpretation.
How do standards of poetic beauty change as a function of time and expertise? Here we use computational methods to compare the stylistic features of 359 English poems written by 19th century professional poets, Imagist poets, contemporary professional poets, and contemporary amateur poets. Building upon techniques designed to analyze style and sentiment in texts, we examine elements of poetic craft such as imagery, sound devices, emotive language, and diction. We find that contemporary professional poets use significantly more concrete words than 19th century poets, fewer emotional words, and more complex sound devices. These changes are consistent with the tenets of Imagism, an early 20thcentury literary movement. Further analyses show that contemporary amateur poems resemble 19th century professional poems more than contemporary professional poems on several dimensions. The stylistic similarities between contemporary amateur poems and 19th century professional poems suggest that elite standards of poetic beauty in the past “trickled down” to influence amateur works in the present. Our results highlight the influence of Imagism on the modern aesthetic and reveal the dynamics between “high” and “low” art. We suggest that computational linguistics may shed light on the forces and trends that shape poetic style.
Within the field of literary analysis, there are few branches as confusing as that of genre theory. Literary criticism has failed so far to reach a consensus on what makes a genre a genre. In this paper, we examine the degree to which the character structure of a novel is indicative of the genre it belongs to. With the premise that novels are societies in miniature, we build static and dynamic social networks of characters as a strategy to represent the narrative structure of novels in a quantifiable manner. For each of the novels, we compute a vector of literary-motivated features extracted from their network representation. We perform clustering on the vectors and analyze the resulting clusters in terms of genre and authorship.
Since the 18th century, the novel has been one of the defining forms of English writing, a mainstay of popular entertainment and academic criticism. Despite its importance, however, there are few computational studies of the large-scale structure of novels—and many popular representations for discourse modeling do not work very well for novelistic texts. This paper describes a high-level representation of plot structure which tracks the frequency of mentions of different characters, topics and emotional words over time. The representation can distinguish with high accuracy between real novels and artificially permuted surrogates; characters are important for eliminating random permutations, while topics are effective at distinguishing beginnings from ends.
Literary works are becoming increasingly available in electronic formats, thus quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers, such as Amazon Kindle R or Kobo Touch R , fosters the development of a new generation of reading tools for bilingual books. In particular, literary works, when available in several languages, offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation or literary studies. An important issue in the automatic processing of multilingual e-books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation professionals. Computing automatic alignments for literary works, however, is a task more challenging than in the case of better behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality. alignment for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach, based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.