Nicholas Asher


2026

Vision-Language Models (VLMs) are able to process increasingly long videos. Yet important visual information is easily lost across the long context and missed by VLMs. It is also important to design tools that enable cost-effective analysis of lengthy video content. In this paper, we propose a clip selection method that targets key video moments to be included in a multimodal summary. We divide the video into short clips and generate compact visual descriptions of each using a lightweight video captioning model. These descriptions are then passed to a large language model (LLM), which selects the K clips containing the most relevant visual information for a multimodal summary. We evaluate our approach on reference clips for the task, automatically derived from full human-annotated screenplays and summaries in the MovieSum dataset. We further show that these reference clips (less than 6% of the movie) are sufficient to build a complete multimodal summary of the movies in MovieSum. Using our clip selection method, we achieve a summarization performance close to that of these reference clips while capturing substantially more relevant video information than random clip selection. Importantly, we maintain low computational cost by relying on a lightweight captioning model.

2020

This paper describes a corpus of situated multiparty chats developed for the STAC project (Strategic Conversation, ERC grant n. 269427) and annotated for discourse structure in the style of Segmented Discourse Representation Theory (SDRT; Asher & Lascarides, 2003). The STAC corpus is not only a rich source of data on strategic conversation, but also the first corpus that we are aware of that provides discourse structures for multiparty dialogues situated within a virtual environment. The corpus was annotated in two stages: we initially annotated the chat moves only, but later decided to annotate interactions between the chat moves and non-linguistic events from the virtual environment. This two-step procedure has allowed us to quantify various ways in which adding information from the non-linguistic context affects dialogue structure. In this paper, we look at how annotations based only on linguistic information were preserved once the non-linguistic context was factored in. We explain that while the preservation of relation instances is relatively high when we move from one corpus to the other, there is little preservation of higher-order structures that capture "the main point" of a dialogue and distinguish it from peripheral information.

2016

This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence or the document level is a well-established task and relatively straightforward, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied.