Ekata Mitra


2026

Summarizing deeply nested discussion threads requires handling interleaved replies, quotes, and overlapping topics, which standard LLM summarizers struggle to capture reliably. We introduce ThreadSumm, a multi-stage LLM framework that treats thread summarization as a hierarchical reasoning problem over explicit aspect and content unit representations. Our method first performs content planning via LLM-based extraction of discourse aspects and Atomic Content Units, then applies sentence ordering to construct thread-aware sequences that surface multiple viewpoints rather than a single linear strand. On top of these interpretable units, ThreadSumm employs a Tree of Thoughts search that generates and scores multiple paragraph candidates, jointly optimizing coherence and coverage within a unified search space. With this multi-proposal and iterative refinement design, we show improved performance in generating logically structured summaries compared to existing baselines, while achieving higher aspect retention and opinion coverage in nested discussions.

2023

As dialogue systems become more popular, evaluation of their response quality gains importance. Engagingness highly correlates with overall quality and creates a sense of connection that gives human participants a more fulfilling experience. Although qualities like coherence and fluency are readily measured with well-worn automatic metrics, evaluating engagingness often relies on human assessment, which is a costly and time-consuming process. Existing automatic engagingness metrics evaluate the response without the conversation history, are designed for one dataset, or have limited correlation with human annotations. Furthermore, they have been tested exclusively on English conversations. Given that dialogue systems are increasingly available in languages beyond English, multilingual evaluation capabilities are essential. We propose that large language models (LLMs) may be used for evaluation of engagingness in dialogue through prompting, and ask how prompt constructs and translated prompts compare in a multilingual setting. We provide a prompt-design taxonomy for engagingness and find that using selected prompt elements with LLMs, including our comprehensive definition of engagingness, outperforms state-of-the-art methods on evaluation of engagingness in dialogue across multiple languages.