This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
NingWang
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
Large Multimodal Models (LMMs) have recently demonstrated impressive performance on general video comprehension benchmarks. Nevertheless, for broader applications, the robustness of their temporal analysis capability needs to be thoroughly investigated yet predominantly ignored. Motivated by this, we propose a novel temporal robustness benchmark (TemRobBench), which introduces temporal inconsistency perturbations separately at the visual and textual modalities to assess the robustness of models. We evaluate 16 mainstream LMMs and find that they exhibit over-reliance on prior knowledge and textual context in adversarial environments, while ignoring the actual temporal dynamics in the video. To mitigate this issue, we design panoramic direct preference optimization (PanoDPO), which encourages LMMs to incorporate both visual and linguistic feature preferences simultaneously. Experimental results show that PanoDPO can effectively enhance the model’s robustness and reliability in temporal analysis.
Recent advancements in slow-thinking reasoning models have shown exceptional performance in complex reasoning tasks. However, their tendency for “overthinking” on simple problems leads to excessive computational resource usage and increased inference latency, which hinders their widespread industrial adoption. While current mitigation strategies uniformly reduce reasoning tokens, they risk degrading performance on challenging tasks that require extended reasoning. This paper introduces Difficulty-Adaptive Slow-Thinking (DAST), a novel framework that enables models to autonomously adjust Chain-of-Thought (CoT) length based on problem difficulty. We propose a Token Length Budget (TLB) metric and leverage budget-aware preference optimization to implement DAST, which penalizes inefficiency on simple problems while incentivizing deep reasoning for complex ones. Experiments demonstrate DAST’s significant value for practical application: it effectively mitigates overthinking, substantially lowering costs and latency—while crucially preserving high accuracy on complex problems, paving the way for the efficient application of advanced reasoning models.
Recently, unified information extraction has garnered widespread attention from the NLP community, which aims to use a unified paradigm to perform various information extraction tasks. However, prevalent unified IE approaches inevitably encounter challenges such as noise interference, abstract label semantics, and diverse span granularity. In this paper, we first present three problematic assumptions regarding the capabilities of unified information extraction model. Furthermore, we propose the General Collaborative Information Extraction (GCIE) framework to address these challenges in universal information extraction tasks. Specifically, GCIE consists of a general Recognizer as well as multiple task-specific Experts for recognizing predefined types and extracting spans respectively. The Recognizer is a large language model, while the Experts comprise a series of smaller language models. Together, they collaborate in a two-stage pipeline to perform unified information extraction. Extensive empirical experiments on 6 IE tasks and several datasets, validate the effectiveness and generality of our approach.
We propose a deep learning architecture and test three other machine learning models to automatically detect individuals that will attempt suicide within (1) 30 days and (2) six months, using their social media post data provided in the CL-Psych-Challenge. Additionally, we create and extract three sets of handcrafted features for suicide detection based on the three-stage theory of suicide and prior work on emotions and the use of pronouns among persons exhibiting suicidal ideations. Extensive experimentations show that some of the traditional machine learning methods outperform the baseline with an F1 score of 0.741 and F2 score of 0.833 on subtask 1 (prediction of a suicide attempt 30 days prior). However, the proposed deep learning method outperforms the baseline with F1 score of 0.737 and F2 score of 0.843 on subtask2 (prediction of suicide 6 months prior).
Alzheimer’s disease (AD)-related global healthcare cost is estimated to be $1 trillion by 2050. Currently, there is no cure for this disease; however, clinical studies show that early diagnosis and intervention helps to extend the quality of life and inform technologies for personalized mental healthcare. Clinical research indicates that the onset and progression of Alzheimer’s disease lead to dementia and other mental health issues. As a result, the language capabilities of patient start to decline. In this paper, we show that machine learning-based unsupervised clustering of and anomaly detection with linguistic biomarkers are promising approaches for intuitive visualization and personalized early stage detection of Alzheimer’s disease. We demonstrate this approach on 10 year’s (1980 to 1989) of President Ronald Reagan’s speech data set. Key linguistic biomarkers that indicate early-stage AD are identified. Experimental results show that Reagan had early onset of Alzheimer’s sometime between 1983 and 1987. This finding is corroborated by prior work that analyzed his interviews using a statistical technique. The proposed technique also identifies the exact speeches that reflect linguistic biomarkers for early stage AD.