Query classification, which includes multiple subtasks such as intent and category prediction, is a vital part of e-commerce applications. E-commerce queries are usually short and lack context, and the relations between labels cannot be exploited, resulting in insufficient prior information for modeling. Most existing industrial query classification methods rely on users' posterior click behavior to construct training samples, which creates a vicious Matthew-effect cycle. Furthermore, the subtasks of query classification lack a unified framework, leading to low efficiency in algorithm improvement. In this paper, we propose a novel Semi-supervised Scalable Unified Framework (SSUF) that contains multiple enhanced modules to unify the query classification tasks. The knowledge-enhanced module uses world knowledge to enrich query representations and address the problem of insufficient query information. The label-enhanced module uses label semantics and semi-supervised signals to reduce the dependence on posterior labels. The structure-enhanced module enhances label representations based on the complex relations among labels. Each module is highly pluggable, and input features can be added or removed as needed for each subtask. We conduct extensive offline and online A/B experiments, and the results show that SSUF significantly outperforms state-of-the-art models.
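The abstract only describes SSUF's pluggable design at a high level. The following is a minimal, hypothetical sketch of how such enhancement modules might be composed and toggled per subtask; the class name, module names, and interfaces are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class PluggableQueryClassifier(nn.Module):
    """Illustrative pluggable classifier: enhancement modules can be added or
    removed per subtask (a sketch under assumed interfaces, not SSUF itself)."""
    def __init__(self, hidden_dim, num_labels,
                 enabled=("knowledge", "label", "structure")):
        super().__init__()
        self.enhancers = nn.ModuleDict()
        if "knowledge" in enabled:
            # fuse external world-knowledge embeddings into the query representation
            self.enhancers["knowledge"] = nn.Linear(hidden_dim, hidden_dim)
        if "label" in enabled:
            # project label-semantics embeddings used for query-label matching
            self.enhancers["label"] = nn.Linear(hidden_dim, hidden_dim)
        if "structure" in enabled:
            # placeholder for propagating information over label relations (e.g., a GNN)
            self.enhancers["structure"] = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, query_emb, extra_feats):
        # extra_feats: dict mapping module name -> feature tensor; missing keys are skipped
        h = query_emb
        for name, layer in self.enhancers.items():
            if name in extra_feats:
                h = h + layer(extra_feats[name])
        return self.classifier(h)
```

A subtask that has no label-relation graph, for example, would simply instantiate the model with `enabled=("knowledge", "label")` and omit the corresponding feature from `extra_feats`.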
Recent work has achieved major milestones in reconstructing natural language from non-invasive brain signals (e.g., functional Magnetic Resonance Imaging (fMRI) and Electroencephalogram (EEG)) across subjects. However, we find that current dataset splitting strategies for cross-subject brain-to-text decoding are flawed. Specifically, we first demonstrate that all current splitting methods suffer from data leakage, i.e., validation and test data leak into the training set, resulting in significant overfitting and overestimation of decoding models. In this study, we develop a proper cross-subject data splitting criterion, free of data leakage, for decoding fMRI and EEG signals to text. Several state-of-the-art brain-to-text decoding models are re-evaluated under the proposed criterion to support further research.
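To make the leakage issue concrete, here is a minimal sketch of a split that keeps every stimulus in exactly one partition, so text presented to one subject during training cannot reappear in the test set via another subject. The record fields (`sentence_id`, etc.) and the stimulus-level grouping are assumptions for illustration, not the paper's exact criterion:

```python
import random

def leakage_free_split(records, test_ratio=0.2, seed=0):
    """Assign each stimulus sentence to a single partition, then split the
    per-subject recordings accordingly (illustrative sketch only)."""
    sentence_ids = sorted({r["sentence_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(sentence_ids)

    n_test = int(len(sentence_ids) * test_ratio)
    test_sentences = set(sentence_ids[:n_test])

    train = [r for r in records if r["sentence_id"] not in test_sentences]
    test = [r for r in records if r["sentence_id"] in test_sentences]
    return train, test

# Example usage with hypothetical records: each recording pairs a subject with a stimulus.
records = [
    {"subject_id": "sub-01", "sentence_id": 7, "signal": "..."},
    {"subject_id": "sub-02", "sentence_id": 7, "signal": "..."},
    {"subject_id": "sub-02", "sentence_id": 8, "signal": "..."},
]
train_set, test_set = leakage_free_split(records)
```

A naive random split over recordings would place subject 01's copy of sentence 7 in training and subject 02's copy in testing, which is exactly the cross-subject leakage described above.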
Modern recommendation systems grapple with reconciling users' enduring preferences with transient interests, particularly in click-through rate (CTR) prediction. Existing approaches inadequately fuse long-term behavioral profiles (e.g., aggregated purchase trends) and short-term interaction sequences (e.g., real-time clicks), suffering from representational misalignment and noise in transient signals. We propose HierDiffuse, a unified framework that redefines interest fusion as a hierarchical denoising process through diffusion models. Our approach addresses these challenges via three innovations: (1) a cross-scale diffusion mechanism aligns long- and short-term representations by iteratively refining long-term interests under short-term contextual guidance; (2) a Semantic Guidance Disentanglement (SGD) mechanism explicitly decouples core interests from noise in short-term signals; and (3) a Trajectory Convergence Constraint (TCC) accelerates diffusion inference without reducing generation quality, meeting the high-QPS (Queries Per Second) and low-latency constraints of online recommendation and advertising systems. HierDiffuse eliminates ad-hoc fusion operators, dynamically integrates multi-scale interests, enhances robustness to spurious interactions, and improves inference speed. Extensive experiments on real-world datasets demonstrate state-of-the-art performance, with significant improvements in CTR prediction accuracy and robustness to noisy interactions. Our work establishes diffusion models as a principled paradigm for adaptive interest fusion in recommendation systems.
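As a rough illustration of the cross-scale idea, the sketch below refines a noisy long-term interest vector over a few reverse steps conditioned on a short-term context vector. The network shape, the simplified update rule, and the small step count (standing in for the latency budget the abstract motivates) are all assumptions; this is not HierDiffuse's actual denoiser or its TCC schedule:

```python
import torch
import torch.nn as nn

class GuidedDenoiser(nn.Module):
    """Predicts the noise in a long-term interest vector, conditioned on
    short-term context and the diffusion step (illustrative sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim * 2 + 1, dim), nn.SiLU(), nn.Linear(dim, dim)
        )

    def forward(self, noisy_long_term, short_term, t):
        # concatenate noisy long-term interest, short-term guidance, and a scalar step embedding
        t_emb = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([noisy_long_term, short_term, t_emb], dim=-1))

def refine(denoiser, long_term, short_term, steps=4):
    """Few-step reverse process: iteratively subtract predicted noise.
    Real samplers use a proper noise schedule; this is a simplified stand-in."""
    x = long_term
    for t in reversed(range(steps)):
        t_batch = torch.full((x.size(0),), t)
        eps = denoiser(x, short_term, t_batch)
        x = x - eps / steps
    return x

# Hypothetical usage: fuse 64-dim long- and short-term interest embeddings for a batch of 8 users.
denoiser = GuidedDenoiser(dim=64)
fused = refine(denoiser, torch.randn(8, 64), torch.randn(8, 64))
```

The fused vector would then feed the downstream CTR head in place of an ad-hoc concatenation or gating operator.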