Minmin Lin

2025

Personality is an important concept in psychology that reflects individual differences in thinking and behavior, and it has significant applications across many fields. Most existing personality analysis methods operate at the bag level, treating the entire corpus gathered from one individual as a single unit for classification. However, this paradigm presents several challenges. From the data perspective, collecting a large corpus for each individual and annotating it comprehensively are both difficult. On the application side, classifying an entire corpus limits applicability to the more common single-instance scenarios. To address these issues, we propose a new task paradigm for text-based personality representation learning. Specifically, we construct a triplet personality trend comparison dataset to learn single-sentence personality embeddings with desirable metric properties. This approach removes the traditional constraints on data sources, which facilitates dataset expansion, and it leverages the transferability of embeddings to adapt easily to various downstream tasks. Our experiments show that the learned embeddings boost performance by a relative 10% across various applications, including personality detection, personality retrieval, and emotion translation prediction. The code and dataset are available at https://github.com/zjutangk/PTCD.
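The abstract does not spell out the training objective. A common way to give embeddings the metric property described above is a triplet margin loss, so the minimal sketch below assumes that objective: in each triplet (anchor, positive, negative), the positive sentence's personality trend is taken to be closer to the anchor's than the negative's. The function names and the margin value are illustrative assumptions, not taken from the PTCD code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Hinge loss that is zero once the positive is closer to the anchor
    than the negative by at least `margin` (assumed objective)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]],
                      margin: float = 1.0) -> float:
    """Average the per-triplet loss over a batch of (anchor, positive, negative)."""
    return sum(triplet_margin_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

With anchor (0, 0), positive (1, 0) and negative (1.5, 0), the loss is 1 - 1.5 + 1 = 0.5; moving the negative out to (3, 0) drives it to zero, which is the ordering the comparison triplets are meant to enforce.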

High-quality annotated data is a cornerstone of modern Natural Language Processing (NLP). While recent methods have begun to leverage diverse annotation sources, including Large Language Models (LLMs), Small Language Models (SLMs), and human experts, they often focus narrowly on the labeling step itself. A critical gap remains in the holistic process control needed to manage these sources dynamically and to address complex scheduling and quality-cost trade-offs in a unified manner. Inspired by real-world crowdsourcing companies, we introduce CrowdAgent, a multi-agent system that provides end-to-end process control by integrating task assignment, data annotation, and quality/cost management. It implements a novel methodology that rationally assigns tasks, enabling LLMs, SLMs, and human experts to advance synergistically in a collaborative annotation workflow. We demonstrate the effectiveness of CrowdAgent through extensive experiments on six diverse multimodal classification tasks. The source code and video demo are available at https://github.com/QMMMS/CrowdAgent.
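The abstract does not describe how CrowdAgent assigns tasks. To make the quality-cost trade-off concrete, here is a generic cost-ordered escalation cascade, which is not CrowdAgent's method: each item goes to the cheapest annotator first and is escalated until some annotator's confidence clears a threshold. The `Annotator` type, the costs, the confidences, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Annotator:
    name: str
    cost: float  # hypothetical cost per item
    annotate: Callable[[str], Tuple[str, float]]  # returns (label, confidence)


def route(item: str, annotators: List[Annotator],
          threshold: float) -> Tuple[str, str, float]:
    """Query annotators from cheapest to most expensive and stop at the first
    label whose confidence reaches `threshold`. If none does, the most
    expensive annotator's answer is kept.
    Returns (label, name of the annotator that decided, total cost spent)."""
    spent = 0.0
    label, who = "", ""
    for ann in sorted(annotators, key=lambda a: a.cost):
        label, conf = ann.annotate(item)
        spent += ann.cost
        who = ann.name
        if conf >= threshold:
            break
    return label, who, spent


# Hypothetical annotators: a free SLM, a cheap LLM, and an expensive human.
slm = Annotator("SLM", 0.0, lambda x: ("pos", 0.6))
llm = Annotator("LLM", 0.01, lambda x: ("pos", 0.95))
human = Annotator("human", 1.0, lambda x: ("neg", 1.0))
```

Raising the threshold shifts work toward the human expert and increases cost, which is the kind of trade-off a scheduler in such a system has to manage.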

2023

Collecting high-quality labeled data for model training is notoriously time-consuming and labor-intensive for many NLP tasks. Numerous solutions, such as active learning for small language models (SLMs) and the now-prevalent in-context learning with large language models (LLMs), have been proposed and alleviate the labeling burden to some extent, but their performance still depends on human intervention. How to reduce annotation cost in the LLM era remains underexplored. To bridge this gap, we revisit traditional active learning and propose FreeAL, an innovative collaborative learning framework that interactively distills and filters task-specific knowledge from LLMs. During collaborative training, an LLM serves as an active annotator that contributes its coarse-grained knowledge, while a downstream SLM acts as a student that filters out high-quality in-context samples and feeds them back to the LLM for subsequent label refinement. Extensive experiments on eight benchmark datasets demonstrate that FreeAL substantially improves zero-shot performance for both the SLM and the LLM without any human supervision.
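The abstract says the SLM filters out high-quality in-context samples but not how. One widely used heuristic for this in noisy-label learning is the small-loss criterion: samples the student fits with low loss are more likely to carry correct labels. The sketch below assumes that heuristic and keeps the k lowest-loss samples per LLM-assigned label as demonstrations; `Sample`, `select_demonstrations`, and the loss values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    text: str
    llm_label: str   # label assigned by the LLM annotator (hypothetical field)
    slm_loss: float  # SLM training loss on (text, llm_label) (hypothetical field)


def select_demonstrations(samples: List[Sample],
                          k_per_class: int) -> Dict[str, List[str]]:
    """Group samples by their LLM label and keep the k with the smallest SLM
    loss in each group, as candidate in-context examples for the next round."""
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        by_label[s.llm_label].append(s)
    selected: Dict[str, List[str]] = {}
    for label, group in by_label.items():
        group.sort(key=lambda s: s.slm_loss)  # smallest loss first
        selected[label] = [s.text for s in group[:k_per_class]]
    return selected


pool = [
    Sample("great film", "pos", 0.1),
    Sample("boring", "pos", 2.3),  # high loss: likely mislabeled by the LLM
    Sample("awful", "neg", 0.2),
    Sample("loved it", "pos", 0.4),
]
```

Selecting per class keeps every label represented in the demonstrations, so a class with uniformly higher losses is not crowded out entirely.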

Co-authors

Lu Xu (1)

Venues

EMNLP (3)
