Junyi Liu

2026

Large language models (LLMs) demonstrate superior reasoning capabilities compared to small language models (SLMs) but incur substantially higher costs. We propose COllaborative REAsoner (COREA), a system that cascades an SLM with an LLM to balance accuracy and cost in complex reasoning tasks. COREA first attempts to answer each question with the SLM, which outputs both an answer and a verbalized confidence score. Questions whose confidence falls below a predefined threshold are deferred to the LLM for more accurate resolution. We introduce a reinforcement learning-based training algorithm that aligns the SLM’s confidence through an additional confidence calibration reward. Extensive experiments demonstrate that our method jointly improves the SLM’s reasoning ability and confidence calibration across diverse datasets and model backbones. Compared to using the LLM alone, COREA reduces cost by 21.5% and 16.8% on out-of-domain math and non-math datasets, respectively, while keeping the absolute drop in pass@1 within 2%.
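
The deferral rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the COREA implementation: `slm` and `llm` are hypothetical callables standing in for the two models, the SLM is assumed to return its verbalized confidence as a float in [0, 1], and the cost model (a flat price per call, where deferred questions pay for both models) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interfaces (not a real library API):
# the SLM returns (answer, verbalized confidence in [0, 1]); the LLM returns an answer.
SLMFn = Callable[[str], Tuple[str, float]]
LLMFn = Callable[[str], str]


@dataclass
class CascadeResult:
    answer: str
    deferred: bool  # True if the question was escalated to the LLM


def cascade_answer(question: str, slm: SLMFn, llm: LLMFn, threshold: float) -> CascadeResult:
    """Answer with the SLM; defer to the LLM when the SLM's confidence is below the threshold."""
    answer, confidence = slm(question)
    if confidence < threshold:
        return CascadeResult(answer=llm(question), deferred=True)
    return CascadeResult(answer=answer, deferred=False)


def cascade_cost(results: List[CascadeResult], slm_cost: float, llm_cost: float) -> float:
    """Assumed flat per-call pricing: every question pays for one SLM call,
    and deferred questions additionally pay for one LLM call."""
    return sum(slm_cost + (llm_cost if r.deferred else 0.0) for r in results)


# Toy stand-ins for the two models, for illustration only.
def toy_slm(question: str) -> Tuple[str, float]:
    return ("4", 0.95) if question == "2+2" else ("unsure", 0.30)


def toy_llm(question: str) -> str:
    return "LLM answer"


results = [cascade_answer(q, toy_slm, toy_llm, threshold=0.7) for q in ["2+2", "hard question"]]
print([r.answer for r in results])  # ['4', 'LLM answer']
print(cascade_cost(results, slm_cost=1.0, llm_cost=10.0))  # 12.0
```

Under this toy cost model, raising the threshold defers more questions and moves both accuracy and cost toward the LLM-only setting, so any savings come from the questions the SLM answers confidently.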

Analyzing Hate Speech Amplification on Fringe Platforms
Anika Ghosh Basu | Humberto Jesus Carlon | Junyi Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Fringe platforms like Gab harbor high volumes of hate speech due to minimal moderation and insular communities. Our study examines the factors that determine how hate speech is amplified on these platforms. We prepared a novel dataset of 5K+ threads and 50K+ responses from four fringe platforms (Gab, 4chan, Stormfront, and Vanguard), including both structural features (e.g., timestamps, metadata) and content features (e.g., original text, hate intensity of posts), where hate speech amplification was measured using platform-specific engagement metrics. We trained both Generalized Linear Models and Gradient Boosted Tree models to estimate how several features influence the amplification of hate speech on fringe platforms, and used Shapley value estimates to identify the relative importance of the features. Our analysis shows that research insights from social network analysis (SNA) of mainstream sites like X do not directly generalize to fringe platforms. For instance, our experiments show that features like thread structure and disagreements in early response windows can yield up to a 74% lift in Root Mean Squared Error (RMSE) when predicting reply counts for hateful posts on fringe platforms, compared to a baseline model using features like hate intensity and thread age (which conventional SNA methods would consider predictive).
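
For readers unfamiliar with Shapley values, the sketch below computes them exactly by enumerating every feature subset, which is feasible only for a handful of features. It illustrates the concept rather than the paper's procedure: the abstract does not say how its estimates were obtained, and practical tools typically rely on approximations. The feature names and the value function `toy_value`, including its interaction term, are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(features: Sequence[str], value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating all subsets (cost grows as 2**n)."""
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes f: k! (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution of f
        phi[f] = total
    return phi


# Hypothetical value function, e.g. a model score when only the features in s are available.
def toy_value(s: FrozenSet[str]) -> float:
    v = 0.0
    if "thread_structure" in s:
        v += 3.0
    if "hate_intensity" in s:
        v += 1.0
    if {"thread_structure", "early_disagreement"} <= s:
        v += 2.0  # interaction: only pays off when both features are present
    return v


features = ["thread_structure", "early_disagreement", "hate_intensity"]
phi = shapley_values(features, toy_value)
print({k: round(v, 6) for k, v in phi.items()})
# {'thread_structure': 4.0, 'early_disagreement': 1.0, 'hate_intensity': 1.0}
```

The attributions sum to the value of the full feature set minus the value of the empty set (6 here), and the interaction credit is split equally between the two features that produce it.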

2023

TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction
Junyi Liu | Liangzhi Li | Tong Xiang | Bowen Wang | Yiming Qian
Findings of the Association for Computational Linguistics: EMNLP 2023

Since ChatGPT released its API for public use, the number of applications built on top of commercial large language models (LLMs) has increased exponentially. One popular usage of such models is to leverage their in-context learning ability to generate responses to user queries using knowledge obtained through retrieval augmentation. One problem with deploying commercial retrieval-augmented LLMs is cost: the additionally retrieved context substantially increases the input token size of the LLMs. To mitigate this, we propose a token compression scheme that includes two methods: summarization compression and semantic compression. The first method applies a T5-based model, fine-tuned on datasets generated using self-instruct that contain samples of varying lengths, to reduce token size through summarization. The second method further compresses the token size by removing words with lower impact on the semantics. To adequately evaluate the effectiveness of the proposed methods, we propose and use a dataset called Food-Recommendation DB (FRDB), which focuses on food recommendation for women during pregnancy and for infants. Our summarization compression reduces the retrieval token size by 65% while further improving accuracy by 0.3%; semantic compression provides a more flexible way to trade off token size against performance, reducing the token size by 20% with only a 1.6% drop in accuracy.
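
The idea behind semantic compression, dropping the words that contribute least to the meaning until a target reduction is reached, can be sketched as follows. This is a hypothetical illustration, not the TCRA-LLM method: the abstract does not specify how a word's impact is measured, so `importance` is a caller-supplied stand-in, and the stopword-based `toy_importance` scorer in the example is an assumption. The `reduction` parameter plays the role of the token-size versus accuracy knob.

```python
from typing import Callable, List, Sequence


def semantic_compress(words: Sequence[str], importance: Callable[[str], float], reduction: float) -> List[str]:
    """Drop the `reduction` fraction of words with the lowest importance, preserving original order."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    n_drop = int(len(words) * reduction)
    # Rank positions from least to most important; ties are broken by position.
    ranked = sorted(range(len(words)), key=lambda i: (importance(words[i]), i))
    dropped = set(ranked[:n_drop])
    return [w for i, w in enumerate(words) if i not in dropped]


# Toy scorer (assumption): stopwords carry no weight, other words score by length.
STOPWORDS = {"the", "is", "a", "of", "for"}


def toy_importance(word: str) -> float:
    return 0.0 if word in STOPWORDS else float(len(word))


passage = "the soup is a good source of iron for the baby".split()
print(semantic_compress(passage, toy_importance, reduction=0.5))
# ['soup', 'good', 'source', 'iron', 'the', 'baby']
```

Because the reduction is a continuous parameter, the same scorer can be run at several compression levels and the resulting accuracy compared, which is the kind of flexible trade-off the abstract describes.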

Efficient Hybrid Generation Framework for Aspect-Based Sentiment Analysis
Haoran Lv | Junyi Liu | Henan Wang | Yaoming Wang | Jixiang Luo | Yaxiao Liu
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Aspect-based sentiment analysis (ABSA) has attracted broad attention due to its commercial value. Natural Language Generation-based (NLG) approaches dominate recent advances in ABSA tasks. However, current NLG practices are inefficient because most of them directly employ an autoregressive generation framework that cannot efficiently generate the location information and semantic representations of ABSA targets. In this paper, we propose a novel framework, Efficient Hybrid Generation (EHG), that departs from this autoregressive tradition. Specifically, we leverage an Efficient Hybrid Transformer to generate the location and semantic information of ABSA targets in parallel. In addition, we design a novel global hybrid loss function combined with bipartite matching to achieve end-to-end model training. Extensive experiments demonstrate that our proposed EHG framework outperforms current state-of-the-art methods in almost all cases and is more efficient at inference than existing NLG-based methods.
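
When targets are generated in parallel, there is no fixed order tying each prediction to a gold target, so a matching step is needed before a loss can be computed; bipartite matching picks the one-to-one assignment with the lowest total cost. The sketch below shows the idea with exhaustive search. It is not the EHG loss: the span-distance-plus-label-mismatch cost, its weights, and the sentiment labels are hypothetical, and because exhaustive search grows factorially, larger sets are usually matched with the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical representation of one ABSA target: ((start, end) token span, sentiment label).
Target = Tuple[Tuple[int, int], str]


def match_cost(pred: Target, gold: Target, w_loc: float = 1.0, w_cls: float = 1.0) -> float:
    """Hypothetical pairwise cost: L1 distance between span boundaries plus a label-mismatch penalty."""
    (p_start, p_end), p_label = pred
    (g_start, g_end), g_label = gold
    loc = abs(p_start - g_start) + abs(p_end - g_end)
    cls = 0.0 if p_label == g_label else 1.0
    return w_loc * loc + w_cls * cls


def bipartite_match(preds: List[Target], golds: List[Target]) -> Tuple[List[Tuple[int, int]], float]:
    """Exhaustively find the one-to-one assignment of golds to distinct preds with minimum total cost.

    Returns (pairs, total_cost), where pairs[g] = (pred_index, gold_index) for gold g.
    Factorial in set size; only meant for tiny examples.
    """
    if len(preds) < len(golds):
        raise ValueError("need at least as many predictions as gold targets")
    best_pairs: List[Tuple[int, int]] = []
    best_cost = float("inf")
    # perm[g] is the prediction index assigned to gold target g.
    for perm in permutations(range(len(preds)), len(golds)):
        cost = sum(match_cost(preds[p], golds[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return best_pairs, best_cost


preds = [((0, 1), "POS"), ((5, 6), "NEG"), ((2, 2), "NEU")]
golds = [((5, 6), "NEG"), ((0, 1), "POS")]
pairs, cost = bipartite_match(preds, golds)
print(pairs, cost)  # [(1, 0), (0, 1)] 0.0
```

Only the matched pairs are then compared by the training loss, which makes the loss independent of the order in which the parallel decoder emits its predictions.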
