Yuefeng Shi
2026
Text Analytics Evaluation Framework: A Case Study on LLMs and Social Media
Yuefeng Shi | Nedjma Ousidhoum | Jose Camacho-Collados
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
LLMs have demonstrated exceptional proficiency in a wide range of NLP tasks. However, a notable gap remains in practical data analysis scenarios, particularly when LLMs must process long sequences of unstructured documents, such as news feeds or, as specifically addressed in this paper, social media posts. To empirically assess the effectiveness of LLMs in this setting, we introduce a question-based evaluation framework comprising 470 manually curated questions designed to evaluate LLMs’ semantic understanding and reasoning abilities over aggregated text data. We apply our benchmark to diverse Twitter datasets covering various NLP tasks, including sentiment analysis, hate speech detection, and emotion recognition. Our results reveal that performance depends heavily on input scale and the complexity of the data sources, declining noticeably in multi-label or target-dependent scenarios. In addition, as task complexity increases, performance drops progressively from basic semantic existence identification to more demanding operations such as comparison, counting, and calculation. Furthermore, as the input size grows beyond 500 instances, we identify a limitation common across LLMs, particularly open-weight models: performance degrades substantially, especially on numerical tasks. These findings highlight critical architectural bottlenecks in current LLMs for performing rigorous quantitative analysis over large text collections.
2022
Exploiting Sentiment and Common Sense for Zero-shot Stance Detection
Yun Luo | Zihan Liu | Yuefeng Shi | Stan Z. Li | Yue Zhang
Proceedings of the 29th International Conference on Computational Linguistics
The stance detection task aims to classify the stance toward given documents and topics. Since topics can be implicit in documents and, in zero-shot settings, unseen in the training data, we propose to improve the transferability of the stance detection model by using sentiment and commonsense knowledge, which previous studies have seldom considered. Our model includes a graph autoencoder module to obtain commonsense knowledge and a stance detection module that incorporates sentiment and commonsense. Experimental results show that our model outperforms state-of-the-art methods on VAST, a zero-shot and few-shot benchmark dataset. Ablation studies further demonstrate the importance of each module in our model. Analysis of the relations between sentiment, common sense, and stance indicates the effectiveness of sentiment and common sense.
2020
Entity Enhanced BERT Pre-training for Chinese NER
Chen Jia | Yuefeng Shi | Qinrong Yang | Yue Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Character-level BERT pre-trained on Chinese lacks lexicon information, which has been shown to be effective for Chinese NER. To integrate a lexicon into pre-trained LMs for Chinese NER, we investigate a semi-supervised entity-enhanced BERT pre-training method. In particular, we first extract an entity lexicon from relevant raw text using a new-word discovery method. We then integrate the entity information into BERT using a Char-Entity-Transformer, which augments self-attention with a combination of character and entity representations. In addition, an entity classification task helps inject entity information into the model parameters during pre-training. The pre-trained models are then fine-tuned for NER. Experiments on a news dataset and two self-annotated datasets for long-text NER show that our method is highly effective and achieves the best results.
2019
A Pilot Study for Chinese SQL Semantic Parsing
Qingkai Min | Yuefeng Shi | Yue Zhang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
The task of semantic parsing is highly useful for dialogue and question answering systems. Many datasets have been proposed to map natural language text into SQL, among which the recent Spider dataset provides cross-domain samples with multiple tables and complex queries. We build a Spider dataset for Chinese, which is currently a low-resource language in this task area. Interesting research questions arise from the uniqueness of the language, which requires word segmentation, and from the fact that SQL keywords and the columns of DB tables are typically written in English. We compare character- and word-based encoders for a semantic parser, as well as different embedding schemes. Results show that a word-based semantic parser is subject to segmentation errors and that cross-lingual word embeddings are useful for text-to-SQL.