The formation and circulation of ideas in philosophy have profound implications for understanding philosophical dynamism, enabling us to identify seminal texts, delineate intellectual traditions, and track changing conventions in the act of philosophizing. However, traditional analyses of these issues often depend on manual reading and subjective interpretation, and are constrained by human cognitive limits. We introduce InterIDEAS, a pioneering dataset designed to bridge philosophy, literary studies, and natural language processing (NLP). By merging theories of intertextuality from literary studies with bibliometric techniques and recent LLMs, InterIDEAS enables both quantitative and qualitative analysis of the intellectual, social, and historical relations embedded in authentic philosophical texts. The dataset not only supports the study of philosophy but also contributes to the development of language models by providing a training corpus that challenges and enhances their interpretative capacity.
This paper describes our proposed framework for the ten text classification tasks of Social Media Mining for Health (SMM4H) 2022: Tasks 1a, 2a, 2b, 3a, 4, 5, 6, 7, 8, and 9. Building on pre-trained BERT-based models, we apply several techniques, including regularized dropout, focal loss, exponential moving average, 5-fold cross-validation, ensemble prediction, and pseudo-labeling, to improve the generalization performance of our framework. In the evaluation, the proposed framework achieves first place in Task 3a with an F1-score 7% above the median, and obtains an average F1-score 4% above the median across all participating tasks except Task 1a.
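To make one of the listed techniques concrete, below is a minimal sketch of focal loss for binary text classification in PyTorch. The class name and the default gamma/alpha values are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss (Lin et al., 2017): down-weights easy examples so
    training focuses on hard, misclassified ones. Hyperparameters here
    are common defaults, not the SMM4H framework's actual settings."""
    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma  # focusing parameter: larger -> stronger down-weighting
        self.alpha = alpha  # class-balance weight for the positive class

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-example binary cross-entropy, kept unreduced.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        # Since bce = -log(p_t), the true-class probability is exp(-bce).
        p_t = torch.exp(-bce)
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        # Modulating factor (1 - p_t)^gamma shrinks the loss of easy examples.
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()
```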
Natural language processing (NLP) has been applied to a wide range of problems, including text classification and sentiment analysis. For the shared task on assessing the funniness of edited news headlines, part of the SemEval-2020 competition, we preprocess the datasets by replacing abbreviations and stemming words, and then combine three models, Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT), by averaging their predictions, which performs best. Our team, Ferryman, placed 9th in Sub-task 1 of Task 7 (Regression).
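The averaging ensemble described above reduces to a simple unweighted mean of per-model scores; here is a minimal sketch, where the commented-out model outputs are hypothetical stand-ins for the team's actual LightGBM, LSTM, and BERT pipelines.

```python
import numpy as np

def ensemble_average(predictions: list[np.ndarray]) -> np.ndarray:
    """Blend per-model regression scores by an unweighted mean."""
    stacked = np.stack(predictions, axis=0)  # shape: (n_models, n_examples)
    return stacked.mean(axis=0)

# Hypothetical usage: each model scores the same edited headlines.
# preds = [lgbm_scores, lstm_scores, bert_scores]  # each of shape (n_examples,)
# final_scores = ensemble_average(preds)
```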
Natural language processing (NLP) has been applied to a wide range of problems, including text classification and sentiment analysis. For the shared task on sentiment analysis of code-mixed tweets, part of the SemEval-2020 competition, we preprocess the datasets by replacing emoji, deleting uncommon characters, and applying similar cleaning steps, and then fine-tune Bidirectional Encoder Representations from Transformers (BERT), which performs best. Using all three allowed submissions, our team, MeisterMorxrc, achieves an average F1-score of 0.730 in this task; our CodaLab username is MeisterMorxrc.
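A minimal sketch of the described preprocess-then-fine-tune pipeline using Hugging Face transformers follows. The checkpoint name, the emoji replacement rule, the character filter, and the three-way label set are illustrative assumptions rather than the team's exact setup.

```python
import re
import emoji  # assumed helper library for emoji-to-text replacement
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed checkpoint for code-mixed text

def preprocess(tweet: str) -> str:
    """Replace emoji with textual aliases and drop uncommon characters."""
    text = emoji.demojize(tweet)              # e.g. "😂" -> ":face_with_tears_of_joy:"
    text = re.sub(r"[^\w\s:@#_]", " ", text)  # crude filter for rare symbols
    return re.sub(r"\s+", " ", text).strip()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=3 assumes a {negative, neutral, positive} sentiment scheme.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

batch = tokenizer([preprocess("Movie was 🔥 yaar!")],
                  padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch)  # logits over the three sentiment classes
```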