Chih-Jen Lin


2023

Linear Classifier: An Often-Forgotten Baseline for Text Classification
Yu-Chen Lin | Si-An Chen | Jie-Jyun Liu | Chih-Jen Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Large-scale pre-trained language models such as BERT are popular solutions for text classification. Due to the superior performance of these advanced methods, nowadays, people often directly train them for a few epochs and deploy the obtained model. In this opinion paper, we point out that this way may only sometimes get satisfactory results. We argue the importance of running a simple baseline like linear classifiers on bag-of-words features along with advanced methods. First, for many text data, linear methods show competitive performance, high efficiency, and robustness. Second, advanced models such as BERT may only achieve the best results if properly applied. Simple baselines help to confirm whether the results of advanced models are acceptable. Our experimental results fully support these points.

2022

Even the Simplest Baseline Needs Careful Re-investigation: A Case Study on XML-CNN
Si-An Chen | Jie-jyun Liu | Tsung-Han Yang | Hsuan-Tien Lin | Chih-Jen Lin
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The power and the potential of deep learning models attract many researchers to design advanced and sophisticated architectures. Nevertheless, the progress is sometimes unreal due to various possible reasons. In this work, through an astonishing example we argue that more efforts should be paid to ensure the progress in developing a new deep learning method. For a highly influential multi-label text classification method XML-CNN, we show that the superior performance claimed in the original paper was mainly due to some unbelievable coincidences. We re-examine XML-CNN and make a re-implementation which reveals some contradictory findings to the claims in the original paper. Our study suggests suitable baselines for multi-label text classification tasks and confirms that the progress on a new architecture cannot be confidently justified without a cautious investigation.

2021

Parameter Selection: Why We Should Pay More Attention to It
Jie-Jyun Liu | Tsung-Han Yang | Si-An Chen | Chih-Jen Lin
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The importance of parameter selection in supervised learning is well known. However, due to the many parameter combinations, an incomplete or an insufficient procedure is often applied. This situation may cause misleading or confusing conclusions. In this opinion paper, through an intriguing example we point out that the seriousness goes beyond what is generally recognized. In the topic of multilabel classification for medical code prediction, one influential paper conducted a proper parameter selection on a set, but when moving to a subset of frequently occurring labels, the authors used the same parameters without a separate tuning. The set of frequent labels became a popular benchmark in subsequent studies, which kept pushing the state of the art. However, we discovered that most of the results in these studies cannot surpass the approach in the original paper if a parameter tuning had been conducted at the time. Thus it is unclear how much progress the subsequent developments have actually brought. The lesson clearly indicates that without enough attention on parameter selection, the research progress in our field can be uncertain or even illusive.

2009

Iterative Scaling and Coordinate Descent Methods for Maximum Entropy
Fang-Lan Huang | Cho-Jui Hsieh | Kai-Wei Chang | Chih-Jen Lin
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers