Hiroto Kurita
2026
Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings
Tomomasa Hara | Hiroto Kurita | Masaaki Imaizumi | Kentaro Inui | Sho Yokoi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse information beyond the first-order statistics of the token embeddings, such as second-order statistics that capture their spatial structure, potentially mapping distinct token embedding distributions to similar text embeddings. Motivated by this concern, we propose a simple metric to quantify such a collapse induced by mean pooling. Then, using this metric, we empirically measure how often this collapse occurs in actual models and texts, and find that modern text encoders are robust to this collapse. In particular, contrastive fine-tuned text encoders tend to be less prone to the collapse than their pretrained backbone models. We also find that the robustness of these text encoders lies in the concentration of token embeddings within each text. In addition, we find that robustness to the collapse, as quantified by our proposed metric, correlates with downstream task performance. Overall, our findings offer a new perspective on why modern text encoders remain effective despite relying on seemingly coarse mean pooling.
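The concern raised in this abstract, that mean pooling keeps only first-order statistics, can be illustrated with a toy example. The sketch below is not the metric proposed in the paper; the helper names and the 2-D token vectors are hypothetical, chosen so that two token sets share the same mean but differ sharply in their spread.

```python
from statistics import fmean


def mean_pool(tokens):
    """Average token vectors coordinate-wise (a first-order statistic)."""
    return [fmean(column) for column in zip(*tokens)]


def spread_around_mean(tokens):
    """Mean squared distance of tokens from their centroid.

    This equals the trace of the (population) covariance matrix,
    i.e. a simple second-order statistic that mean pooling discards.
    """
    centroid = mean_pool(tokens)
    return fmean(
        sum((x - c) ** 2 for x, c in zip(token, centroid)) for token in tokens
    )


# Hypothetical token embeddings for two "texts" (illustrative values only).
tight = [[0.9, 1.0], [1.1, 1.0], [1.0, 0.9], [1.0, 1.1]]
wide = [[-3.0, 1.0], [5.0, 1.0], [1.0, -3.0], [1.0, 5.0]]

pooled_tight = mean_pool(tight)
pooled_wide = mean_pool(wide)

# Both texts pool to (approximately) the same vector ...
print(pooled_tight, pooled_wide)
# ... even though their token distributions differ in spread.
print(spread_around_mean(tight), spread_around_mean(wide))
```

In this toy case the pooled vectors coincide while the spreads differ by orders of magnitude, which is the kind of collapse the abstract describes. It also hints at why concentrated token embeddings within a text would leave little second-order information to lose.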
2023
TohokuNLP at SemEval-2023 Task 5: Clickbait Spoiling via Simple Seq2Seq Generation and Ensembling
Hiroto Kurita | Ikumi Ito | Hiroaki Funayama | Shota Sasaki | Shoji Moriya | Ye Mengyu | Kazuma Kokuta | Ryujin Hatakeyama | Shusaku Sone | Kentaro Inui
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes our system submitted to SemEval-2023 Task 5: Clickbait Spoiling. We work on spoiler generation (Subtask 2) and develop a system comprising two parts: 1) simple seq2seq spoiler generation and 2) post-hoc model ensembling. Using this simple method, we address the challenge of generating multipart spoilers. On the test set, our submitted system outperformed the baseline by a large margin (approximately 10 BLEU points) for mixed types of spoilers. We also found that our system successfully handled multipart spoilers, confirming the effectiveness of our approach.
Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words
Hiroto Kurita | Goro Kobayashi | Sho Yokoi | Kentaro Inui
Findings of the Association for Computational Linguistics: EMNLP 2023
The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning with a contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, in the lower bound of the optimal value of the contrastive learning objective, the norm of a word embedding reflects the information gain associated with the distribution of surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods to measure the implicit weighting of models (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words.
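Of the two information-theoretic quantities named in this abstract, self-information is the simpler: for a word w with unigram probability p(w), it is -log p(w). The sketch below estimates it from a tiny hypothetical corpus. It is only an illustration of the quantity, not the paper's experimental procedure; the corpus, the function name, and the choice of base-2 logarithms are assumptions.

```python
import math
from collections import Counter


def self_information(corpus):
    """Return -log2 p(w) for each word, with p(w) estimated from unigram counts."""
    counts = Counter(word for sentence in corpus for word in sentence)
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}


# Hypothetical toy corpus (illustrative only).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "sang"],
]

info = self_information(corpus)

# Frequent words such as "the" carry less self-information than rare ones.
for word in sorted(info, key=info.get):
    print(f"{word}: {info[word]:.3f} bits")
```

Under the abstract's claim, words with higher values of such a quantity would be the ones a contrastively fine-tuned encoder tends to weight more heavily.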