This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
FlorentinaHristea
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
In this paper, we propose a unified approach to model calibration for emotion detection that exploits the complementary strengths of knowledge distillation and the MixUp data augmentation technique to enhance the trustworthiness of emotion detection models. Specifically, we use a MixUp method informed by training dynamics that generates augmented data by interpolating easy-to-learn with ambiguous samples based on their similarity and dissimilarity provided by saliency maps. We use this MixUp method to calibrate the teacher model in the first generation of the knowledge distillation process. To further calibrate the teacher models in each generation, we employ dynamic temperature scaling to update the temperature used for scaling the teacher predictions. We find that calibrating the teachers with our method also improves the calibration of the student models. We test our proposed method both in-distribution (ID) and out-of-distribution (OOD). To obtain better OOD performance, we further fine-tune our models with a simple MixUp method that interpolates a small number of OOD samples with ambiguous ID samples.
In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely-used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. The resulted configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, it gives a deterministic solution (it does not involve random choices).