Peratham Wiriyathammabhum


2021

TTCB System Description to a Shared Task on Implicit and Underspecified Language 2021
Peratham Wiriyathammabhum
Proceedings of the 1st Workshop on Understanding Implicit and Underspecified Language

In this report, we describe our transformers for text classification baseline (TTCB) submissions to a shared task on implicit and underspecified language 2021. We cast the task of predicting revision requirements in collaboratively edited instructions as text classification and consider transformer-based models, the current state of the art for this problem. We explore different training schemes, loss functions, and data augmentations. Our best result, 68.45% test accuracy (68.84% validation accuracy), comes from an XLNet model with a linearly annealed learning rate and a cross-entropy loss. None of our design choices yields a significant gain on any validation metric, with the exception of MiniLM, which achieves a higher validation F1 score and trains in about half the time, but at a lower validation accuracy.
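As a rough illustration of the setup the abstract describes (not the authors' released code), fine-tuning XLNet for this kind of binary text classification with a linear learning-rate schedule and cross-entropy loss might look like the sketch below, using the Hugging Face transformers library; the model checkpoint, hyperparameters, and toy data are all assumptions.

```python
# Hypothetical sketch: fine-tuning XLNet for binary text classification
# with a linearly annealed learning rate and cross-entropy loss.
# Checkpoint, hyperparameters, and data are illustrative assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import (
    XLNetTokenizer,
    XLNetForSequenceClassification,
    get_linear_schedule_with_warmup,
)

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=2  # revision required vs. not
)

# Toy stand-in for the shared-task data (instruction text, revision label).
texts = ["Preheat the oven to 180 degrees.", "Add the thing to the other thing."]
labels = [0, 1]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels)),
    batch_size=2,
    shuffle=True,
)

epochs = 3
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=epochs * len(loader)
)

model.train()
for _ in range(epochs):
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        # The model applies cross-entropy internally when labels are passed.
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        scheduler.step()  # linear annealing of the learning rate
```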

2019

Referring to Objects in Videos Using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum | Abhinav Shrivastava | Vlad Morariu | Larry Davis
Proceedings of the Second Workshop on Shortcomings in Vision and Language

This paper presents a new task: grounding spatio-temporal identifying descriptions in videos. Previous work suggests potential bias in existing datasets and emphasizes the need for a new data creation schema that better models linguistic structure. We introduce a data collection scheme based on grammatical constraints for surface realization, which enables us to investigate the problem of grounding spatio-temporal identifying descriptions in videos. We then propose a two-stream modular attention network that learns to ground spatio-temporal identifying descriptions based on appearance and motion. We show that the motion modules help ground motion-related words and also improve learning in the appearance modules, because modular neural networks resolve task interference between modules. Finally, we pose a future challenge: building a system robust enough to replace ground-truth visual annotations with automatic video object detection and temporal event localization.
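To make the two-stream modular idea concrete, here is a minimal sketch of an appearance module and a motion module that each attend over the words of an identifying description and score a candidate object tube; the module design, feature dimensions, and fusion scheme are assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical sketch of a two-stream modular attention network:
# an appearance module and a motion module each attend over the word
# embeddings of a description, then score a candidate spatio-temporal
# tube from its appearance and motion features. Dimensions and fusion
# are assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionModule(nn.Module):
    """Attends over word embeddings, then scores one visual feature."""

    def __init__(self, word_dim, vis_dim, hid_dim):
        super().__init__()
        self.word_attn = nn.Linear(word_dim, 1)
        self.text_proj = nn.Linear(word_dim, hid_dim)
        self.vis_proj = nn.Linear(vis_dim, hid_dim)

    def forward(self, words, vis_feat):
        # words: (num_words, word_dim); vis_feat: (vis_dim,)
        attn = F.softmax(self.word_attn(words), dim=0)    # (num_words, 1)
        text = self.text_proj((attn * words).sum(dim=0))  # (hid_dim,)
        vis = self.vis_proj(vis_feat)                     # (hid_dim,)
        # Cosine similarity as this module's grounding score.
        return F.cosine_similarity(text, vis, dim=0)


class TwoStreamGrounder(nn.Module):
    """Combines appearance and motion module scores for one candidate."""

    def __init__(self, word_dim=300, app_dim=2048, mot_dim=1024, hid_dim=512):
        super().__init__()
        self.appearance = AttentionModule(word_dim, app_dim, hid_dim)
        self.motion = AttentionModule(word_dim, mot_dim, hid_dim)
        self.stream_weight = nn.Parameter(torch.tensor(0.5))  # learned mix

    def forward(self, words, app_feat, mot_feat):
        w = torch.sigmoid(self.stream_weight)
        return w * self.appearance(words, app_feat) + \
            (1 - w) * self.motion(words, mot_feat)


# Usage: score one candidate tube against a 6-word description.
model = TwoStreamGrounder()
score = model(torch.randn(6, 300), torch.randn(2048), torch.randn(1024))
```

Keeping the appearance and motion streams as separate modules with their own word attention mirrors the abstract's point that modularity reduces task interference: each stream can specialize in the words relevant to its feature type.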