Joseph James
2025
Seeing isn’t Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms
Tyler Loakman | Joseph James | Chenghua Lin
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
With the rise of Large Language Models (LLMs) and their vision-enabled counterparts (VLMs), numerous works have investigated their capabilities in different tasks that fuse the vision and language modalities. In this work, we benchmark the extent to which VLMs are able to act as highly-trained phoneticians, interpreting spectrograms and waveforms of speech. To do this, we synthesise a novel dataset containing 4k+ English words spoken in isolation alongside stylistically consistent spectrogram and waveform figures. We test the ability of VLMs to understand these representations of speech through a multiple-choice task whereby models must predict the correct phonemic or graphemic transcription of a spoken word when it is presented amongst 3 distractor transcriptions that have been selected based on their phonemic edit distance to the ground truth. We observe that both zero-shot and finetuned models rarely perform above chance, demonstrating that the difficulty of this task stems from the need for esoteric parametric knowledge of how to interpret such figures, rather than from paired samples alone.
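As an illustration of the distractor-selection step described in the abstract, the sketch below computes Levenshtein edit distance over phoneme sequences and picks the closest lexicon entries as multiple-choice distractors. The toy lexicon, function names, and scoring details are assumptions for illustration, not the paper's exact procedure.

```python
from typing import List, Tuple

def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over phoneme sequences (lists of phoneme symbols)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def select_distractors(target: List[str],
                       lexicon: List[Tuple[str, List[str]]],
                       k: int = 3) -> List[str]:
    """Return the k lexicon words whose phoneme sequences are closest to the
    target (excluding exact matches), to serve as multiple-choice distractors."""
    scored = [(edit_distance(target, phones), word)
              for word, phones in lexicon
              if phones != target]
    scored.sort(key=lambda x: x[0])
    return [word for _, word in scored[:k]]

# Toy example: target /kæt/ "cat" against a small (word, phonemes) lexicon.
lexicon = [("cap", ["k", "æ", "p"]), ("bat", ["b", "æ", "t"]),
           ("dog", ["d", "ɒ", "g"]), ("cart", ["k", "ɑː", "t"])]
print(select_distractors(["k", "æ", "t"], lexicon, k=3))
```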
2024
On the Rigour of Scientific Writing: Criteria, Analysis, and Insights
Joseph James | Chenghao Xiao | Yucheng Li | Chenghua Lin
Findings of the Association for Computational Linguistics: EMNLP 2024
Rigour is crucial for scientific research as it ensures the reproducibility and validity of results and findings. Despite its importance, little work exists on modelling rigour computationally, and there is a lack of analysis on whether rigour criteria can effectively signal or measure the rigour of scientific papers in practice. In this paper, we introduce a bottom-up, data-driven framework to automatically identify and define rigour criteria and assess their relevance in scientific writing. Our framework includes rigour keyword extraction, detailed rigour definition generation, and salient criteria identification. Furthermore, our framework is domain-agnostic and can be tailored to the evaluation of scientific rigour for different areas, accommodating the distinct salient criteria across fields. We conducted comprehensive experiments based on datasets collected from different domains (e.g. ICLR, ACL) to demonstrate the effectiveness of our framework in modelling rigour. In addition, we analyse linguistic patterns of rigour, revealing that framing certainty is crucial for enhancing the perception of scientific rigour, while suggestion certainty and probability uncertainty diminish it.
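To make the keyword-extraction component more concrete, here is a minimal, hypothetical sketch that ranks candidate terms by TF-IDF over a toy corpus of methods-style sentences. The corpus, the scikit-learn pipeline, and the scoring are illustrative assumptions, not the framework's actual implementation.

```python
# Hypothetical sketch of a "rigour keyword extraction" step: rank terms by
# their average TF-IDF weight over a small corpus of paper sentences.
from sklearn.feature_extraction.text import TfidfVectorizer

papers = [
    "We report mean and standard deviation over five random seeds.",
    "Ablation studies isolate the contribution of each module.",
    "Hyperparameters were tuned on a held-out validation split.",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(papers)

# Average TF-IDF weight of each term across documents, highest first.
scores = tfidf.mean(axis=0).A1
terms = vectorizer.get_feature_names_out()  # requires scikit-learn >= 1.0
top = sorted(zip(terms, scores), key=lambda x: -x[1])[:10]
for term, score in top:
    print(f"{term}: {score:.3f}")
```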