Bo Han
2025
Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models
Xinyu Pang | Ruixin Hong | Zhanke Zhou | Fangrui Lv | Xinwei Yang | Zhilong Liang | Bo Han | Changshui Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Physics problems constitute a significant aspect of reasoning, necessitating complicated reasoning ability and abundant physics knowledge. However, existing large language models (LLMs) frequently fail due to a lack of knowledge or incorrect knowledge application. To mitigate these issues, we propose Physics Reasoner, a knowledge-augmented framework to solve physics problems with LLMs. Specifically, the proposed framework constructs a comprehensive formula set to provide explicit physics knowledge and utilizes checklists containing detailed instructions to guide effective knowledge application. Namely, given a physics problem, Physics Reasoner solves it through three stages: problem analysis, formula retrieval, and guided reasoning. During the process, checklists are employed to enhance LLMs’ self-improvement in the analysis and reasoning stages. Empirically, Physics Reasoner mitigates the issues of insufficient knowledge and incorrect application, achieving state-of-the-art performance on SciBench with an average accuracy improvement of 5.8%.
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
Yunhao Gou | Hansi Yang | Zhili Liu | Kai Chen | Yihan Zeng | Lanqing Hong | Zhenguo Li | Qun Liu | Bo Han | James Kwok | Yu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Visual Instruction Tuning (VIT) aims to enhance Multimodal Large Language Models (MLLMs), yet its effectiveness is often compromised by corrupted datasets with issues such as hallucinated content, incorrect responses, and poor OCR quality. Previous approaches to address these challenges have focused on refining datasets through high-quality data collection or rule-based filtering that can be costly or limited in scope. In this paper, we conduct a systematic investigation into the impact of corrupted data on MLLMs and discover that, although corrupted data degrade model performance, such adverse effects are largely reversible, and MLLMs are corrupted but not broken. Specifically, we find that disabling a small subset of parameters can almost fully restore performance. Moreover, corrupted MLLMs inherently possess the capability to differentiate between clean and corrupted samples, facilitating dataset cleaning without external intervention. Building on these insights, we introduce a corruption-robust training paradigm that significantly surpasses existing strategies for mitigating the effects of corrupted data.
2016
Temporal Modelling of Geospatial Words in Twitter
Bo Han | Antonio Jimeno Yepes | Andrew MacKinlay | Lianhua Chi
Proceedings of the Australasian Language Technology Association Workshop 2016
:telephone::person::sailboat::whale::okhand: ; or “Call me Ishmael” – How do you translate emoji?
Will Radford | Ben Hachey | Bo Han | Andy Chisholm
Proceedings of the Australasian Language Technology Association Workshop 2016
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Bo Han | Alan Ritter | Leon Derczynski | Wei Xu | Tim Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Twitter Geolocation Prediction Shared Task of the 2016 Workshop on Noisy User-generated Text
Bo Han | Afshin Rahimi | Leon Derczynski | Timothy Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
This paper presents the shared task for English Twitter geolocation prediction in WNUT 2016. We discuss details of task settings, data preparations and participant systems. The derived dataset and performance figures from each system provide baselines for future research in this realm.
2015
Investigating Public Health Surveillance using Twitter
Antonio Jimeno Yepes | Andrew MacKinlay | Bo Han
Proceedings of BioNLP 15
Proceedings of the Workshop on Noisy User-generated Text
Wei Xu | Bo Han | Alan Ritter
Proceedings of the Workshop on Noisy User-generated Text
Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition
Timothy Baldwin | Marie-Catherine de Marneffe | Bo Han | Young-Bum Kim | Alan Ritter | Wei Xu
Proceedings of the Workshop on Noisy User-generated Text
2014
Identifying Twitter Location Mentions
Bo Han | Antonio Jimeno Yepes | Andrew MacKinlay | Qiang Chen
Proceedings of the Australasian Language Technology Association Workshop 2014
2013
A Stacking-based Approach to Twitter User Geolocation Prediction
Bo Han | Paul Cook | Timothy Baldwin
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Unsupervised Word Usage Similarity in Social Media Texts
Spandana Gella | Paul Cook | Bo Han
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
2012
Geolocation Prediction in Social Media Data by Finding Location Indicative Words
Bo Han | Paul Cook | Timothy Baldwin
Proceedings of COLING 2012
Automatically Constructing a Normalisation Dictionary for Microblogs
Bo Han | Paul Cook | Timothy Baldwin
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
A Support Platform for Event Detection using Social Intelligence
Timothy Baldwin | Paul Cook | Bo Han | Aaron Harwood | Shanika Karunasekera | Masud Moshtaghi
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics
2011
Lexical Normalisation of Short Text Messages: Makn Sens a #twitter
Bo Han | Timothy Baldwin
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
2010
Semantic Role Labeling for News Tweets
Xiaohua Liu | Kuan Li | Bo Han | Ming Zhou | Long Jiang | Zhongyang Xiong | Changning Huang
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
Co-authors
- Timothy Baldwin 8
- Paul Cook 5
- Antonio Jimeno Yepes 3
- Kuan Li 3
- Xiaohua Liu 3
- Andrew MacKinlay 3
- Alan Ritter 3
- Wei Xu 3
- Ming Zhou 3
- Leon Derczynski 2
- Long Jiang 2
- Zhongyang Xiong 2
- Kai Chen 1
- Qiang Chen 1
- Lianhua Chi 1
- Andy Chisholm 1
- Spandana Gella 1
- Yunhao Gou 1
- Ben Hachey 1
- Aaron Harwood 1
- Ruixin Hong 1
- Lanqing Hong 1
- Changning Huang 1
- Shanika Karunasekera 1
- Young-Bum Kim 1
- James Kwok 1
- Zhenguo Li 1
- Zhilong Liang 1
- Zhili Liu 1
- Qun Liu 1
- Fangrui Lv 1
- Masud Moshtaghi 1
- Xinyu Pang 1
- Will Radford 1
- Afshin Rahimi 1
- Stephan Hyeonjun Stiller 1
- Daniel Tse 1
- Xinwei Yang 1
- Hansi Yang 1
- Yihan Zeng 1
- Changshui Zhang 1
- Yu Zhang 1
- Zhanke Zhou 1
- Marie-Catherine de Marneffe 1