2025
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition
Hisham Abdullah Alyahya | Haidar Khan | Yazeed Alnumay | M Saiful Bari | Bulent Yener
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce ZeroSumEval, a dynamic, competition-based, and evolving evaluation framework for Large Language Models (LLMs) that leverages competitive games. ZeroSumEval encompasses a diverse suite of games, including security challenges (Capture the Flag), classic board games (chess), and knowledge tests (MathQuiz). These games are designed to evaluate a range of capabilities such as strategic reasoning, planning, knowledge application, safety, and adaptability. Building upon recent studies that highlight the effectiveness of game-based evaluations for LLMs, ZeroSumEval enhances these approaches by providing a standardized and extensible framework for easily implementing games, and it leverages DSPy to provide a better abstraction for LLM player strategies.
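To make the DSPy player abstraction concrete, here is a minimal sketch of what a game-player module might look like in this style. The signature, class names, and model choice are illustrative assumptions, not ZeroSumEval's actual API.

```python
import dspy

# Illustrative sketch only: field and class names are hypothetical,
# not ZeroSumEval's real interfaces.
class ChessMove(dspy.Signature):
    """Choose one legal move for the side to play."""
    board_fen: str = dspy.InputField(desc="current position in FEN notation")
    legal_moves: str = dspy.InputField(desc="comma-separated legal moves in UCI")
    move: str = dspy.OutputField(desc="a single move taken from legal_moves")

class ChessPlayer(dspy.Module):
    def __init__(self):
        super().__init__()
        # ChainOfThought lets the LLM reason before committing to a move.
        self.act = dspy.ChainOfThought(ChessMove)

    def forward(self, board_fen: str, legal_moves: str):
        return self.act(board_fen=board_fen, legal_moves=legal_moves)

lm = dspy.LM("openai/gpt-4o-mini")  # any DSPy-supported model
dspy.configure(lm=lm)
player = ChessPlayer()
prediction = player(
    board_fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    legal_moves="e2e4, d2d4, g1f3",
)
print(prediction.move)
```

Wrapping each player as a dspy.Module means a strategy can be swapped or optimized without touching the game harness itself, which is the kind of separation the abstract describes.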
2022
SPOCK @ Causal News Corpus 2022: Cause-Effect-Signal Span Detection Using Span-Based and Sequence Tagging Models
Anik Saha | Alex Gittens | Jian Ni | Oktie Hassanzadeh | Bulent Yener | Kavitha Srinivas
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Understanding causal relationships is an important part of natural language processing. We address the causal information extraction problem with different neural models built on top of pre-trained transformer-based language models to identify Cause, Effect, and Signal spans in news data sets. We use the Causal News Corpus subtask 2 training data set to train span-based and sequence tagging models. Our span-based model, built on pre-trained BERT-base weights, achieves an F1 score of 47.48 on the test set with an accuracy score of 36.87 and obtains 3rd place in the Causal News Corpus 2022 shared task.
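As a rough illustration of the span-based approach described above, the sketch below places start/end span-pointer heads on top of a BERT encoder. The head design and the three-way span typing are plausible assumptions for exposition, not the paper's exact architecture.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SpanExtractor(nn.Module):
    """Predicts start/end positions of Cause, Effect, and Signal spans."""
    def __init__(self, model_name="bert-base-cased", num_span_types=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One (start, end) logit pair per span type at every token.
        self.span_heads = nn.Linear(hidden, 2 * num_span_types)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        logits = self.span_heads(h)              # (batch, seq, 2 * num_span_types)
        start_logits, end_logits = logits.chunk(2, dim=-1)
        return start_logits, end_logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = SpanExtractor()
enc = tokenizer("The storm caused widespread flooding.", return_tensors="pt")
start_logits, end_logits = model(enc["input_ids"], enc["attention_mask"])
```

At inference, the argmax over each span type's start and end logits yields the predicted spans; a sequence-tagging variant would instead predict a BIO label per token.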
SPOCK at FinCausal 2022: Causal Information Extraction Using Span-Based and Sequence Tagging Models
Anik Saha | Jian Ni | Oktie Hassanzadeh | Alex Gittens | Kavitha Srinivas | Bulent Yener
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022
Causal information extraction is an important task in natural language processing, particularly in the finance domain. In this work, we develop several information extraction models using pre-trained transformer-based language models to identify cause and effect text spans in financial documents. We use the FinCausal 2021 and 2022 data sets to train span-based and sequence tagging models. Our ensemble of sequence tagging models based on the RoBERTa-Large pre-trained language model achieves an F1 score of 94.70 with an Exact Match score of 85.85, obtaining 1st place in the FinCausal 2022 competition.
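The ensemble described above can be approximated with standard Hugging Face components: fine-tune several RoBERTa-Large token classifiers and average their per-token logits at inference. The checkpoint paths and BIO label set below are placeholders for illustration, not artifacts released by the authors.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical BIO label set for cause/effect spans.
LABELS = ["O", "B-CAUSE", "I-CAUSE", "B-EFFECT", "I-EFFECT"]

tokenizer = AutoTokenizer.from_pretrained("roberta-large")

# Placeholder paths to fine-tuned checkpoints (e.g., different random seeds).
checkpoints = ["ckpt_seed1", "ckpt_seed2", "ckpt_seed3"]
models = [
    AutoModelForTokenClassification.from_pretrained(c, num_labels=len(LABELS)).eval()
    for c in checkpoints
]

@torch.no_grad()
def tag(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    # Ensemble by averaging per-token logits across the fine-tuned models.
    avg_logits = torch.stack([m(**enc).logits for m in models]).mean(dim=0)
    pred_ids = avg_logits.argmax(dim=-1).squeeze(0).tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
    return list(zip(tokens, [LABELS[i] for i in pred_ids]))
```

Logit averaging is one common ensembling choice for token classifiers; majority voting over predicted tags is an equally plausible alternative.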
2015
Context-aware Entity Morph Decoding
Boliang Zhang | Hongzhao Huang | Xiaoman Pan | Sujian Li | Chin-Yew Lin | Heng Ji | Kevin Knight | Zhen Wen | Yizhou Sun | Jiawei Han | Bulent Yener
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
Be Appropriate and Funny: Automatic Entity Morph Encoding
Boliang Zhang | Hongzhao Huang | Xiaoman Pan | Heng Ji | Kevin Knight | Zhen Wen | Yizhou Sun | Jiawei Han | Bulent Yener
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)