George Dueñas
Also published as:
George Duenas
We describe the second-best run for the shared task on predicting the difficulty of Multiple-Choice Questions (MCQs) in the medical domain. Our approach leverages prompting Large Language Models (LLMs): rather than querying difficulty directly, we simulate medical candidates’ responses to the questions across various scenarios. This required more than 10,000 prompts for the 467 training questions and the 200 test questions. From the answers to these prompts, we extracted a set of features that we combined using Ridge Regression, tuning only the regularization parameter on the training set. Our motivation stems from the belief that MCQ difficulty is influenced more by the respondent population than by item-specific content features. We conclude that the approach is promising and has the potential to improve other item-based systems on this task, which turned out to be extremely challenging and leaves ample room for future improvement.
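As a minimal sketch of the final regression step (this is not the authors' released code, and the feature set and data below are hypothetical placeholders), features extracted from the simulated LLM responses can be combined with scikit-learn's Ridge Regression, tuning only the regularization parameter alpha on the training set:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_train = rng.random((467, 8))   # e.g., per-question statistics of simulated answers
y_train = rng.random(467)        # gold difficulty scores for the training questions
X_test = rng.random((200, 8))    # same features for the test questions

# Tune only the regularization strength alpha, via cross-validation
# on the training set; all other Ridge settings stay at their defaults.
search = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 13)}, cv=5)
search.fit(X_train, y_train)
difficulty_pred = search.best_estimator_.predict(X_test)
```

Keeping alpha as the only free hyperparameter matches the abstract's emphasis on a deliberately simple model layered on top of the simulated responses.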
Creating high-quality multiple-choice items requires careful attention to several factors: ensuring that there is only one correct option, that options are independent of each other, that there is no overlap between options, and that each option is plausible. This attention is reflected in the explanations that human item-writers provide for each option. This study compared explanations of multiple-choice item options for reading comprehension created by ChatGPT with those created by humans. We used two context-dependent multiple-choice item sets created based on Evidence-Centered Design. Results indicate that ChatGPT is capable of producing explanations containing different types of information that are comparable to those created by humans, so human item-writers could benefit from this additional information to enhance their own explanations. We conclude that ChatGPT’s ability to generate explanations for multiple-choice item options in reading comprehension tests is comparable to that of humans.
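As an illustrative, assumed sketch (the abstract does not publish the prompts; the prompt wording and model name here are our own assumptions), per-option explanations could be requested from ChatGPT via the OpenAI API along these lines:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_option(passage: str, question: str, option: str, is_correct: bool) -> str:
    # Ask the model to justify each option against the passage,
    # distinguishing the key from plausible distractors.
    role = "correct" if is_correct else "incorrect but plausible (a distractor)"
    prompt = (
        f"Passage:\n{passage}\n\n"
        f"Question: {question}\n"
        f"Option: {option}\n"
        f"This option is {role}. Explain, citing the passage, why."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not from the paper
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```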
The RUFINO team proposed an unsupervised, conceptually simple, and low-cost approach to the Multilingual and Cross-lingual Semantic Word Similarity challenge at SemEval 2017. The proposed systems were cross-lingual extensions of popular monolingual lexical similarity approaches such as PMI and word2vec. The extensions were made possible by a small parallel list of concepts, similar to the Swadesh list, which we obtained in a semi-automatic way. In spite of its simplicity, our approach proved effective, obtaining statistically significant and consistent results on all datasets proposed for the task. In addition, we provide some research directions for improving this novel and affordable approach.
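One way to read the parallel-concept idea, sketched below under our own assumptions (this is not the authors' released code; the model files and anchor lists are hypothetical placeholders), is to represent every word by its similarities to the shared concept list, so that vectors from independently trained monolingual word2vec models become directly comparable across languages:

```python
import numpy as np
from gensim.models import KeyedVectors

# Small parallel list of shared concepts, akin to a Swadesh list.
anchors_en = ["water", "fire", "hand", "sun", "eat"]
anchors_es = ["agua", "fuego", "mano", "sol", "comer"]

wv_en = KeyedVectors.load("wv_en.kv")  # independently trained
wv_es = KeyedVectors.load("wv_es.kv")  # monolingual word2vec models

def anchor_vector(wv, word, anchor_words):
    # One dimension per anchor concept: cosine similarity to that anchor.
    return np.array([wv.similarity(word, a) for a in anchor_words])

def crosslingual_sim(word_en: str, word_es: str) -> float:
    # Both words now live in the same anchor space, so ordinary
    # cosine similarity is meaningful across the two languages.
    u = anchor_vector(wv_en, word_en, anchors_en)
    v = anchor_vector(wv_es, word_es, anchors_es)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

The same anchor-space construction applies to PMI-based representations: replace the word2vec similarity with the PMI of the word and each anchor concept.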