Yao Sun
2024
Chinese UMR annotation: Can LLMs help?
Haibo Sun | Nianwen Xue | Jin Zhao | Liulu Yue | Yao Sun | Keer Xu | Jiawei Wu
Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024
We explore using LLMs, specifically GPT-4, to generate draft sentence-level Chinese Uniform Meaning Representations (UMRs) that human annotators can revise, thereby speeding up the UMR annotation process. In this study, we use few-shot learning and Think-Aloud prompting to guide GPT-4 to generate sentence-level UMR graphs. Our experimental results show that, compared with annotating UMRs from scratch, using LLM output as a preprocessing step reduces annotation time by two-thirds on average. This indicates that there is great potential for integrating LLMs into the pipeline for complex semantic annotation tasks.
What Are the Implications of Your Question? Non-Information Seeking Question-Type Identification in CNN Transcripts
Yao Sun | Anastasiia Tatlubaeva | Zhihan Li | Chester Palen-Michel
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Non-information-seeking questions (NISQs) capture the subtle dynamics of human discourse. In this work, we use a dataset of over 1,500 information-seeking questions (ISQs) and NISQs to evaluate human and machine performance on classifying fine-grained NISQ types. We introduce the first publicly available corpus annotating both ISQs and NISQs as an initial benchmark. Additionally, we establish competitive baselines by assessing diverse systems, including Generative Pre-trained Transformer language models, on this new question classification task. Our results demonstrate the inherent complexity of making nuanced NISQ distinctions. The dataset is publicly available at https://github.com/YaoSun0422/NISQ_dataset.git.