Yunsong Liu

2024

CoCoHD: Congress Committee Hearing Dataset
Arnav Hiray | Yunsong Liu | Mingxiao Song | Agam Shah | Sudheer Chava
Findings of the Association for Computational Linguistics: EMNLP 2024

U.S. congressional hearings significantly influence the national economy and social fabric, impacting individual lives. Despite their importance, there is a lack of comprehensive datasets for analyzing these discourses. To address this, we propose the **Co**ngress **Co**mmittee **H**earing **D**ataset (CoCoHD), covering hearings from 1997 to 2024 across 86 committees, with 32,697 records. This dataset enables researchers to study policy language on critical issues like healthcare, LGBTQ+ rights, and climate justice. We demonstrate its potential with a case study on 1,000 energy-related sentences, analyzing the Energy and Commerce Committee’s stance on fossil fuel consumption. By fine-tuning pre-trained language models, we create energy-relevant measures for each hearing. Our market analysis shows that natural language analysis using CoCoHD can predict and highlight trends in the energy sector.

