Marie Hledíková


2022

pdf
ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech
Anna Nedoluzhko | Muskaan Singh | Marie Hledíková | Tirthankar Ghosal | Ondřej Bojar
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Taking minutes is an essential component of every meeting, although the goals, style, and procedure of this activity (“minuting” for short) can vary. Minuting is a rather unstructured writing activity and is affected by who is taking the minutes and for whom the intended minutes are. With the rise of online meetings, automatic minuting would be an important benefit for the meeting participants as well as for those who might have missed the meeting. However, automatically generating meeting minutes is a challenging problem due to a variety of factors including the quality of automatic speech recorders (ASRs), availability of public meeting data, subjective knowledge of the minuter, etc. In this work, we present the first of its kind dataset on Automatic Minuting. We develop a dataset of English and Czech technical project meetings which consists of transcripts generated from ASRs, manually corrected, and minuted by several annotators. Our dataset, AutoMin, consists of 113 (English) and 53 (Czech) meetings, covering more than 160 hours of meeting content. Upon acceptance, we will publicly release (aaa.bbb.ccc) the dataset as a set of meeting transcripts and minutes, excluding the recordings for privacy reasons. A unique feature of our dataset is that most meetings are equipped with more than one minute, each created independently. Our corpus thus allows studying differences in what people find important while taking the minutes. We also provide baseline experiments for the community to explore this novel problem further. To the best of our knowledge AutoMin is probably the first resource on minuting in English and also in a language other than English (Czech).

pdf bib
The Second Automatic Minuting (AutoMin) Challenge: Generating and Evaluating Minutes from Multi-Party Meetings
Tirthankar Ghosal | Marie Hledíková | Muskaan Singh | Anna Nedoluzhko | Ondřej Bojar
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges

We would host the AutoMin generation chal- lenge at INLG 2023 as a follow-up of the first AutoMin shared task at Interspeech 2021. Our shared task primarily concerns the automated generation of meeting minutes from multi-party meeting transcripts. In our first venture, we ob- served the difficulty of the task and highlighted a number of open problems for the community to discuss, attempt, and solve. Hence, we invite the Natural Language Generation (NLG) com- munity to take part in the second iteration of AutoMin. Like the first, the second AutoMin will feature both English and Czech meetings and the core task of summarizing the manually- revised transcripts into bulleted minutes. A new challenge we are introducing this year is to devise efficient metrics for evaluating the quality of minutes. We will also host an optional track to generate minutes for European parliamentary sessions. We carefully curated the datasets for the above tasks. Our ELITR Minuting Corpus has been recently accepted to LREC 2022 and publicly released. We are already preparing a new test set for evaluating the new shared tasks. We hope to carry forward the learning from the first AutoMin and instigate more community attention and interest in this timely yet chal- lenging problem. INLG, the premier forum for the NLG community, would be an appropriate venue to discuss the challenges and future of Automatic Minuting. The main objective of the AutoMin GenChal at INLG 2023 would be to come up with efficient methods to auto- matically generate meeting minutes and design evaluation metrics to measure the quality of the minutes.