Hannu Toivonen


2021

pdf bib
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Hannu Toivonen | Michele Boggia
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

pdf
EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions
Senja Pollak | Marko Robnik-Šikonja | Matthew Purver | Michele Boggia | Ravi Shekhar | Marko Pranjić | Salla Salmela | Ivar Krustok | Tarmo Paju | Carl-Gustav Linden | Leo Leppänen | Elaine Zosa | Matej Ulčar | Linda Freienthal | Silver Traat | Luis Adrián Cabrera-Diego | Matej Martinc | Nada Lavrač | Blaž Škrlj | Martin Žnidaršič | Andraž Pelicon | Boshko Koloski | Vid Podpečan | Janez Kranjc | Shane Sheehan | Emanuela Boros | Jose G. Moreno | Antoine Doucet | Hannu Toivonen
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program. The collected resources were offered to participants of a hackathon organized as part of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in February 2021. The hackathon had six participating teams who addressed different challenges, either from the list of proposed challenges or their own news-industry-related tasks. This paper goes beyond the scope of the hackathon, as it brings together in a coherent and compact form most of the resources developed, collected and released by the EMBEDDIA project. Moreover, it constitutes a handy source for news media industry and researchers in the fields of Natural Language Processing and Social Science.

pdf
A Baseline Document Planning Method for Automated Journalism
Leo Leppänen | Hannu Toivonen
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

In this work, we present a method for content selection and document planning for automated news and report generation from structured statistical data such as that offered by the European Union’s statistical agency, EuroStat. The method is driven by the data and is highly topic-independent within the statistical dataset domain. As our approach is not based on machine learning, it is suitable for introducing news automation to the wide variety of domains where no training data is available. As such, it is suitable as a low-cost (in terms of implementation effort) baseline for document structuring prior to introduction of domain-specific knowledge.

2019

pdf
Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data
Mark Granroth-Wilding | Hannu Toivonen
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

2017

pdf
Data-Driven News Generation for Automated Journalism
Leo Leppänen | Myriam Munezero | Mark Granroth-Wilding | Hannu Toivonen
Proceedings of the 10th International Conference on Natural Language Generation

Despite increasing amounts of data and ever improving natural language generation techniques, work on automated journalism is still relatively scarce. In this paper, we explore the field and challenges associated with building a journalistic natural language generation system. We present a set of requirements that should guide system design, including transparency, accuracy, modifiability and transferability. Guided by the requirements, we present a data-driven architecture for automated journalism that is largely domain and language independent. We illustrate its practical application in the production of news articles about the 2017 Finnish municipal elections in three languages, demonstrating the successfulness of the data-driven, modular approach of the design. We then draw some lessons for future automated journalism.

2013

pdf
“Let Everything Turn Well in Your Wife”: Generation of Adult Humor Using Lexical Constraints
Alessandro Valitutti | Hannu Toivonen | Antoine Doucet | Jukka M. Toivanen
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)