This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
RyujiKano
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
Much research has reported the training data of summarization models are noisy; summaries often do not reflect what is written in the source texts. We propose an effective method of curriculum learning to train summarization models from such noisy data. Curriculum learning is used to train sequence-to-sequence models with noisy data. In translation tasks, previous research quantified noise of the training data using two models trained with noisy and clean corpora. Because such corpora do not exist in summarization fields, we propose a model that can quantify noise from a single noisy corpus. We conduct experiments on three summarization models; one pretrained model and two non-pretrained models, and verify our method improves the performance. Furthermore, we analyze how different curricula affect the performance of pretrained and non-pretrained summarization models. Our result on human evaluation also shows our method improves the performance of summarization models.
We propose Implicit Quote Extractor, an end-to-end unsupervised extractive neural summarization model for conversational texts. When we reply to posts, quotes are used to highlight important part of texts. We aim to extract quoted sentences as summaries. Most replies do not explicitly include quotes, so it is difficult to use quotes as supervision. However, even if it is not explicitly shown, replies always refer to certain parts of texts; we call them implicit quotes. Implicit Quote Extractor aims to extract implicit quotes as summaries. The training task of the model is to predict whether a reply candidate is a true reply to a post. For prediction, the model has to choose a few sentences from the post. To predict accurately, the model learns to extract sentences that replies frequently refer to. We evaluate our model on two email datasets and one social media dataset, and confirm that our model is useful for extractive summarization. We further discuss two topics; one is whether quote extraction is an important factor for summarization, and the other is whether our model can capture salient sentences that conventional methods cannot.
Automated generation of medical reports that describe the findings in the medical images helps radiologists by alleviating their workload. Medical report generation system should generate correct and concise reports. However, data imbalance makes it difficult to train models accurately. Medical datasets are commonly imbalanced in their finding labels because incidence rates differ among diseases; moreover, the ratios of abnormalities to normalities are significantly imbalanced. We propose a novel reinforcement learning method with a reconstructor to improve the clinical correctness of generated reports to train the data-to-text module with a highly imbalanced dataset. Moreover, we introduce a novel data augmentation strategy for reinforcement learning to additionally train the model on infrequent findings. From the perspective of a practical use, we employ a Two-Stage Medical Report Generator (TS-MRGen) for controllable report generation from input images. TS-MRGen consists of two separated stages: an image diagnosis module and a data-to-text module. Radiologists can modify the image diagnosis module results to control the reports that the data-to-text module generates. We conduct an experiment with two medical datasets to assess the data-to-text module and the entire two-stage model. Results demonstrate that the reports generated by our model describe the findings in the input image more correctly.
The automated generation of information indicating the characteristics of articles such as headlines, key phrases, summaries and categories helps writers to alleviate their workload. Previous research has tackled these tasks using neural abstractive summarization and classification methods. However, the outputs may be inconsistent if they are generated individually. The purpose of our study is to generate multiple outputs consistently. We introduce a multi-task learning model with a shared encoder and multiple decoders for each task. We propose a novel loss function called hierarchical consistency loss to maintain consistency among the attention weights of the decoders. To evaluate the consistency, we employ a human evaluation. The results show that our model generates more consistent headlines, key phrases and categories. In addition, our model outperforms the baseline model on the ROUGE scores, and generates more adequate and fluent headlines.
We proposed a model that integrates discussion structures with neural networks to classify discourse acts. Several attempts have been made in earlier works to analyze texts that are used in various discussions. The importance of discussion structures has been explored in those works but their methods required a sophisticated design to combine structural features with a classifier. Our model introduces tree learning approaches and a graph learning approach to directly capture discussion structures without structural features. In an evaluation to classify discussion discourse acts in Reddit, the model achieved improvements of 1.5% in accuracy and 2.2 in FB1 score compared to the previous best model. We further analyzed the model using an attention mechanism to inspect interactions among different learning approaches.
We leverage a popularity measure in social media as a distant label for extractive summarization of online conversations. In social media, users can vote, share, or bookmark a post they prefer. The number of these actions is regarded as a measure of popularity. However, popularity is not determined solely by content of a post, e.g., a text or an image it contains, but is highly based on its contexts, e.g., timing, and authority. We propose Disjunctive model that computes the contribution of content and context separately. For evaluation, we build a dataset where the informativeness of comments is annotated. We evaluate the results with ranking metrics, and show that our model outperforms the baseline models which directly use popularity as a measure of informativeness.