Jiyi Li


Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection
Sheng Li | Jiyi Li | Qianying Liu | Zhuo Gong
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the advent of the General Data Protection Regulation (GDPR) and increasing privacy concerns, the sharing of speech data faces significant challenges. Protecting the sensitive content of speech is as important as protecting the voiceprint. This paper proposes an effective speech content protection method by constructing a frame-by-frame adversarial speech generation system. We revisit adversarial example generation methods from recent machine learning research and select the phonetic state sequence of sensitive speech for adversarial example generation. We build an adversarial speech collection. Moreover, based on this collection, we propose a neural network-based frame-by-frame mapping method that recovers the speech content by converting the adversarial speech back to human speech. Experiments show that our proposed method can encode and recover any sensitive audio, and that it is easy to implement with publicly available speech recognition resources.

Exploiting Labeled and Unlabeled Data via Transformer Fine-tuning for Peer-Review Score Prediction
Panitan Muangkammuen | Fumiyo Fukumoto | Jiyi Li | Yoshimi Suzuki
Findings of the Association for Computational Linguistics: EMNLP 2022

Automatic Peer-review Aspect Score Prediction (PASP) of academic papers can be a helpful assistant tool for both reviewers and authors. Most existing works on PASP utilize supervised learning techniques. However, the limited amount of peer-review data degrades the performance of PASP. This paper presents a novel semi-supervised learning (SSL) method that incorporates Transformer fine-tuning into the Γ-model, a variant of the Ladder network, to leverage contextual features from unlabeled data. Backpropagation simultaneously minimizes the sum of supervised and unsupervised cost functions, avoiding the need for layer-wise pre-training. The experimental results show that our model outperforms the supervised and naive semi-supervised learning baselines. Our source code is available online.
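
The combined objective described in this abstract can be illustrated with a minimal sketch. It assumes the supervised term is a cross-entropy over labeled examples and the unsupervised term is a squared-error cost between clean and noise-corrupted top-layer outputs, as the Γ-model is usually described; the function names and the `weight` parameter are hypothetical and not taken from the paper.

```python
import math

def cross_entropy(probs, label):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[label])

def mean_squared_error(a, b):
    # Average squared difference between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_cost(labeled, unlabeled, weight=1.0):
    """Sum of a supervised and an unsupervised cost (illustrative only).

    labeled:   list of (predicted_probs, gold_label) pairs
    unlabeled: list of (clean_output, noisy_output) vector pairs
    weight:    hypothetical multiplier on the unsupervised term
    """
    supervised = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsupervised = sum(mean_squared_error(c, n) for c, n in unlabeled) / len(unlabeled)
    return supervised + weight * unsupervised
```

Because both terms are added into one scalar, a single backward pass can update all layers at once, which is the property the abstract contrasts with layer-wise pre-training.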

Multi-Domain Dialogue State Tracking with Top-K Slot Self Attention
Longfei Yang | Jiyi Li | Sheng Li | Takahiro Shinozaki
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

As an important component of task-oriented dialogue systems, dialogue state tracking is designed to track the dialogue state through the conversations between users and systems. Multi-domain dialogue state tracking is a challenging task, in which the correlation among different domains and slots needs to be considered. Recently, slot self-attention has been proposed as a data-driven way to handle it. However, a full-support slot self-attention may involve redundant information interchange. In this paper, we propose a top-k attention-based slot self-attention for multi-domain dialogue state tracking. In the slot self-attention layers, we force each slot to involve information from the other k prominent slots and mask the rest out. The experimental results on two mainstream multi-domain task-oriented dialogue datasets, MultiWOZ 2.0 and MultiWOZ 2.4, show that our proposed approach effectively improves the performance of multi-domain dialogue state tracking. We also find that the best result is obtained when each slot interchanges information with only a few slots.
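
The top-k masking idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes a precomputed matrix of slot-to-slot attention scores, keeps the k largest scores in each row, and applies a softmax over those only. Whether a slot's own score counts toward k is not specified in the abstract; here every column, including the diagonal, competes equally. The function name `top_k_attention` is hypothetical.

```python
import math

def top_k_attention(scores, k):
    """Row-wise softmax restricted to the k largest scores (k >= 1)."""
    weights = []
    for row in scores:
        # Indices of the k highest-scoring slots for this row.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        # Subtract the row maximum for numerical stability.
        peak = max(row[j] for j in keep)
        # Masked positions behave like -inf scores: their weight is 0.
        exps = [math.exp(row[j] - peak) if j in keep else 0.0
                for j in range(len(row))]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Setting k to the number of slots recovers ordinary full-support attention, so k acts as a single knob between dense and sparse information exchange.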


Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification
Zhiwei Zhang | Jiyi Li | Fumiyo Fukumoto | Yanming Ye
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Scientific claim verification can help researchers easily find target scientific papers, together with sentence-level evidence, from a large corpus for a given claim. Some existing works propose pipeline models for the three tasks of abstract retrieval, rationale selection and stance prediction. Such works suffer from error propagation among the modules in the pipeline and from a lack of sharing of valuable information among modules. We thus propose an approach, named ARSJoint, that jointly learns the modules for the three tasks with a machine reading comprehension framework by including claim information. In addition, we enhance the information exchanges and constraints among tasks by proposing a regularization term between the sentence attention scores of abstract retrieval and the estimated outputs of rationale selection. The experimental results on the benchmark dataset SciFact show that our approach outperforms the existing works.
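
A regularization term that ties sentence attention scores to rationale selection outputs could look like the sketch below. The abstract does not state the form of the term, so the choice here is purely an assumption: both score vectors are normalized into distributions over sentences and compared with a symmetric KL divergence. The names `normalize`, `kl` and `agreement_penalty` are hypothetical.

```python
import math

def normalize(scores, eps=1e-8):
    # Turn non-negative scores into a distribution; eps avoids zeros.
    total = sum(scores) + eps * len(scores)
    return [(s + eps) / total for s in scores]

def kl(p, q):
    # Kullback-Leibler divergence between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def agreement_penalty(attention, rationale_probs):
    """Symmetric KL between the two per-sentence score vectors (assumed form)."""
    p = normalize(attention)
    q = normalize(rationale_probs)
    return 0.5 * (kl(p, q) + kl(q, p))
```

The penalty is zero when both modules rank sentences identically and grows as they disagree, so adding it to the joint loss pushes the two tasks toward consistent evidence.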


Multi-task Peer-Review Score Prediction
Jiyi Li | Ayaka Sato | Kazuya Shimura | Fumiyo Fukumoto
Proceedings of the First Workshop on Scholarly Document Processing

Automatic prediction of the peer-review aspect scores of academic papers can be a useful assistant tool for both reviewers and authors. To handle the small size of published datasets on the target aspect of scores, we propose a multi-task approach that leverages additional information from other aspects of scores to improve the performance on the target. One of the problems in building multi-task models is how to select the proper resources of auxiliary tasks and the proper shared structures. We therefore propose a multi-task shared structure encoding approach which automatically selects good shared network structures as well as good auxiliary resources. The experiments based on peer-review datasets show that our approach is effective and performs better on the target scores than the single-task method and naive multi-task methods.

A Neural Local Coherence Analysis Model for Clarity Text Scoring
Panitan Muangkammuen | Sheng Xu | Fumiyo Fukumoto | Kanda Runapongsa Saikaew | Jiyi Li
Proceedings of the 28th International Conference on Computational Linguistics

Local coherence relations between two phrases or sentences, such as cause-effect and contrast, strongly influence whether a text is well structured. This paper follows this assumption and presents a method for scoring text clarity by utilizing local coherence between adjacent sentences. We hypothesize that the contextual features of coherence relations learned from data other than the target training data can also discriminate whether the target text is well structured and thus help to score text clarity. We propose a text clarity scoring method that utilizes local coherence analysis in an out-domain setting, i.e., the training data for the source and target tasks are different from each other. The method, built on the pre-trained language model BERT, first trains the local coherence model as an auxiliary task and then re-trains it together with the text clarity scoring model. The experimental results on the PeerRead benchmark dataset show an improvement over a single text clarity scoring model. Our source code is available online.

HSCNN: A Hybrid-Siamese Convolutional Neural Network for Extremely Imbalanced Multi-label Text Classification
Wenshuo Yang | Jiyi Li | Fumiyo Fukumoto | Yanming Ye
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The data imbalance problem is a crucial issue for multi-label text classification. Some existing works tackle it by proposing imbalanced loss objectives instead of the vanilla cross-entropy loss, but their performance remains limited in cases of extremely imbalanced data. We propose a hybrid solution which adapts general networks for the head categories and few-shot techniques for the tail categories. We propose a Hybrid-Siamese Convolutional Neural Network (HSCNN) with additional technical attributes, i.e., a multi-task architecture based on Single and Siamese networks; a category-specific similarity in the Siamese structure; and a specific sampling method for training HSCNN. The results using two benchmark datasets and three loss objectives show that our method can improve the performance of Single networks with diverse loss objectives on the tail or entire categories.

DeepMet: A Reading Comprehension Paradigm for Token-level Metaphor Detection
Chuandong Su | Fumiyo Fukumoto | Xiaoxi Huang | Jiyi Li | Rongbo Wang | Zhiqun Chen
Proceedings of the Second Workshop on Figurative Language Processing

Machine metaphor understanding is one of the major topics in NLP. Most recent attempts treat it as a classification or sequence tagging task. However, few studies introduce rich linguistic information into the field of computational metaphor by leveraging powerful pre-trained language models. We focus on a novel reading comprehension paradigm for the token-level metaphor detection task, which provides an innovative type of solution for this task. We propose an end-to-end deep metaphor detection model named DeepMet based on this paradigm. The proposed approach encodes the global text context (whole sentence), local text context (sentence fragments), and question (query word) information, and incorporates two types of part-of-speech (POS) features, by making use of an advanced pre-trained language model. The experimental results on several metaphor datasets show that our model achieves competitive results in the second shared task on metaphor detection.


A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation
Jiyi Li | Fumiyo Fukumoto
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP

The target outputs of many NLP tasks are word sequences. To collect the data for training and evaluating models, the crowd is cheaper and easier to access than an oracle. To ensure the quality of the crowdsourced data, people can assign multiple workers to one question and then aggregate the multiple answers of diverse quality into a golden one. How to aggregate multiple crowdsourced word sequences of diverse quality is a curious and challenging problem. People need a dataset for addressing this problem. We thus create a dataset (CrowdWSA2019) which contains translated sentences generated by multiple workers. We provide three approaches as baselines for the task of extractive word sequence aggregation. In particular, one of them is an original approach we propose, which models the reliability of workers. We also discuss some issues in ground truth creation for word sequences which can be addressed based on this dataset.

Text Categorization by Learning Predominant Sense of Words as Auxiliary Task
Kazuya Shimura | Jiyi Li | Fumiyo Fukumoto
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Distributions of the senses of words are often highly skewed and are strongly tied to the domain of a document. This paper follows this assumption and presents a method for text categorization that leverages the predominant sense of words depending on the domain, i.e., domain-specific senses. The key idea is that the features learned from predominant senses can discriminate the domain of the document and thus improve the overall performance of text categorization. We propose a multi-task learning framework based on the neural network model, transformer, which trains a model to simultaneously categorize documents and predict a predominant sense for each word. The experimental results using four benchmark datasets show that our method is comparable to the state-of-the-art categorization approach and that it works especially well for the categorization of multi-label documents.


HFT-CNN: Learning Hierarchical Category Structure for Multi-label Short Text Categorization
Kazuya Shimura | Jiyi Li | Fumiyo Fukumoto
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We focus on the multi-label categorization task for short texts and explore the use of a hierarchical structure (HS) of categories. In contrast to existing work using a non-hierarchical flat model, the method leverages the hierarchical relations between the pre-defined categories to tackle the data sparsity problem. The lower the HS level, the lower the categorization performance, because the amount of training data per category at a lower level is much smaller than at an upper level. We propose an approach which can effectively utilize the data in the upper levels to contribute to categorization in the lower levels by applying a Convolutional Neural Network (CNN) with a fine-tuning technique. The results using two benchmark datasets show that the proposed method, Hierarchical Fine-Tuning based CNN (HFT-CNN), is competitive with the state-of-the-art CNN-based methods.