Qianying Liu


2022

pdf
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection
Sheng Li | Jiyi Li | Qianying Liu | Zhuo Gong
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the advent of the General Data Protection Regulation (GDPR) and increasing privacy concerns, the sharing of speech data is faced with significant challenges. Protecting the sensitive content of speech is the same important as the voiceprint. This paper proposes an effective speech content protection method by constructing a frame-by-frame adversarial speech generation system. We revisited the adversarial examples generating method in the recent machine learning field and selected the phonetic state sequence of sensitive speech for the adversarial examples generation. We build an adversarial speech collection. Moreover, based on the speech collection, we proposed a neural network-based frame-by-frame mapping method to recover the speech content by converting from the adversarial speech to the human speech. Experiment shows our proposed method can encode and recover any sensitive audio, and our method is easy to be conducted with publicly available resources of speech recognition technology.

pdf
Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems
Yibin Shen | Qianying Liu | Zhuoyuan Mao | Zhen Wan | Fei Cheng | Sadao Kurohashi
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

To solve Math Word Problems, human students leverage diverse reasoning logic that reaches different possible equation solutions. However, the mainstream sequence-to-sequence approach of automatic solvers aims to decode a fixed solution equation supervised by human annotation. In this paper, we propose a controlled equation generation solver by leveraging a set of control codes to guide the model to consider certain reasoning logic and decode the corresponding equations expressions transformed from the human reference. The empirical results suggest that our method universally improves the performance on single-unknown (Math23K) and multiple-unknown (DRAW1K, HMWP) benchmarks, with substantial improvements up to 13.2% accuracy on the challenging multiple-unknown datasets.

2020

pdf
A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find their interested information efficiently by putting articles into different categories.

pdf
Refining Data for Text Generation
Wenyu Guan | Qianying Liu | Tianyi Li | Sujian Li
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Recent work on data-to-text generation has made progress under the neural encoder-decoder architectures. However, the data input size is often enormous, while not all data records are important for text generation and inappropriate input may bring noise into the final output. To solve this problem, we propose a two-step approach which first selects and orders the important data records and then generates text from the noise-reduced data. Here we propose a learning to rank model to rank the importance of each record which is supervised by a relation extractor. With the noise-reduced data as input, we implement a text generator which sequentially models the input data records and emits a summary. Experiments on the ROTOWIRE dataset verifies the effectiveness of our proposed method in both performance and efficiency.

pdf
Minimize Exposure Bias of Seq2Seq Models in Joint Entity and Relation Extraction
Ranran Haoran Zhang | Qianying Liu | Aysa Xuemo Fan | Heng Ji | Daojian Zeng | Fei Cheng | Daisuke Kawahara | Sadao Kurohashi
Findings of the Association for Computational Linguistics: EMNLP 2020

Joint entity and relation extraction aims to extract relation triplets from plain text directly. Prior work leverages Sequence-to-Sequence (Seq2Seq) models for triplet sequence generation. However, Seq2Seq enforces an unnecessary order on the unordered triplets and involves a large decoding length associated with error accumulation. These methods introduce exposure bias, which may cause the models overfit to the frequent label combination, thus limiting the generalization ability. We propose a novel Sequence-to-Unordered-Multi-Tree (Seq2UMTree) model to minimize the effects of exposure bias by limiting the decoding length to three within a triplet and removing the order among triplets. We evaluate our model on two datasets, DuIE and NYT, and systematically study how exposure bias alters the performance of Seq2Seq models. Experiments show that the state-of-the-art Seq2Seq model overfits to both datasets while Seq2UMTree shows significantly better generalization. Our code is available at https://github.com/WindChimeRan/OpenJERE.

2019

pdf
Tree-structured Decoding for Solving Math Word Problems
Qianying Liu | Wenyv Guan | Sujian Li | Daisuke Kawahara
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Automatically solving math word problems is an interesting research topic that needs to bridge natural language descriptions and formal math equations. Previous studies introduced end-to-end neural network methods, but these approaches did not efficiently consider an important characteristic of the equation, i.e., an abstract syntax tree. To address this problem, we propose a tree-structured decoding method that generates the abstract syntax tree of the equation in a top-down manner. In addition, our approach can automatically stop during decoding without a redundant stop token. The experimental results show that our method achieves single model state-of-the-art performance on Math23K, which is the largest dataset on this task.

pdf
An Improved Coarse-to-Fine Method for Solving Generation Tasks
Wenyv Guan | Qianying Liu | Guangzhi Han | Bin Wang | Sujian Li
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association

The coarse-to-fine (coarse2fine) methods have recently been widely used in the generation tasks. The methods first generate a rough sketch in the coarse stage and then use the sketch to get the final result in the fine stage. However, they usually lack the correction ability when getting a wrong sketch. To solve this problem, in this paper, we propose an improved coarse2fine model with a control mechanism, with which our method can control the influence of the sketch on the final results in the fine stage. Even if the sketch is wrong, our model still has the opportunity to get a correct result. We have experimented our model on the tasks of semantic parsing and math word problem solving. The results have shown the effectiveness of our proposed model.