2024
pdf
bib
abs
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
Yan Zhao
|
Jiangyan Yi
|
Jianhua Tao
|
Chenglong Wang
|
Yongfeng Dong
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“To enhance the effectiveness of fake audio detection techniques, researchers have developed mul-tiple datasets such as those for the ASVspoof and ADD challenges. These datasets typically focuson capturing non-emotional characteristics in speech, such as the identity of the speaker and theauthenticity of the content. However, they often overlook changes in the emotional state of theaudio, which is another crucial dimension affecting the authenticity of speech. Therefore, thisstudy reports our progress in developing such an emotion fake audio detection dataset involvingchanging emotion state of the origin audio named EmoFake. The audio samples in EmoFake aregenerated using open-source emotional voice conversion models, intended to simulate potentialemotional tampering scenarios in real-world settings. We conducted a series of benchmark ex-periments on this dataset, and the results show that even advanced fake audio detection modelstrained on the ASVspoof 2019 LA dataset and the ADD 2022 track 3.2 dataset face challengeswith EmoFake. The EmoFake is publicly available1 now.”
2018
pdf
bib
abs
BLCU_NLP at SemEval-2018 Task 12: An Ensemble Model for Argument Reasoning Based on Hierarchical Attention
Meiqian Zhao
|
Chunhua Liu
|
Lu Liu
|
Yan Zhao
|
Dong Yu
Proceedings of the 12th International Workshop on Semantic Evaluation
To comprehend an argument and fill the gap between claims and reasons, it is vital to find the implicit supporting warrants behind. In this paper, we propose a hierarchical attention model to identify the right warrant which explains why the reason stands for the claim. Our model focuses not only on the similar part between warrants and other information but also on the contradictory part between two opposing warrants. In addition, we use the ensemble method for different models. Our model achieves an accuracy of 61%, ranking second in this task. Experimental results demonstrate that our model is effective to make correct choices.
2010
pdf
bib
abs
POS Multi-tagging Based on Combined Models
Yan Zhao
|
Gertjan van Noord
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In the POS tagging task, there are two kinds of statistical models: one is generative model, such as the HMM, the others are discriminative models, such as the Maximum Entropy Model (MEM). POS multi-tagging decoding method includes the N-best paths method and forward-backward method. In this paper, we use the forward-backward decoding method based on a combined model of HMM and MEM. If P(t) is the forward-backward probability of each possible tag t, we first calculate P(t) according HMM and MEM separately. For all tags options in a certain position in a sentence, we normalize P(t) in HMM and MEM separately. Probability of the combined model is the sum of normalized forward-backward probabilities P norm(t) in HMM and MEM. For each word w, we select the best tag in which the probability of combined model is the highest. In the experiments, we use combined model and get higher accuracy than any single model on POS tagging tasks of three languages, which are Chinese, English and Dutch. The result indicates that our combined model is effective.
2007
pdf
bib
Emotional Recognition Using a Compensation Transformation in Speech Signal
Cairong Zou
|
Yan Zhao
|
Li Zhao
|
Wenming Zhen
|
Yongqiang Bao
International Journal of Computational Linguistics & Chinese Language Processing, Volume 12, Number 1, March 2007: Special Issue on Affective Speech Processing