Xingyuan Li
2025
Toward Automatic Discovery of a Canine Phonetic Alphabet
Theron S. Wang | Xingyuan Li | Hridayesh Lekhak | Tuan Minh Dang | Mengyue Wu | Kenny Q. Zhu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Dogs communicate intelligently, but little is known about the phonetic properties of their vocalizations. This paper presents, for the first time, an iterative algorithm inspired by human phonetic discovery, in which minimal pairs identify phonemes by distinguishing different words. The algorithm produces a complete alphabet of distinct canine phoneme-like units. It also produces a number of repeated canine acoustic units, composed exclusively of the phoneme-like units in the alphabet, which may correspond to specific environments and activities of a dog. The framework outlined in this paper is expected to work not only for canines but also for other animal species.
2024
Phonetic and Lexical Discovery of Canine Vocalization
Theron S. Wang | Xingyuan Li | Chunhao Zhang | Mengyue Wu | Kenny Q. Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024
This paper attempts to discover communication patterns within dog vocalizations automatically, using a data-driven approach that breaks the barrier of previous approaches, which rely on human prior knowledge on limited data. We present a self-supervised approach with HuBERT that enables the accurate classification of phones, and an adaptive grammar induction method that identifies phone sequence patterns suggesting a preliminary vocabulary within dog vocalizations. Our results show that a subset of this vocabulary has substantial causal relations with certain canine activities, suggesting signs of stable semantics associated with these “words”.
2020
CIST@CL-SciSumm 2020, LongSumm 2020: Automatic Scientific Document Summarization
Lei Li | Yang Xie | Wei Liu | Yinan Liu | Yafei Jiang | Siya Qi | Xingyuan Li
Proceedings of the First Workshop on Scholarly Document Processing
Our system participates in two shared tasks, CL-SciSumm 2020 and LongSumm 2020. In the CL-SciSumm shared task, building on our previous work, we apply more machine learning methods to position features and content features for facet classification in Task1B, and we introduce a GCN in Task2 to perform extractive summarization. In the LongSumm shared task, we integrate both extractive and abstractive summarization approaches. We tested three methods: T5 fine-tuning, DPPs sampling, and GRU-GCN/GAT.
Co-authors
- Theron S. Wang 2
- Mengyue Wu 2
- Kenny Zhu 2
- Tuan Minh Dang 1
- Yafei Jiang 1