ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches

Junyu Cheng, Chang Pan, Shuangyin Li


Abstract
Lyrics generation has garnered increasing attention within the artificial intelligence community. Our task focuses on generating harmonious Cantonese lyrics. Unlike other languages, Cantonese has a unique system of nine contours and six tones, so composing lyrics requires satisfying the harmony rules that align the melody with the tonal contours of the lyrics. Current research has not yet addressed the challenge of generating lyrics that adhere to Cantonese harmony rules. To tackle this issue, we propose ToneCraft, a novel framework for generating Cantonese lyrics that ensures tonal and melodic harmony. It enables LLMs to generate lyrics with a fixed character count while aligning with tonal and melodic structures. We present an algorithm that combines character-level control, melodic guidance, and a task-specific loss to achieve tonal harmony without compromising generation flexibility and quality. By incorporating domain-specific expertise, we train our model on lyric-only datasets, eliminating the need for aligned data. Both objective evaluations and subjective assessments show that our generated lyrics align with melodic contours significantly better than existing methods. All code and data are available at: https://github.com/purepasser-by/ToneCraft.
Anthology ID:
2025.emnlp-main.18
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
335–353
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.18/
Cite (ACL):
Junyu Cheng, Chang Pan, and Shuangyin Li. 2025. ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 335–353, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches (Cheng et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.18.pdf
Checklist:
 2025.emnlp-main.18.checklist.pdf