Ziyuan Wang
2026
Towards Trustworthy Smart Contract Synthesis: A Multi-Agent Framework with Lean-Based Verification
Bowei Zhang | Hanbing Liu | Qixin Tian | Siyu Chen | Ziyuan Wang | Qi Qi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Smart contracts are the foundation of Decentralized Finance (DeFi), executing financial logic without trusted intermediaries. Recent advances in large language models (LLMs) have substantially lowered the barrier to smart contract development by enabling code generation from natural language. However, because smart contracts are immutable and directly manage financial assets, this accessibility introduces a critical trust gap: generated contracts are easy to produce but hard to trust. To bridge this gap, we present LeVer, the first trustworthy smart contract synthesis framework that integrates LLM-based generation with Lean-based auto-formalization and Verification. LeVer employs a closed-loop multi-agent architecture to iteratively generate, verify, attack, and repair contracts, providing both formal guarantees and empirical robustness. To facilitate the adoption of automated formal verification in smart contract generation and auditing, we open-source our framework and datasets at: https://github.com/gl-bowei/LeVer
2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang | Zheshu Song | Jianheng Zhuo | Mingyu Cui | Jinpeng Li | Bo Yang | Yexing Du | Ziyang Ma | Xunying Liu | Ziyuan Wang | Ke Li | Shuai Fan | Kai Yu | Wei-Qiang Zhang | Guoguo Chen | Xie Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired speech and text data. GigaSpeech 2 comprises about 30,000 hours of automatically transcribed speech, including Thai, Indonesian, and Vietnamese, gathered from unlabeled YouTube videos. We also introduce an automated pipeline for data crawling, transcription, and label refinement. Specifically, this pipeline involves Whisper for initial transcription, MMS for forced alignment, and multi-dimensional filtering for data quality assurance. A modified Noisy Student Training is developed to further refine flawed pseudo labels iteratively, thereby enhancing model performance. Experimental results on our manually transcribed evaluation set and two public test sets from Common Voice and FLEURS confirm our corpus’s high quality and broad applicability. Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to Whisper large-v3, with merely 10% of the model parameters. Furthermore, our ASR models trained on GigaSpeech 2 yield superior performance compared to commercial services. We hope that our newly introduced corpus and pipeline will open a new avenue for low-resource speech recognition and significantly facilitate research in this area.
2012
The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task
Radu Soricut | Nguyen Bach | Ziyuan Wang
Proceedings of the Seventh Workshop on Statistical Machine Translation
2011
Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
Zhifei Li | Ziyuan Wang | Jason Eisner | Sanjeev Khudanpur | Brian Roark
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
2010
Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Ann Irvine | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Ziyuan Wang | Jonathan Weese | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Co-authors
- Sanjeev Khudanpur 3
- Zhifei Li 3
- Jason Eisner 2
- Nguyen Bach 1
- Chris Callison-Burch 1
- Guoguo Chen 1
- Xie Chen 1
- Siyu Chen 1
- Mingyu Cui 1
- Yexing Du 1
- Chris Dyer 1
- Shuai Fan 1
- Juri Ganitkevitch 1
- Ann Irvine 1
- Jinpeng Li 1
- Ke Li 1
- Xunying Liu 1
- Hanbing Liu 1
- Ziyang Ma 1
- Qi Qi 1
- Brian Roark 1
- Lane Schwartz 1
- Zheshu Song 1
- Radu Soricut 1
- Wren Thornton 1
- Qixin Tian 1
- Jonathan Weese 1
- Yifan Yang 1
- Bo Yang 1
- Kai Yu 1
- Omar Zaidan 1
- Wei-Qiang Zhang 1
- Bowei Zhang 1
- Jianheng Zhuo 1