WIGVO: Real-Time Bidirectional Speech Translation over Legacy PSTN Calls via Dual-Session Echo Gating
Hyeong-seob Kim, Sang-Woo Son, Hyun-woo Cho, Hyeonsang Kim, Jinmo Kim
Abstract
Real-time speech translation with large language models (LLMs) has become feasible in controlled wideband settings (mobile apps, web browsers, and end-to-end full-duplex systems pushing latency below 200 ms) where developers can assume client-side echo cancellation. However, deploying such systems over the Public Switched Telephone Network (PSTN) remains challenging due to narrowband G.711 audio, unpredictable round-trip delays, and the absence of client-side signal processing. We present **WIGVO** (WIGTN Voice-Only), a server-side relay system that enables bidirectional LLM-based speech translation over ordinary telephone calls without requiring app installation or carrier integration. A central contribution is addressing what we term *echo-induced self-reinforcing translation loops*: synthesized speech that echoes back through the PSTN is re-ingested and repeatedly translated. WIGVO solves this through a dual-session architecture with deterministic silence injection and energy-based voice activity detection (VAD) gating. We evaluate WIGVO on 155 Korean–English PSTN calls (148 instrumented, 147 completed) across three communication modes: voice-to-voice (V2V), text-to-voice (T2V), and full-agent. We observe a median caller-to-callee latency of 555 ms and a median callee-to-caller latency of 2,684 ms, zero echo-induced translation loops, COMET semantic adequacy of 0.71 (en→ko) and 0.62 (ko→en) against offline LLM references, and a cost of USD 0.28 per minute. The system is deployed at https://wigvo.wigtn.com, with a video walkthrough at https://youtu.be/4Uf6zMPOInY. Evaluation scripts and anonymized call logs are available in the open-source repository.
- Anthology ID:
- 2026.acl-demo.33
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Greg Durrett, Ping Jian
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 336–344
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-demo.33/
- Cite (ACL):
- Hyeong-seob Kim, Sang-Woo Son, Hyun-woo Cho, Hyeonsang Kim, and Jinmo Kim. 2026. WIGVO: Real-Time Bidirectional Speech Translation over Legacy PSTN Calls via Dual-Session Echo Gating. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 336–344, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- WIGVO: Real-Time Bidirectional Speech Translation over Legacy PSTN Calls via Dual-Session Echo Gating (Kim et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-demo.33.pdf