Ke Sun


2023

Gloss-Free End-to-End Sign Language Translation
Kezhou Lin | Xiaohan Wang | Linchao Zhu | Ke Sun | Bang Zhang | Yi Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations. Although intermediate representations such as glosses have proven effective, gloss annotations are hard to acquire, especially in large quantities. This limits the domain coverage of translation datasets and thus hinders real-world applications. To mitigate this problem, we design the Gloss-Free End-to-End sign language translation framework (GloFE). Our method improves SLT performance in the gloss-free setting by exploiting the semantics shared between signs and their corresponding spoken translations. Common concepts are extracted from the text and used as a weak form of intermediate representation. The global embedding of these concepts serves as a query for cross-attention to retrieve the corresponding information from the learned visual features. In a contrastive manner, we increase the similarity of query results between samples that contain such concepts and decrease it for samples that do not. We obtain state-of-the-art results on large-scale datasets, including OpenASL and How2Sign.

2020

DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset
Lijie Wang | Ao Zhang | Kun Wu | Ke Sun | Zhenghua Li | Hua Wu | Min Zhang | Haifeng Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Due to the lack of labeled data, previous research on text-to-SQL parsing has mainly focused on English. Representative English datasets include ATIS, WikiSQL, Spider, etc. This paper presents DuSQL, a large-scale and pragmatic Chinese dataset for the cross-domain text-to-SQL task, containing 200 databases, 813 tables, and 23,797 question/SQL pairs. Our new dataset has three major characteristics. First, by manually analyzing questions from several representative applications, we try to identify the true distribution of SQL queries in real-life needs. Second, motivated by our analysis of SQL query distributions, DuSQL contains a considerable proportion of SQL queries involving row or column calculations. Finally, we adopt an effective data construction framework based on human-computer collaboration. The basic idea is to automatically generate SQL queries from the SQL grammar, constrained by the given database. This paper describes in detail the construction process and data statistics of DuSQL. Moreover, we present and compare the performance of several open-source text-to-SQL parsers with minor modifications to accommodate Chinese, including a simple yet effective extension to IRNet for handling calculation SQL queries.

2013

A Hierarchical Semantics-Aware Distributional Similarity Scheme
Shuqi Sun | Ke Sun | Shiqi Zhao | Haifeng Wang | Muyun Yang | Sheng Li
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2008

A Study of Chinese Lexical Analysis Based on Discriminative Models
Guang-Lu Sun | Cheng-Jie Sun | Ke Sun | Xiao-Long Wang
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing