Gang Hu

2026

While Large Language Models (LLMs) excel in various general domains, they exhibit notable gaps in the highly specialized, knowledge-intensive, and legally regulated Chinese tax domain. Consequently, while tax-related benchmarks are gaining attention, many focus on isolated NLP tasks, neglecting real-world practical capabilities. To address this issue, we introduce TaxPraBen, the first dedicated benchmark for Chinese taxation practice. It combines 10 traditional application tasks with 3 pioneering real-world scenarios: tax risk prevention, tax inspection analysis, and tax strategy planning, sourced from 14 datasets totaling 7.3K instances. TaxPraBen features a scalable structured evaluation paradigm designed as a process of "structured parsing, field alignment extraction, and numerical and textual matching", enabling end-to-end tax practice assessment while being extensible to other domains. We evaluate 19 LLMs based on Bloom’s taxonomy. The results indicate significant performance disparities: all closed-source large-parameter LLMs excel, and Chinese LLMs like Qwen2.5 generally exceed multilingual LLMs, while the YaYi2 LLM, fine-tuned with some tax data, shows only limited improvement. TaxPraBen (https://anonymous.4open.science/r/TaxPraBen/) serves as a vital resource for advancing evaluations of LLMs in practical applications.

2025

Overview of CCL25-Eval Task 7: Chinese Literary Language Understanding Evaluation (ZhengMing)
Kang Wang | Qing Wang | Min Peng | Kun Yue | Gang Hu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

The 24th Chinese Computational Linguistics Conference (CCL25-Eval) features 12 technical evaluation tasks. Among them, Task 7 is the Chinese Literary Language Understanding Evaluation (ZhengMing). ZhengMing is a universal and scalable evaluation framework designed to assess natural language processing (NLP) tasks in the literary domain, such as text classification, text generation, automated question answering, relation extraction, and machine translation. The ZhengMing framework aims to evaluate the performance of large language models (LLMs) in the literary field at a fine-grained level. In this task, 89 teams signed up for the competition, with 5 teams ultimately submitting results. The highest score achieved is 0.65. This paper presents and discusses the dataset, task descriptions, competition results, and other relevant information for this evaluation task. More details are available at https://github.com/isShayulajiao/CCL25-Eval-ZhengMing.

2023

Exploring Prompt Engineering with GPT Language Models for Document-Level Machine Translation: Insights and Findings
Yangjian Wu | Gang Hu
Proceedings of the Eighth Conference on Machine Translation

This paper describes the Lan-Bridge Translation systems for the WMT 2023 General Translation shared task. We participate in 2 directions: English to and from Chinese. With the emergence of large-scale models, various industries have undergone significant transformations, particularly in the realm of document-level machine translation. This has introduced a novel research paradigm that we have embraced in our participation in the WMT23 competition. Focusing on advancements in models such as GPT-3.5 and GPT-4, we have undertaken numerous prompt-based experiments. Our objective is to achieve optimal human evaluation results for document-level machine translation, and we submitted our final outputs to the general track.

2022

Lan-Bridge MT’s Participation in the WMT 2022 General Translation Shared Task
Bing Han | Yangjian Wu | Gang Hu | Qiulin Chen
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes Lan-Bridge Translation systems for the WMT 2022 General Translation shared task. We participate in 18 language directions: English to and from Czech, German, Ukrainian, Japanese, Russian, Chinese, English to Croatian, French to German, Yakut to and from Russian and Ukrainian to and from Czech. To develop systems covering all these directions, we mainly focus on multilingual models. In general, we apply data corpus filtering, scaling model size, sparse expert model (in particular, Transformer with adapters), large scale backtranslation and language model reranking techniques. Our system ranks first in 6 directions based on automatic evaluation.
