Hongji Li
Also published as: 济洪 李
2025
Sentence Smith: Controllable Edits for Evaluating Text Embeddings
Hongji Li | Andrianos Michail | Reto Gubelmann | Simon Clematide | Juri Opitz
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Controllable and transparent text generation has been a long-standing goal in NLP. Almost as long-standing is a general idea for addressing this challenge: parsing text to a symbolic representation, and generating from it. However, earlier approaches were hindered by parsing and generation insufficiencies. Using modern parsers and a safety supervision mechanism, we show how close current methods come to this goal. Concretely, we propose the Sentence Smith framework for English, which has three steps: 1. Parsing a sentence into a semantic graph. 2. Applying human-designed semantic manipulation rules. 3. Generating text from the manipulated graph. A final entailment check (4.) verifies the validity of the applied transformation. To demonstrate our framework’s utility, we use it to induce hard negative text pairs that challenge text embedding models. Since the controllable generation makes it possible to clearly isolate different types of semantic shifts, we can evaluate text embedding models in a fine-grained way, also addressing an issue in current benchmarking where linguistic phenomena remain opaque. Human validation confirms that our transparent generation process produces texts of good quality. Notably, our way of generation is very resource-efficient, since it relies only on smaller neural networks.
2023
基于BiLSTM聚合模型的汉语框架语义角色识别(Chinese Frame Semantic Role Identification Based on BiLSTM Aggregation Model)
Xuefei Cao (曹学飞) | Hongji Li (李济洪) | Ruibo Wang (王瑞波) | Qian Niu (牛倩)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
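The performance of neural-network-based models for Chinese frame semantic role identification remains low. Because the performance of a neural network model depends on its hyperparameters, this paper addresses hyperparameter tuning and the improvement of predictive performance jointly within a BiLSTM-based aggregation model framework. Experiments use regularized cross-validation, in which a regularization condition constrains the distributional difference between the training and validation sets and so avoids the performance fluctuations caused by inconsistent distributions. The cross-validation results are combined by majority voting, the voted results are used to evaluate the different hyperparameter configurations, and several configurations with no significant difference among them are selected to form the optimal hyperparameter configuration set. The sub-models corresponding to this set are then aggregated to build an aggregation model for Chinese frame semantic role identification. Experimental results show that the proposed method significantly improves performance over the baseline model by 9.56%.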