Bowen Chen


2025

pdf bib
A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models
Bowen Chen | Namgi Han | Yusuke Miyao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The lack of data transparency in Large Language Models (LLMs) has highlighted the importance of Membership Inference Attack (MIA), which differentiates trained (member) and untrained (non-member) data. Though it shows success in previous studies, recent research reported a near-random performance in different settings, highlighting a significant performance inconsistency. We assume that a single setting doesn’t represent the distribution of the vast corpora, causing members and non-members with different distributions to be sampled and causing inconsistency. In this study, instead of a single setting, we statistically revisit MIA methods from various settings with thousands of experiments for each MIA method, along with study in text feature, embedding, threshold decision, and decoding dynamics of members and non-members. We found that (1) MIA performance improves with model size and varies with domains, while most methods do not statistically outperform baselines, (2) Though MIA performance is generally low, a notable amount of differentiable member and non-member outliers exists and vary across MIA methods, (3) Deciding a threshold to separate members and non-members is an overlooked challenge, (4) Text dissimilarity and long text benefit MIA performance, (5) Differentiable or not is reflected in the LLM embedding, (6) Member and non-members show different decoding dynamics.

pdf bib
OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent
Bowen Chen | Zhao Wang | Shingo Takamatsu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Keyword decision in Sponsored Search Advertising is critical to the success of ad campaigns. While LLM-based methods offer automated keyword generation, they face three major limitations: reliance on large-scale query–keyword pair data, lack of online multi-objective performance monitoring and optimization, and weak quality control in keyword selection. These issues hinder the agentic use of LLMs in fully automating keyword decisions by monitoring and reasoning over key performance indicators such as impressions, clicks, conversions, and CTA effectiveness. To overcome these challenges, we propose OMS, a keyword generation framework that is On-the-fly (requires no training data, monitors online performance, and adapts accordingly), Multi-objective (employs agentic reasoning to optimize keywords based on multiple performance metrics) and Self-reflective (agentically evaluates keyword quality). Experiments on benchmarks and real-world ad campaigns show that OMS outperforms existing methods; Ablation and human evaluations confirm the effectiveness of each component and the quality of generated keywords.

pdf bib
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
Zhao Wang | Bowen Chen | Yotaro Shimose | Sota Moriyama | Heng Wang | Shingo Takamatsu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Recent generative models such as GPT‐4o have shown strong capabilities in producing high-quality images with accurate text rendering. However, commercial design tasks like advertising banners demand more than visual fidelity—they require structured layouts, precise typography, consistent branding and etc. In this paper, we introduce **MIMO (Mirror In‐the‐Model)**, an agentic refinement framework for automatic ad banner generation. MIMO combines a hierarchical multimodal agent system (MIMO‐Core) with a coordination loop (MIMO‐Loop) that explores multiple stylistic directions and iteratively improves design quality. Requiring only a simple natural language based prompt and logo image as input, MIMO automatically detects and corrects multiple types of errors during generation. Experiments show that MIMO significantly outperforms existing diffusion and LLM-based baselines in real-world banner design scenarios.

2024

pdf bib
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen | Namgi Han | Yusuke Miyao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) can generate the same sequences contained in the pre-train corpora, known as memorization.Previous research studied it at a macro level, leaving micro yet important questions under-explored, e.g., what makes sentences memorized, the dynamics when generating memorized sequence, its connection to unmemorized sequence, and its predictability.We answer the above questions by analyzing the relationship of memorization with outputs from LLM, namely, embeddings, probability distributions, and generated tokens.A memorization score is calculated as the overlap between generated tokens and actual continuations when the LLM is prompted with a context sequence from the pre-train corpora.Our findings reveal:(1) The inter-correlation between memorized/unmemorized sentences, model size, continuation size, and context size, as well as the transition dynamics between sentences of different memorization scores,(2) A sudden drop and increase in the frequency of input tokens when generating memorized/unmemorized sequences (boundary effect),(3) Cluster of sentences with different memorization scores in the embedding space,(4) An inverse boundary effect in the entropy of probability distributions for generated memorized/unmemorized sequences,(5) The predictability of memorization is related to model size and continuation length. In addition, we show a Transformer model trained by the hidden states of LLM can predict unmemorized tokens.

2022

pdf bib
CogBERT: Cognition-Guided Pre-trained Language Models
Xiao Ding | Bowen Chen | Li Du | Bing Qin | Ting Liu
Proceedings of the 29th International Conference on Computational Linguistics

We study the problem of integrating cognitive language processing signals (e.g., eye-tracking or EEG data) into pre-trained language models like BERT. Existing methods typically fine-tune pre-trained models on cognitive data, ignoring the semantic gap between the texts and cognitive signals. To fill the gap, we propose CogBERT, a framework that can induce fine-grained cognitive features from cognitive data and incorporate cognitive features into BERT by adaptively adjusting the weight of cognitive features for different NLP tasks. Extensive experiments show that: (1) Cognition-guided pre-trained models can consistently perform better than basic pre-trained models on ten NLP tasks. (2) Different cognitive features contribute differently to different NLP tasks. Based on this observation, we give a fine-grained explanation of why cognitive data is helpful for NLP. (3) Different transformer layers of pre-trained models should encode different cognitive features, with word-level cognitive features at the bottom and semantic-level cognitive features at the top. (4) Attention visualization demonstrates that CogBERT aligns with human gaze patterns and improves its natural language comprehension ability.

pdf bib
Syntactic and Semantic Uniformity for Semantic Parsing and Task-Oriented Dialogue Systems
Bowen Chen | Yusuke Miyao
Findings of the Association for Computational Linguistics: EMNLP 2022

This paper proposes a data representation framework for semantic parsing and task-oriented dialogue systems, aiming to achieve a uniform representation for syntactically and semantically diverse machine-readable formats.Current NLP systems heavily rely on adapting pre-trained language models to specific tasks, and this approach has been proven effective for modeling natural language texts.However, little attention has been paid to the representation of machine-readable formats, such as database queries and dialogue states.We present a method for converting original machine-readable formats of semantic parsing and task-oriented dialogue datasets into a syntactically and semantically uniform representation.We define a meta grammar for syntactically uniform representations and translate semantically equivalent functions into a uniform vocabulary.Empirical experiments on 13 datasets show that accuracy consistently improves over original formats, revealing the advantage of the proposed representation.Additionally, we show that the proposed representation allows for transfer learning across datasets.