Brendon Boldt
2026
Communicating in Emergent Language with an Induced Morphological Phrasebook
Brendon Boldt | David R. Mortensen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We build rule-based emergent language (EL) agents using form-meaning mappings induced from ELs ("morphological phrasebooks") and test their communicative performance in the EL environment with its neural network agents. This contributes three things: First, it assesses the quality of the morphemes discovered by the induction algorithm in situ, which we find to be effective for communicating in the EL. Second, it allows us to uncover morphosyntactic properties of EL through ablating the algorithms which induce and utilize morphemes, showing that the ELs rely on repetition as well as morpheme ordering to convey meaning. Third, we find that the normalized pointwise mutual information of forms and meanings in the morphemes serves as a metric of compositionality that is more closely correlated with the ability of the phrasebook-agents to "speak" and "hear" an EL than existing metrics such as topographic similarity.
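The normalized pointwise mutual information mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook definition, NPMI = PMI / -log p(form, meaning), computed from co-occurrence counts. The function name `npmi`, its count-based arguments, and the handling of the two edge cases are assumptions made for illustration; how the paper aggregates this quantity over a whole phrasebook is not stated here and is not reproduced.

```python
import math


def npmi(count_pair, count_form, count_meaning, total):
    """Normalized PMI in [-1, 1] from co-occurrence counts over `total` records.

    Hypothetical helper: the arguments are the number of records containing
    both the form and the meaning, the form alone, and the meaning alone.
    """
    if count_pair == 0:
        return -1.0  # limiting value when the pair never co-occurs
    p_pair = count_pair / total
    if p_pair == 1.0:
        return 0.0  # mathematically undefined (0/0); returning 0 is this sketch's choice
    # PMI = log( p(f, m) / (p(f) * p(m)) ), written directly in counts.
    pmi = math.log(count_pair * total / (count_form * count_meaning))
    return pmi / -math.log(p_pair)


# A form and meaning that always appear together score close to 1;
# a pair that co-occurs exactly as often as chance predicts scores 0.
always_together = npmi(2, 2, 2, 4)
independent = npmi(1, 2, 2, 4)
```

Values near 1 indicate that a form and a meaning almost always occur together, values near 0 indicate statistical independence, and values near -1 indicate that they almost never co-occur.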
PRiSM: Benchmarking Phone Realization in Speech Models
Shikhar Bharadwaj | Chin-Jou Li | Yoonjae Kim | Kwanghee Choi | Eunjung Yeo | Ryan Soh-Eun Shim | Hanyu Zhou | Brendon Boldt | Karen Rosero | Kalvin Chang | Darsh Agrawal | Keer Xu | Chao-Han Huck Yang | Jian Zhu | Shinji Watanabe | David R. Mortensen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Phone recognition (PR) serves as the atomic interface for language-agnostic modeling for cross-lingual speech processing and phonetic analysis. Despite prolonged efforts in developing PR systems, current evaluations only measure surface-level transcription accuracy. We introduce PRiSM, the first open-source benchmark designed to expose blind spots in phonetic perception through intrinsic and extrinsic evaluation of PR systems. PRiSM standardizes transcription-based evaluation and assesses downstream utility in clinical, educational, and multilingual settings with transcription and representation probes. We find that diverse language exposure during training is key to PR performance, encoder-CTC models are the most stable, and specialized PR systems still outperform LALMs. PRiSM releases code, recipes, and datasets to move the field toward multilingual speech models with robust phonetic ability.
2025
Morpheme Induction for Emergent Language
Brendon Boldt | David R. Mortensen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We introduce CSAR, an algorithm for inducing morphemes from emergent language corpora of parallel utterances and meanings. It is a greedy algorithm that (1) weights morphemes based on mutual information between forms and meanings, (2) selects the highest-weighted pair, (3) removes it from the corpus, and (4) repeats the process to induce further morphemes (i.e., Count, Select, Ablate, Repeat). The effectiveness of CSAR is first validated on procedurally generated datasets and compared against baselines for related tasks. Second, we validate CSAR’s performance on human language data to show that the algorithm makes reasonable predictions in adjacent domains. Finally, we analyze a handful of emergent languages, quantifying linguistic characteristics like degree of synonymy and polysemy.
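The Count, Select, Ablate, Repeat loop described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation. It assumes that candidate forms are single tokens and candidate meanings are single atoms, that a pair's weight is its contribution to mutual information, p(f, m) * log(p(f, m) / (p(f) p(m))), and that ties are broken alphabetically. The function name `induce_morphemes`, the `max_morphemes` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def induce_morphemes(corpus, max_morphemes=10):
    """Greedy sketch: corpus is a list of (tokens, meanings) pairs.

    Returns a list of (form, meaning, weight) triples in selection order.
    """
    # Work on mutable copies so the caller's data is left untouched.
    records = [(list(tokens), set(meanings)) for tokens, meanings in corpus]
    total = len(records)
    morphemes = []
    for _ in range(max_morphemes):
        # Count: how many records contain each form, meaning, and pair.
        form_counts, meaning_counts, pair_counts = Counter(), Counter(), Counter()
        for tokens, meanings in records:
            forms = set(tokens)
            form_counts.update(forms)
            meaning_counts.update(meanings)
            pair_counts.update((f, m) for f in forms for m in meanings)
        if not pair_counts:
            break

        def weight(pair):
            # Assumed weighting: the pair's term in the mutual information sum.
            form, meaning = pair
            p_pair = pair_counts[pair] / total
            ratio = pair_counts[pair] * total / (form_counts[form] * meaning_counts[meaning])
            return p_pair * math.log(ratio)

        # Select: highest-weighted pair; sorting makes tie-breaking deterministic.
        best = max(sorted(pair_counts), key=weight)
        best_weight = weight(best)
        if best_weight <= 0:
            break
        morphemes.append((best[0], best[1], best_weight))

        # Ablate: remove the form and meaning wherever they co-occur.
        form, meaning = best
        for tokens, meanings in records:
            if form in tokens and meaning in meanings:
                tokens[:] = [t for t in tokens if t != form]
                meanings.discard(meaning)
        # Repeat: the next iteration recounts on the ablated corpus.
    return morphemes


# Toy corpus (invented): each utterance is paired with a set of meaning atoms.
corpus = [
    (["ka", "zu"], {"red", "circle"}),
    (["ka", "mi"], {"red", "square"}),
    (["po", "zu"], {"blue", "circle"}),
    (["po", "mi"], {"blue", "square"}),
]
morphemes = induce_morphemes(corpus)
```

On this fully compositional toy corpus, the loop recovers the four token-to-atom pairings one at a time. Ablating each selected pair stops an already explained form from competing for the remaining meanings.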
Searching for the Most Human-like Emergent Language
Brendon Boldt | David R. Mortensen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In this paper, we design a signalling game-based emergent communication environment to generate state-of-the-art emergent languages in terms of similarity to human language. This is done with hyperparameter optimization, using XferBench as the objective function. XferBench quantifies the statistical similarity of emergent language to human language by measuring its suitability for deep transfer learning to human language. Additionally, we demonstrate the predictive power of entropy on the transfer learning performance of emergent language as well as corroborate previous results on the entropy-minimization properties of emergent communication systems. Finally, we report generalizations regarding what hyperparameters produce more realistic emergent languages, that is, ones which transfer better to human language.
2024
XferBench: a Data-Driven Benchmark for Emergent Language
Brendon Boldt | David Mortensen
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
In this paper, we introduce a benchmark for evaluating the overall quality of emergent languages using data-driven methods. Specifically, we interpret the notion of the “quality” of an emergent language as its similarity to human language within a deep learning framework. We measure this by using the emergent language as pretraining data for downstream NLP tasks in human language: the better the downstream performance, the better the emergent language. We implement this benchmark as an easy-to-use Python package that only requires a text file of utterances from the emergent language to be evaluated. Finally, we empirically test the benchmark’s validity using human, synthetic, and emergent language baselines.
2021
Case Study: Deontological Ethics in NLP
Shrimai Prabhumoye | Brendon Boldt | Ruslan Salakhutdinov | Alan W Black
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices. However, there has been little discussion about the ethical foundations that underlie these efforts. In this work, we study one ethical theory, namely deontological ethics, from the perspective of NLP. In particular, we focus on the generalization principle and the respect for autonomy through informed consent. We provide four case studies to demonstrate how these principles can be used with NLP systems. We also recommend directions to avoid the ethical issues in these systems.