Chris Parnin


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
TeCoFeS: Text Column Featurization using Semantic Analysis
Ananya Singha | Mukul Singh | Ashish Tiwari | Sumit Gulwani | Vu Le | Chris Parnin
Findings of the Association for Computational Linguistics: NAACL 2025

Extracting insights from text columns can bechallenging and time-intensive. Existing methods for topic modeling and feature extractionare based on syntactic features and often overlook the semantics. We introduce the semantictext column featurization problem, and presenta scalable approach for automatically solvingit. We extract a small sample smartly, use alarge language model (LLM) to label only thesample, and then lift the labeling to the wholecolumn using text embeddings. We evaluateour approach by turning existing text classification benchmarks into semantic categorization benchmarks. Our approach performs better than baselines and naive use of LLMs.