Judita Preiss


2024

Incorporating Word Count Information into Depression Risk Summary Generation: INF@UoS CLPsych 2024 Submission
Judita Preiss | Zenan Chen
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

Large language model classifiers do not directly offer transparency: it is not clear why one class is chosen over another. In this work, Mistral-7B generates summaries explaining the suicide risk level assigned by a fine-tuned mental-roberta-base model, working from key phrases extracted using SHAP explainability. The training data for the classifier consists of all Reddit posts of a user in the University of Maryland Reddit Suicidality Dataset, Version 2, with their suicide risk labels, along with selected features extracted from each post by the Linguistic Inquiry and Word Count (LIWC-22) tool. The resulting model is used to predict risk for each post of the users in the evaluation set of the CLPsych 2024 shared task, with a SHAP explainer used to identify the phrases contributing to the top-scoring, correct and severe risk categories. Some basic stoplisting is applied to the extracted phrases, along with length-based filtering, and a locally run version of Mistral-7B-Instruct-v0.1 is used to create summaries from the highest-value (based on SHAP) phrases.
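The phrase selection step described in this abstract (stoplisting, length-based filtering, then keeping the highest-valued phrases) could look roughly like the sketch below. It is only an illustration: the stoplist entries, the length thresholds, the `top_k` cut-off, the prompt wording and the function names `select_phrases` and `build_prompt` are all assumptions, since the abstract does not specify them, and the SHAP values are taken as already computed.

```python
from typing import Dict, List, Set

# Hypothetical stoplist: the abstract does not list the entries actually used.
STOPLIST: Set[str] = {"the", "and", "of", "to", "a"}


def select_phrases(phrase_scores: Dict[str, float],
                   stoplist: Set[str] = STOPLIST,
                   min_words: int = 2,
                   max_words: int = 8,
                   top_k: int = 3) -> List[str]:
    """Filter phrases and return the top_k by (precomputed) SHAP value."""
    kept = []
    for phrase, score in phrase_scores.items():
        words = phrase.lower().split()
        # Length-based filtering (thresholds are assumed values).
        if not (min_words <= len(words) <= max_words):
            continue
        # Basic stoplisting: drop phrases made up only of stoplisted words.
        if all(word in stoplist for word in words):
            continue
        kept.append((score, phrase))
    # Highest SHAP value first.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in kept[:top_k]]


def build_prompt(phrases: List[str]) -> str:
    """Assemble an illustrative summarisation prompt (wording is assumed)."""
    bullet_list = "\n".join(f"- {phrase}" for phrase in phrases)
    return ("Summarise the evidence for the assigned risk level "
            "using these phrases:\n" + bullet_list)
```

The resulting prompt string would then be passed to whatever locally run instruction-tuned model is available; that call is left out here because it depends on the local setup.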

2023

Automatic Named Entity Obfuscation in Speech
Judita Preiss
Findings of the Association for Computational Linguistics: ACL 2023

Sharing data containing personal information often requires its anonymization, even when consent for sharing was obtained from the data originator. While approaches exist for automated anonymization of text, the area is not as thoroughly explored in speech. This work focuses on identifying named entities, finding replacements, and inserting the replacements, synthesized using voice cloning, into the original audio, thereby retaining prosodic information while reducing the likelihood of deanonymization. The approach employs a novel named entity recognition (NER) system built directly on speech by training HuBERT (Hsu et al., 2021) using the English speech NER dataset (Yadav et al., 2020). Name substitutes are found using a masked language model and are synthesized using text-to-speech voice cloning (Eren and team, 2021), upon which the substitute named entities are re-inserted into the original text. The approach is prototyped on a sample of the LibriSpeech corpus (Panayotov et al., 2015) with each step evaluated individually.

2021

Predicting Informativeness of Semantic Triples
Judita Preiss
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Many automatic semantic relation extraction tools extract subject-predicate-object triples from unstructured text. However, a large quantity of these triples merely represent background knowledge. We explore using full texts of biomedical publications to create a training corpus of informative and important semantic triples based on the notion that the main contributions of an article are summarized in its abstract. This corpus is used to train a deep learning classifier to identify important triples, and we suggest that an importance ranking for semantic triples could also be generated.
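One way to read the corpus construction idea in this abstract is that a triple extracted from an article's full text is labelled as important when it also appears among the triples extracted from that article's abstract. The sketch below illustrates that reading only; the exact matching criterion is not given in the abstract, so the case-insensitive exact match and the names `normalise` and `label_triples` are assumptions.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def normalise(triple: Triple) -> Triple:
    """Lower-case and strip each element so trivial variants match."""
    subject, predicate, obj = triple
    return (subject.strip().lower(), predicate.strip().lower(), obj.strip().lower())


def label_triples(full_text_triples: Iterable[Triple],
                  abstract_triples: Iterable[Triple]) -> List[Tuple[Triple, int]]:
    """Label each full-text triple 1 if it also occurs in the abstract, else 0."""
    in_abstract = {normalise(triple) for triple in abstract_triples}
    return [(triple, int(normalise(triple) in in_abstract))
            for triple in full_text_triples]
```

The labelled pairs could then serve as training examples for a binary classifier, with the classifier's confidence offering one possible basis for the importance ranking the abstract mentions.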

2018

HiDE: a Tool for Unrestricted Literature Based Discovery
Judita Preiss | Mark Stevenson
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

As the quantity of publications increases daily, researchers are forced to narrow their attention to their own specialism and are therefore less likely to make new connections with other areas. Literature based discovery (LBD) supports the identification of such connections. A number of LBD tools are available; however, they often suffer from limitations such as constraining possible searches or not producing results in real time. We introduce HiDE (Hidden Discovery Explorer), an online knowledge browsing tool which allows fast access to hidden knowledge generated from all abstracts in Medline. HiDE is fast enough to allow users to explore the full range of hidden connections generated by an LBD system. The tool employs a novel combination of two approaches to LBD: a graph-based approach which allows hidden knowledge to be generated on a large scale and an inference algorithm to identify the most promising (most likely to be non-trivial) information. Available at https://skye.shef.ac.uk/kdisc
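As background for the graph-based approach mentioned in this abstract, the sketch below shows the classic "ABC" formulation of LBD: if A is linked to B and B is linked to C in the literature, but A and C are never directly linked, then A and C form a candidate hidden connection. This is a generic illustration, not HiDE's actual algorithm; the function name `hidden_connections` and the undirected edge representation are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def hidden_connections(edges: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each indirectly linked pair (a, c) to the intermediate terms b joining them."""
    # Build an undirected graph of directly co-occurring terms.
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    hidden: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for a in list(graph):
        for b in graph[a]:
            for c in graph[b]:
                # Keep only pairs that are distinct and not already directly linked.
                if c != a and c not in graph[a]:
                    hidden[(a, c)].add(b)
    return dict(hidden)
```

For example, given the edges ("fish oil", "blood viscosity") and ("blood viscosity", "Raynaud's disease"), the pair ("fish oil", "Raynaud's disease") is returned with "blood viscosity" as the linking term. A separate ranking or inference step would still be needed to pick out the most promising candidates.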

2014

Seeking Informativeness in Literature Based Discovery
Judita Preiss
Proceedings of BioNLP 2014

2013

Unsupervised Domain Tuning to Improve Word Sense Disambiguation
Judita Preiss | Mark Stevenson
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples
Judita Preiss | Mark Stevenson
Proceedings of the 2013 NAACL HLT Demonstration Session

Distinguishing Common and Proper Nouns
Judita Preiss | Mark Stevenson
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

Identifying Comparable Corpora Using LDA
Judita Preiss
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

University_Of_Sheffield: Two Approaches to Semantic Text Similarity
Sam Biggins | Shaabi Mohammed | Sam Oakley | Luke Stringer | Mark Stevenson | Judita Preiss
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Scaling up WSD with Automatically Generated Examples
Weiwei Cheng | Judita Preiss | Mark Stevenson
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2009

Refining the most frequent sense baseline
Judita Preiss | Jon Dehdari | Josh King | Dennis Mehay
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

HMMs, GRs, and N-Grams as Lexical Substitution Techniques – Are They Portable to Other Languages?
Judita Preiss | Andrew Coonce | Brittany Baker
Proceedings of the Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography, and Language Learning

2007

A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora
Judita Preiss | Ted Briscoe | Anna Korhonen
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2004

Can Anaphoric Definite Descriptions be Replaced by Pronouns?
Judita Preiss | Caroline Gasperin | Ted Briscoe
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

WSD for subcategorization acquisition task description
Judita Preiss | Anna Korhonen
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Probabilistic WSD in Senseval-3
Judita Preiss
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

Using Grammatical Relations to Compare Parsers
Judita Preiss
10th Conference of the European Chapter of the Association for Computational Linguistics

Improving Subcategorization Acquisition Using Word Sense Disambiguation
Anna Korhonen | Judita Preiss
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Intermediate Parsing for Anaphora Resolution? Implementing the Lappin and Leass non-coreference filters
Judita Preiss | Ted Briscoe
Proceedings of the 2003 EACL Workshop on The Computational Treatment of Anaphora

2002

Subcategorization Acquisition as an Evaluation Method for WSD
Judita Preiss | Anna Korhonen | Ted Briscoe
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Improving Subcategorization Acquisition with WSD
Judita Preiss | Anna Korhonen
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

2001

Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems
Judita Preiss | David Yarowsky
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

Disambiguating Noun and Verb Senses Using Automatically Acquired Selectional Preferences
Diana McCarthy | John Carroll | Judita Preiss
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

Anaphora Resolution with Word Sense Disambiguation
Judita Preiss
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems