Sergi Blanco-Cuaresma
2024
Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach
Sergi Blanco-Cuaresma
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)
This study explores the use of Large Language Models (LLMs) to analyze text comments from Reddit users, aiming to achieve two primary objectives: first, to pinpoint critical excerpts that support a predefined psychological assessment of suicidal risk; and second, to summarize the material to substantiate the preassigned suicidal risk level. The work is restricted to “open-source” LLMs that can be run locally, thereby enhancing data privacy. Furthermore, it prioritizes models with low computational requirements, making the approach accessible to both individuals and institutions operating on limited computing budgets. The implemented strategy relies only on a carefully crafted prompt and a grammar to guide the LLM’s text completion. Despite its simplicity, the evaluation metrics show outstanding results, making it a valuable privacy-focused and cost-effective approach. This work is part of the Computational Linguistics and Clinical Psychology (CLPsych) 2024 shared task.
INDUS: Effective and Efficient Language Models for Scientific Applications
Bishwaranjan Bhattacharjee | Aashka Trivedi | Masayasu Muraoka | Muthukumaran Ramasubramanian | Takuma Udagawa | Iksha Gurung | Nishan Pantha | Rong Zhang | Bharath Dandala | Rahul Ramachandran | Manil Maskey | Kaylin Bugbee | Michael M. Little | Elizabeth Fancher | Irina Gerasimov | Armin Mehrabian | Lauren Sanders | Sylvain V. Costes | Sergi Blanco-Cuaresma | Kelly Lockhart | Thomas Allen | Felix Grezes | Megan Ansdell | Alberto Accomazzi | Yousef El-Kurdi | Davis Wertheimer | Birgit Pfitzmann | Cesar Berrospi Ramis | Michele Dolfi | Rafael Teixeira De Lima | Panagiotis Vagenas | S. Karthik Mukkavilli | Peter W. J. Staar | Sanaz Vahidinia | Ryan McGranaghan | Tsengdar J. Lee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models (LLMs) trained on general-domain corpora have shown remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated that LLMs trained on domain-focused corpora perform better on specialized tasks. Inspired by this insight, we developed INDUS, a comprehensive suite of LLMs tailored for the closely related domains of Earth science, biology, physics, heliophysics, planetary sciences and astrophysics, trained on curated scientific corpora drawn from diverse data sources. The suite includes: (1) an encoder model trained with domain-specific vocabulary and corpora to address NLP tasks, (2) a contrastive-learning-based text embedding model trained on a diverse set of datasets to address information retrieval tasks, and (3) smaller versions of these models created via knowledge distillation for applications with latency or resource constraints. We also created three new scientific benchmark datasets, Climate-Change NER (entity recognition), NASA-QA (extractive QA) and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. We show that our models outperform both general-purpose (RoBERTa) and domain-specific (SciBERT) encoders on these new tasks as well as on existing tasks in the domains of interest. Furthermore, we demonstrate the use of these models in two industrial settings: as a retrieval model for large-scale vector search applications, and in automatic content tagging systems.
2023
Proceedings of the Second Workshop on Information Extraction from Scientific Publications
Tirthankar Ghosal | Felix Grezes | Thomas Allen | Kelly Lockhart | Alberto Accomazzi | Sergi Blanco-Cuaresma
Proceedings of the Second Workshop on Information Extraction from Scientific Publications
Function of Citation in Astrophysics Literature (FOCAL): Findings of the Shared Task
Felix Grezes | Thomas Allen | Tirthankar Ghosal | Sergi Blanco-Cuaresma
Proceedings of the Second Workshop on Information Extraction from Scientific Publications
2022
Proceedings of the First Workshop on Information Extraction from Scientific Publications
Tirthankar Ghosal | Sergi Blanco-Cuaresma | Alberto Accomazzi | Robert M. Patton | Felix Grezes | Thomas Allen
Proceedings of the First Workshop on Information Extraction from Scientific Publications
Overview of the First Shared Task on Detecting Entities in the Astrophysics Literature (DEAL)
Felix Grezes | Sergi Blanco-Cuaresma | Thomas Allen | Tirthankar Ghosal
Proceedings of the First Workshop on Information Extraction from Scientific Publications
In this article, we present an overview of our shared task: Detecting Entities in the Astrophysics Literature (DEAL). The DEAL shared task was part of the Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022. Information extraction from scientific publications is critical for several downstream tasks such as identification of critical entities, article summarization, and citation classification. The motivation of this shared task was to develop a community-wide effort for entity extraction from astrophysics literature. Automated entity extraction would help build knowledge bases, high-quality metadata for indexing and search, and support several other use cases of interest. Thirty-three teams registered for DEAL, twelve of them participated in the system runs, and four teams ultimately submitted system descriptions. We analyze their systems and performance, and discuss the findings of DEAL.
Co-authors
- Thomas Allen 5
- Felix Grezes 5
- Tirthankar Ghosal 4
- Alberto Accomazzi 3
- Kelly Lockhart 2
- Megan Ansdell 1
- Cesar Berrospi Ramis 1
- Bishwaranjan Bhattacharjee 1
- Kaylin Bugbee 1
- Sylvain V. Costes 1
- Bharath Dandala 1
- Rafael Teixeira De Lima 1
- Michele Dolfi 1
- Yousef El-Kurdi 1
- Elizabeth Fancher 1
- Irina Gerasimov 1
- Iksha Gurung 1
- Tsengdar J. Lee 1
- Michael M. Little 1
- Manil Maskey 1
- Ryan McGranaghan 1
- Armin Mehrabian 1
- S. Karthik Mukkavilli 1
- Masayasu Muraoka 1
- Nishan Pantha 1
- Robert M. Patton 1
- Birgit Pfitzmann 1
- Rahul Ramachandran 1
- Muthukumaran Ramasubramanian 1
- Lauren Sanders 1
- Peter W. J. Staar 1
- Aashka Trivedi 1
- Takuma Udagawa 1
- Panagiotis Vagenas 1
- Sanaz Vahidinia 1
- Davis Wertheimer 1
- Rong Zhang 1