2019
An Unsupervised Query Rewriting Approach Using N-gram Co-occurrence Statistics to Find Similar Phrases in Large Text Corpora
Hans Moen | Laura-Maria Peltonen | Henry Suhonen | Hanna-Maria Matinolli | Riitta Mieronkoski | Kirsi Telen | Kirsi Terho | Tapio Salakoski | Sanna Salanterä
Proceedings of the 22nd Nordic Conference on Computational Linguistics
We present our work towards developing a system for finding, in a large text corpus, contiguous phrases whose meaning is similar to that of a query phrase of arbitrary length. Depending on the use case, this task can be seen as a form of (phrase-level) query rewriting. The suggested approach works in a generative manner, is unsupervised, and combines a semantic word n-gram model, a statistical language model and a document search engine. A central component is a distributional semantic model containing word n-gram vectors (or embeddings), which models semantic similarities between n-grams of different orders. As data we use a large corpus of PubMed abstracts. The presented experiment is based on manual evaluation of phrases extracted for arbitrary queries provided by a group of evaluators. The results indicate that the proposed approach is promising and that distributional semantic models trained with uni-, bi- and trigrams seem to work better than a more traditional unigram model.
2018
Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network
Hans Moen | Kai Hakala | Laura-Maria Peltonen | Henry Suhonen | Petri Loukasmäki | Tapio Salakoski | Filip Ginter | Sanna Salanterä
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
We present our initial evaluation of a prototype system designed to assist nurses in assigning subject headings to nursing narratives, written in the context of documenting patient care in hospitals. Currently, nurses may need to memorize several hundred subject headings from standardized nursing terminologies when structuring their text and assigning the right section/subject headings to it. Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings; instead, the system should assist with the assignment of subject headings and with restructuring afterwards. We hypothesize that this could reduce the time and effort needed for nursing documentation in hospitals. A central component of the system is a text classification model based on a long short-term memory (LSTM) recurrent neural network architecture, trained on a large data set of nursing notes. A simple Web-based interface has been implemented for user interaction. To evaluate the system, three nurses write a set of artificial nursing shift notes in a fully unstructured narrative manner, without planning for or considering the use of sections and subject headings. These notes are then fed to the system, which assigns a subject heading to each sentence and then groups the sentences into paragraphs. Manual evaluation is conducted by a group of nurses. The results show that about 70% of the sentences are assigned to the correct subject headings. The nurses believe that such a system can be of great help in making nursing documentation in hospitals easier and less time-consuming. Finally, various measures and approaches for improving the system are discussed.
2017
Detecting mentions of pain and acute confusion in Finnish clinical text
Hans Moen | Kai Hakala | Farrokh Mehryary | Laura-Maria Peltonen | Tapio Salakoski | Filip Ginter | Sanna Salanterä
BioNLP 2017
We study and compare two different approaches to the task of automatically assigning predefined classes to clinical free-text narratives. The first approach treats this as a traditional mention-level named-entity recognition task, while the second treats it as a sentence-level multi-label classification task. The two approaches are compared by means of sentence-level evaluation, and state-of-the-art methods are evaluated for both. The experiments are done on two data sets consisting of Finnish clinical text, manually annotated with respect to the topics pain and acute confusion. Our results suggest that the mention-level named-entity recognition approach outperforms sentence-level classification overall, but the latter approach still achieves the best prediction scores on several annotation classes.
2014
Care Episode Retrieval
Hans Moen | Erwin Marsi | Filip Ginter | Laura-Maria Murtola | Tapio Salakoski | Sanna Salanterä
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)
2013
NTNU-CORE: Combining strong features for semantic similarity
Erwin Marsi | Hans Moen | Lars Bungum | Gleb Sizov | Björn Gambäck | André Lynum
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
Towards Dynamic Word Sense Discrimination with Random Indexing
Hans Moen | Erwin Marsi | Björn Gambäck
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality