Nicolas Fiorini


2018

Personalized neural language models for real-world query auto completion
Nicolas Fiorini | Zhiyong Lu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

Query auto completion (QAC) systems are a standard part of search engines in industry, helping users formulate their queries. Such systems update their suggestions after the user types each character, predicting the user’s intent using various signals – one of the most common being popularity. Recently, deep learning approaches have been proposed for the QAC task specifically to address the main limitation of previous popularity-based methods: the inability to predict unseen queries. In this work, we improve on previous methods based on neural language modeling, with the goal of building an end-to-end system. We particularly focus on using real-world data by integrating user information for personalized suggestions when possible. We also make use of time information and study how to increase diversity in the suggestions while assessing the impact on scalability. Our empirical results demonstrate a marked improvement on two separate datasets over previous best methods in both accuracy and scalability, taking a step towards neural query auto-completion in production search engines.

2017

Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Click Logs
Sunil Mohan | Nicolas Fiorini | Sun Kim | Zhiyong Lu
BioNLP 2017

We describe a deep learning approach to modeling the relevance of a document’s text to a query, applied to biomedical literature. Instead of mapping each document and query to a common semantic space, we compute a variable-length difference vector between the query and document, which is then passed through a deep convolution stage followed by a deep regression network to produce the estimated probability of the document’s relevance to the query. Despite the small amount of training data, this approach produces a more robust predictor than computing similarities between semantic vector representations of the query and document, and also results in significant improvements over traditional IR text factors. In the future, we plan to explore its application in improving PubMed search.