Procheta Sen


2019

Word-Node2Vec: Improving Word Embedding with Document-Level Non-Local Word Co-occurrences
Procheta Sen | Debasis Ganguly | Gareth Jones
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

A standard word embedding algorithm, such as word2vec or GloVe, makes a strong assumption that words are likely to be semantically related only if they co-occur locally within a window of fixed size. However, this strong assumption may not capture the semantic association between words that co-occur frequently but non-locally within documents. In this paper, we propose a graph-based word embedding method, named ‘word-node2vec’. By relaxing the strong constraint of locality, our method is able to capture both local and non-local co-occurrences. Word-node2vec constructs a graph where every node represents a word and an edge between two nodes represents a combination of both local (e.g. word2vec) and document-level co-occurrences. Our experiments show that word-node2vec outperforms word2vec and GloVe on a range of different tasks, such as predicting word-pair similarity, word analogy, and concept categorization.
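As a rough illustration of the graph construction the abstract describes (not the paper's actual implementation), the Python sketch below counts both window-based and document-level co-occurrences and combines them into a single edge weight. The window size, the mixing weight alpha, and the linear combination are all assumptions made here for illustration.

    from collections import Counter
    from itertools import combinations

    def build_word_graph(documents, window=5, alpha=0.5):
        """Build a word graph whose edge weights combine local
        (sliding-window) and document-level co-occurrence counts."""
        local = Counter()      # window co-occurrences, as in word2vec contexts
        doc_level = Counter()  # co-occurrences anywhere in the same document

        for doc in documents:
            tokens = doc.lower().split()
            # local co-occurrences within a fixed-size window
            for i, w in enumerate(tokens):
                for c in tokens[max(0, i - window):i]:
                    if c != w:
                        local[frozenset((w, c))] += 1
            # non-local co-occurrences, regardless of distance
            for w, c in combinations(sorted(set(tokens)), 2):
                doc_level[frozenset((w, c))] += 1

        # combine the two signals into a single edge weight
        edges = {}
        for pair in local.keys() | doc_level.keys():
            edges[pair] = alpha * local[pair] + (1 - alpha) * doc_level[pair]
        return edges

    docs = ["the cat sat on the mat", "a dog chased the cat"]
    graph = build_word_graph(docs)

On such a graph, embeddings would then be learned in the node2vec style, i.e. by running biased random walks over the nodes and feeding the walk sequences to a skip-gram model; the paper's actual edge weighting and walk parameters are more elaborate than this sketch.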

2018

Tempo-Lexical Context Driven Word Embedding for Cross-Session Search Task Extraction
Procheta Sen | Debasis Ganguly | Gareth Jones
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Task extraction is the process of identifying search intents over a set of queries potentially spanning multiple search sessions. Most existing research on task extraction has focused on identifying tasks within a single session, where the notion of a session is defined by a fixed-length time window. By contrast, in this work we seek to identify tasks that span multiple sessions. To identify tasks, we conduct a global analysis of a query log in its entirety, without restricting analysis to individual temporal windows. To capture inherent task semantics, we represent queries as vectors in an abstract space. We learn the embedding of query words in this space by leveraging the temporal and lexical contexts of queries. Embedded query vectors are then clustered into tasks. Experiments demonstrate that our query embedding method significantly improves task extraction effectiveness compared to existing approaches that estimate semantic similarity between queries using documents retrieved from a collection.
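A minimal sketch of the final pipeline step the abstract describes, i.e. clustering embedded queries into tasks. Here word_vectors is assumed to hold embeddings already trained with the paper's tempo-lexical objective; averaging word vectors into a query vector, the choice of agglomerative clustering, and the n_tasks parameter are illustrative assumptions, not the paper's exact method.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def embed_query(query, word_vectors, dim):
        """Represent a query as the mean of its word embeddings."""
        vecs = [word_vectors[w] for w in query.lower().split()
                if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def extract_tasks(queries, word_vectors, dim=100, n_tasks=50):
        """Cluster embedded queries into tasks, ignoring session boundaries."""
        X = np.stack([embed_query(q, word_vectors, dim) for q in queries])
        return AgglomerativeClustering(n_clusters=n_tasks).fit_predict(X)

Because the clustering operates on the whole log at once, queries issued in different sessions can land in the same task cluster, which is the cross-session behavior the paper targets.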