Guillaume Becquin


2023

Semantic Similarity Covariance Matrix Shrinkage
Guillaume Becquin | Saher Esmeir
Findings of the Association for Computational Linguistics: EMNLP 2023

An accurate estimation of the covariance matrix is a critical component of many applications in finance, including portfolio optimization. The sample covariance suffers from the curse of dimensionality when the number of observations is of the same order as, or lower than, the number of variables. This tends to be the case in portfolio optimization, where a portfolio manager can choose between thousands of stocks using historical daily returns to guide their investment decisions. To address this issue, past works proposed linear covariance shrinkage to regularize the estimated matrix. While effective, the proposed methods relied solely on historical price data and thus ignored company fundamental data. In this work, we propose to utilise semantic similarity derived from textual descriptions or knowledge graphs to improve the covariance estimation. Rather than using the semantic similarity directly as a biased estimator of the covariance, we employ it as a shrinkage target. The resulting covariance estimators leverage both semantic similarity and recent price history, and can be readily adapted to a broad range of financial securities. The effectiveness of the approach is demonstrated for a period including diverse market conditions and compared with the covariance shrinkage prior art.
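
The idea of using a similarity matrix as a shrinkage target can be illustrated with a minimal sketch of generic linear shrinkage. This is not the estimator from the paper: the function name `shrink_covariance`, the fixed intensity `alpha`, and the rescaling of the similarity matrix by sample standard deviations are all assumptions made for illustration, and the similarity matrix here is synthetic rather than derived from text or a knowledge graph.

```python
import numpy as np


def shrink_covariance(returns, similarity, alpha):
    """Blend the sample covariance with a similarity-based target.

    returns:    array of shape (n_observations, n_assets)
    similarity: symmetric (n_assets, n_assets) matrix with unit diagonal
                (hypothetical stand-in for a semantic similarity matrix)
    alpha:      shrinkage intensity in [0, 1], fixed by hand in this sketch
    """
    sample_cov = np.cov(returns, rowvar=False)
    std = np.sqrt(np.diag(sample_cov))
    # Assumption: convert similarities to covariance units by scaling
    # with the sample standard deviations of each pair of assets.
    target = similarity * np.outer(std, std)
    # Linear shrinkage: convex combination of sample estimate and target.
    return (1.0 - alpha) * sample_cov + alpha * target


# Fewer observations (20) than assets (50): the sample covariance is singular.
rng = np.random.default_rng(0)
n_obs, n_assets = 20, 50
returns = rng.normal(size=(n_obs, n_assets))

# Synthetic similarity: every pair of distinct assets has similarity 0.5.
similarity = 0.5 * np.eye(n_assets) + 0.5 * np.ones((n_assets, n_assets))

sample_cov = np.cov(returns, rowvar=False)
shrunk = shrink_covariance(returns, similarity, alpha=0.3)

print("smallest eigenvalue, sample:", np.linalg.eigvalsh(sample_cov).min())
print("smallest eigenvalue, shrunk:", np.linalg.eigvalsh(shrunk).min())
```

In this sketch the shrunk matrix keeps the sample variances on its diagonal while the off-diagonal entries are pulled towards the target, which makes the result positive definite even though the sample covariance is not. In practice the shrinkage intensity would be estimated from the data rather than fixed by hand.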

2020

GBe at FinCausal 2020, Task 2: Span-based Causality Extraction for Financial Documents
Guillaume Becquin
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

This document describes a system for causality extraction from financial documents submitted as part of the FinCausal 2020 Workshop. The main contribution of this paper is a description of the robust post-processing used to detect the number of cause and effect clauses in a document and extract them. The proposed system achieved a weighted-average F1 score of more than 95% for the official blind test set during the post-evaluation phase and an exact clause match for 83% of the documents.

End-to-end NLP Pipelines in Rust
Guillaume Becquin
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)

The recent progress in natural language processing research has been supported by the development of a rich open source ecosystem in Python. Libraries that allow both NLP practitioners and non-specialists to leverage state-of-the-art models have been instrumental in the democratization of this technology. The maturity of the open-source NLP ecosystem, however, varies between languages. This work proposes a new open-source library aimed at bringing state-of-the-art NLP to Rust. Rust is a systems programming language for which the foundations required to build machine learning applications are available, but which still lacks ready-to-use, end-to-end NLP libraries. The proposed library, rust-bert, implements modern language models and ready-to-use pipelines (for example translation or summarization). This allows further development by the Rust community from both NLP experts and non-specialists. It is hoped that this library will accelerate the development of the NLP ecosystem in Rust. The library is under active development and available at https://github.com/guillaume-be/rust-bert.