Marcos Lima


2020

Inferring about fraudulent collusion risk on Brazilian public works contracts in official texts using a Bi-LSTM approach
Marcos Lima | Roberta Silva | Felipe Lopes de Souza Mendes | Leonardo R. de Carvalho | Aleteia Araujo | Flavio de Barros Vidal
Findings of the Association for Computational Linguistics: EMNLP 2020

Public works procurements move US$10 billion yearly in Brazil and are a preferred field for collusion and fraud. The Federal Police and audit agencies investigate collusion (bid-rigging), over-pricing, and delivery fraud in this field, and efforts have been made to detect fraud and collusion in public works procurements at an early stage. Current automatic fraud-detection methods classify structured data and usually do not involve annotated data; the use of NLP for this kind of application is rare. Our work introduces a new dataset formed by public procurement calls published in the Brazilian official journal (Diário Oficial da União), comprising 15,132,968 textual entries, of which 1,907 are annotated as risky. Both a bottleneck deep neural network and a BiLSTM proved competitive with classical classifiers and achieved better precision (93.0% and 92.4%, respectively), which signals improvements for criminal fraud investigation.
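For context on the figures above, the short sketch below computes what share of the corpus is annotated as risky and shows how precision is defined (true positives divided by all entries a model flags as risky). The helper names `prevalence` and `precision` are hypothetical, and the confusion counts in the usage lines are illustrative values, not results from the paper.

```python
# Figures reported in the abstract.
TOTAL_ENTRIES = 15_132_968   # textual entries in the dataset
RISKY_ENTRIES = 1_907        # entries annotated as risky


def prevalence(positives: int, total: int) -> float:
    """Fraction of the corpus that belongs to the positive (risky) class."""
    return positives / total


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged entries that are actually risky: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no entries were flagged as risky")
    return true_positives / flagged


if __name__ == "__main__":
    p = prevalence(RISKY_ENTRIES, TOTAL_ENTRIES)
    print(f"risky share: {p:.4%}")  # prints "risky share: 0.0126%"
    # Hypothetical counts (not from the paper): 93 correct flags, 7 false alarms.
    print(f"precision: {precision(93, 7):.1%}")  # prints "precision: 93.0%"
```

Because risky entries make up only about 0.0126% of the corpus, precision is a telling metric here: it measures how many of the entries a model flags would actually merit an investigator's attention.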