Xavier Holt
2018
Extracting structured data from invoices
Xavier Holt | Andrew Chisholm
Proceedings of the Australasian Language Technology Association Workshop 2018
Business documents encode a wealth of information in a format tailored to human consumption – i.e. aesthetically dispersed natural language text, graphics and tables. We address the task of extracting key fields (e.g. the amount due on an invoice) from a wide variety of potentially unseen document formats. In contrast to traditional template-driven extraction systems, we introduce a content-driven machine-learning approach which is both robust to noise and generalises to unseen document formats. In a comparison of our approach with alternative invoice extraction systems, we observe an absolute accuracy gain of 20% across compared fields, and a 25%–94% reduction in extraction latency.
2016
Presenting a New Dataset for the Timeline Generation Problem
Xavier Holt | Will Radford | Ben Hachey
Proceedings of the Australasian Language Technology Association Workshop 2016