@inproceedings{ghaddar-langlais-2020-sedar,
    title = "{SEDAR}: a Large Scale {F}rench-{E}nglish Financial Domain Parallel Corpus",
    author = "Ghaddar, Abbas  and
      Langlais, Philippe",
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.442/",
    pages = "3595--3602",
    language = "eng",
    ISBN = "979-10-95546-34-4",
    abstract = "This paper describes the acquisition, preprocessing and characteristics of SEDAR, a large scale English-French parallel corpus for the financial domain. Our extensive experiments on machine translation show that SEDAR is essential to obtain good performance on finance. We observe a large gain in the performance of machine translation systems trained on SEDAR when tested on finance, which makes SEDAR suitable to study domain adaptation for neural machine translation. The first release of the corpus comprises 8.6 million high quality sentence pairs that are publicly available for research at \url{https://github.com/autorite/sedar-bitext}."
}