@inproceedings{grouin-2016-text,
    title = "Text Segmentation of Digitized Clinical Texts",
    author = "Grouin, Cyril",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Grobelnik, Marko  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, Helene  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)",
    month = may,
    year = "2016",
    address = "Portoro{\v{z}}, Slovenia",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/landing_page/L16-1570/",
    pages = "3592--3599",
    abstract = "In this paper, we present the experiments we made to recover the original two-column page layout from layout-damaged digitized files. We designed several CRF-based approaches, either to identify the column separator or to classify each token from each line into the left or right column. We achieved our best results with a model trained on homogeneous corpora (only files composed of 2 columns) when classifying each token into the left or right column (overall F-measure of 0.968). Our experiments show it is possible to recover the original column layout of digitized documents with high-quality results."
}