Texto4Science: a Quebec French Database of Annotated Short Text Messages

Philippe Langlais, Patrick Drouin, Amélie Paulus, Eugénie Rompré Brodeur, Florent Cottin


Abstract
In October 2009, the Quebec French part of the international sms4science project, called texto4science, was launched. Over a period of 10 months, we collected slightly more than 7,000 SMS messages, which we carefully annotated. This database is now ready to be used by the community. The purpose of this article is to describe the effort put into designing this database and to provide some data analysis of the main linguistic phenomena that we annotated. We also report on a sociolinguistic survey we conducted within the project.
Anthology ID:
L12-1214
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1047–1054
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/413_Paper.pdf
Cite (ACL):
Philippe Langlais, Patrick Drouin, Amélie Paulus, Eugénie Rompré Brodeur, and Florent Cottin. 2012. Texto4Science: a Quebec French Database of Annotated Short Text Messages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1047–1054, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Texto4Science: a Quebec French Database of Annotated Short Text Messages (Langlais et al., LREC 2012)