Texto4Science: a Quebec French Database of Annotated Short Text Messages
Philippe Langlais, Patrick Drouin, Amélie Paulus, Eugénie Rompré Brodeur, Florent Cottin
Abstract
In October 2009, the Quebec French part of the international sms4science project, called texto4science, was launched. Over a period of 10 months, we collected slightly more than 7000 SMSs, which we carefully annotated. This database is now ready to be used by the community. The purpose of this article is to describe the effort put into designing this database and to provide some analysis of the main linguistic phenomena that we annotated. We also report on a sociolinguistic survey we conducted within the project.
- Anthology ID:
- L12-1214
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1047–1054
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/413_Paper.pdf
- Cite (ACL):
- Philippe Langlais, Patrick Drouin, Amélie Paulus, Eugénie Rompré Brodeur, and Florent Cottin. 2012. Texto4Science: a Quebec French Database of Annotated Short Text Messages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1047–1054, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Texto4Science: a Quebec French Database of Annotated Short Text Messages (Langlais et al., LREC 2012)