Beatrice Santorini

Also published as: B. Santorini


2023

Parsing “Early English Books Online” for Linguistic Search
Seth Kulick | Neville Ryant | Beatrice Santorini
Proceedings of the Society for Computation in Linguistics 2023

2022

Penn-Helsinki Parsed Corpus of Early Modern English: First Parsing Results and Analysis
Seth Kulick | Neville Ryant | Beatrice Santorini
Findings of the Association for Computational Linguistics: NAACL 2022

The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.7-million-word treebank that is an important resource for research in syntactic change, has several properties that present potential challenges for NLP technologies. We describe these key features of PPCEME that make it challenging for parsing, including a larger and more varied set of function tags than in the Penn Treebank, and present results for this corpus using a modified version of the Berkeley Neural Parser and the approach to function tag recovery of Gabbard et al. (2006). While this approach to function tag recovery gives reasonable results, it is in some ways inappropriate for span-based parsers. We also present further evidence of the importance of in-domain pretraining for contextualized word representations. The resulting parser will be used to parse Early English Books Online, a 1.5-billion-word corpus whose utility for the study of syntactic change will be greatly increased with the addition of accurate parse trees.

Parsing Early Modern English for Linguistic Search
Seth Kulick | Neville Ryant | Beatrice Santorini
Proceedings of the Society for Computation in Linguistics 2022

2014

The Penn Parsed Corpus of Modern British English: First Parsing Results and Analysis
Seth Kulick | Anthony Kroch | Beatrice Santorini
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Parser Evaluation Using Derivation Trees: A Complement to evalb
Seth Kulick | Ann Bies | Justin Mott | Anthony Kroch | Beatrice Santorini | Mark Liberman
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Using Derivation Trees for Informative Treebank Inter-Annotator Agreement Evaluation
Seth Kulick | Ann Bies | Justin Mott | Mohamed Maamouri | Beatrice Santorini | Anthony Kroch
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

1993

Building a Large Annotated Corpus of English: The Penn Treebank
Mitchell P. Marcus | Beatrice Santorini | Mary Ann Marcinkiewicz
Computational Linguistics, Volume 19, Number 2, June 1993, Special Issue on Using Large Corpora: II

1991

A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars
E. Black | S. Abney | D. Flickenger | C. Gdaniec | R. Grishman | P. Harrison | D. Hindle | R. Ingria | F. Jelinek | J. Klavans | M. Liberman | M. Marcus | S. Roukos | B. Santorini | T. Strzalkowski
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1990

Deducing Linguistic Structure from the Statistics of Large Corpora
Eric Brill | David Magerman | Mitchell Marcus | Beatrice Santorini
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

A TAG analysis of the Third construction in German
Anthony Kroch | Beatrice Santorini | Aravind Joshi
Proceedings of the First International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+1)