Pablo Ruiz


Enjambment Detection in a Large Diachronic Corpus of Spanish Sonnets
Pablo Ruiz | Clara Martínez Cantón | Thierry Poibeau | Elena González-Blanco
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Enjambment takes place when a syntactic unit is broken up across two lines of poetry, giving rise to different stylistic effects. In Spanish literary studies, it remains unclear which types of stylistic effects can arise, and under which linguistic conditions. To systematically gather evidence about this, we developed a system to automatically identify enjambment (and its type) in Spanish. For evaluation, we manually annotated a reference corpus covering different periods. To apply the tool to a scholarly corpus, we built a diachronic corpus covering four centuries of sonnets (3750 poems) from public HTML sources, and we analyzed the occurrence of enjambment across stanzaic boundaries in different periods. We also found examples that highlight limitations in current definitions of enjambment.


More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing
Pablo Ruiz | Clément Plancq | Thierry Poibeau
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Text analysis methods widely used in digital humanities often involve word co-occurrence, e.g., concept co-occurrence networks. These methods provide a useful corpus overview, but cannot determine the predicates that relate co-occurring concepts. Our goal was to identify propositions expressing the points supported or opposed by participants in international climate negotiations. Word co-occurrence methods were not sufficient, and an analysis based on open relation extraction had limited coverage for nominal predicates. We present a pipeline that identifies the points that different actors support and oppose, via a domain model with support/opposition predicates, and analysis rules that exploit the output of semantic role labelling, syntactic dependencies and anaphora resolution. Entity linking and keyphrase extraction are also performed on the propositions related to each actor. A user interface allows examining the main concepts in points supported or opposed by each participant, which participants agree or disagree with each other, and about which issues. The system is an example of the tools that digital humanities scholars are asking for, to render rich textual information (beyond word co-occurrence) more amenable to quantitative treatment. An evaluation of the tool was satisfactory.


Combining Open Source Annotators for Entity Linking through Weighted Voting
Pablo Ruiz | Thierry Poibeau
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

EL92: Entity Linking Combining Open Source Annotators via Weighted Voting
Pablo Ruiz | Thierry Poibeau
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators
Pablo Ruiz | Thierry Poibeau | Frédérique Mélanie
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations


Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling
Pablo Ruiz | Aitor Álvarez | Haritz Arzelus
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Long audio alignment systems for Spanish and English are presented, within an automatic subtitling application. Language-specific phone decoders automatically recognize audio contents at phoneme level. At the same time, language-dependent grapheme-to-phoneme modules perform a transcription of the script for the audio. A dynamic programming algorithm (Hirschberg’s algorithm) finds matches between the phonemes automatically recognized by the phone decoder and the phonemes in the script’s transcription. Alignment accuracy is evaluated under two kinds of scoring for alignment operations: a baseline binary matrix, and several continuous-score matrices based on phoneme similarity, as assessed by comparing multivalued phonological features. Alignment accuracy results are reported at phoneme, word and subtitle level. Alignment accuracy with the continuous scoring matrices based on phonological similarity was clearly higher than with the baseline binary matrix.
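
The contrast between binary and similarity-based scoring can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain full-table global alignment (Needleman-Wunsch style) rather than Hirschberg's algorithm, which finds an alignment with the same optimal score in linear space. The feature table, the gap penalty and the function names are all assumptions made for illustration only.

```python
# Hypothetical feature table (voicing, place, manner) for a few consonants.
# The values are illustrative, not the feature set used in the paper.
FEATURES = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "m": ("voiced", "bilabial", "nasal"),
    "n": ("voiced", "alveolar", "nasal"),
}
GAP = -1.0  # assumed penalty for an insertion or deletion


def binary_score(a, b):
    """Baseline: identical phonemes match, everything else mismatches."""
    return 1.0 if a == b else -1.0


def feature_score(a, b):
    """Continuous score in [-1, 1] from the share of matching features."""
    fa, fb = FEATURES[a], FEATURES[b]
    shared = sum(x == y for x, y in zip(fa, fb))
    return 2.0 * shared / len(fa) - 1.0


def align(ref, hyp, score):
    """Globally align two phoneme lists; return (total score, aligned pairs)."""
    n, m = len(ref), len(hyp)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + GAP
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(ref[i - 1], hyp[j - 1]),  # (mis)match
                dp[i - 1][j] + GAP,  # phoneme in ref only
                dp[i][j - 1] + GAP,  # phoneme in hyp only
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + score(ref[i - 1], hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Script transcription vs. a decoder output that confuses /p/ with /b/.
script_phonemes = ["p", "d", "n"]
decoded_phonemes = ["b", "d", "n"]
binary_total, binary_pairs = align(script_phonemes, decoded_phonemes, binary_score)
feature_total, feature_pairs = align(script_phonemes, decoded_phonemes, feature_score)
```

In this toy case the binary matrix penalizes the /p/-/b/ confusion as heavily as any other mismatch, whereas the feature-based score gives it partial credit because the two phonemes differ only in voicing. On longer, noisier sequences this kind of partial credit is what can steer the alignment toward phonologically plausible matches.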