Burcu Can


2021

Bilingual Terminology Extraction Using Neural Word Embeddings on Comparable Corpora
Darya Filippova | Burcu Can | Gloria Corpas Pastor
Proceedings of the Student Research Workshop Associated with RANLP 2021

Term and glossary management are vital steps in the preparation of every language specialist, and they play an important role in the training of translation professionals. The growing emphasis on efficient time management and the constant time constraints observed in every job sector increase the need for automatic glossary compilation. Many well-performing bilingual AET systems are based on processing parallel data; however, such parallel corpora are not always available for a specific domain or language pair. Domain-specific, bilingual access to information and its retrieval based on comparable corpora is a very promising area of research that requires a detailed analysis of both the available data sources and the possible extraction techniques. This work focuses on domain-specific automatic terminology extraction from comparable corpora for the English-Russian language pair by utilizing neural word embeddings.
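A minimal sketch of one way embedding-based bilingual term matching over comparable corpora can be set up: align two monolingual embedding spaces with an orthogonal (Procrustes) mapping learned from a seed dictionary, then rank candidate translations by cosine similarity. The mapping method, the seed dictionary, and all names and sizes below are illustrative assumptions, not details from the paper.

```python
# Sketch: align monolingual embeddings with an orthogonal (Procrustes) mapping
# learned from seed term pairs, then rank candidate term translations by
# cosine similarity. Toy random vectors stand in for embeddings trained
# separately on English and Russian domain corpora.
import numpy as np

def learn_mapping(src_vecs, tgt_vecs):
    """Orthogonal map W minimising ||src @ W - tgt|| over the seed pairs."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

def rank_translations(term_vec, W, tgt_matrix, tgt_vocab, k=5):
    """Return the k nearest target-language terms for a mapped source term."""
    mapped = term_vec @ W
    mapped /= np.linalg.norm(mapped)
    tgt_norm = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    scores = tgt_norm @ mapped
    best = np.argsort(-scores)[:k]
    return [(tgt_vocab[i], float(scores[i])) for i in best]

rng = np.random.default_rng(0)
en_seed, ru_seed = rng.normal(size=(100, 50)), rng.normal(size=(100, 50))
W = learn_mapping(en_seed, ru_seed)
ru_vocab = [f"term_{i}" for i in range(1000)]
ru_matrix = rng.normal(size=(1000, 50))
print(rank_translations(rng.normal(size=50), W, ru_matrix, ru_vocab))
```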

2020

Self Attended Stack-Pointer Networks for Learning Long Term Dependencies
Salih Tuc | Burcu Can
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

We propose a novel deep neural architecture for dependency parsing, built upon a Transformer encoder (Vaswani et al. 2017) and a Stack Pointer Network (Ma et al. 2018). We first encode each sentence using a Transformer network, and the dependency graph is then generated by a Stack Pointer Network, which selects the head of each word in the sentence through a head-selection process. We evaluate our model on Turkish and English treebanks. The results show that our transformer-based model learns long-term dependencies efficiently compared to sequential models such as recurrent neural networks. Our self-attended stack pointer network improves the UAS score by around 6% over the LSTM-based stack pointer network (Ma et al. 2018) on Turkish sentences longer than 20 words.
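A minimal PyTorch sketch of the general idea: encode a sentence with a Transformer encoder and score candidate heads for every word with pointer-style attention. The layer sizes, the dot-product scorer, and the class name are illustrative assumptions; the paper's stack-based, top-down decoding is not reproduced here.

```python
# Sketch: Transformer encoding followed by head selection via attention scores.
import torch
import torch.nn as nn

class SelfAttendedHeadSelector(nn.Module):
    def __init__(self, vocab_size, d_model=128, nhead=4, nlayers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head_proj = nn.Linear(d_model, d_model)   # head (parent) view
        self.dep_proj = nn.Linear(d_model, d_model)    # dependent view

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))        # (batch, len, d_model)
        heads = self.head_proj(h)
        deps = self.dep_proj(h)
        # score[b, i, j]: how strongly word j is preferred as the head of word i
        return deps @ heads.transpose(1, 2)

model = SelfAttendedHeadSelector(vocab_size=10_000)
scores = model(torch.randint(0, 10_000, (2, 12)))      # toy batch of 2 sentences
predicted_heads = scores.argmax(dim=-1)                # greedy head per word
print(predicted_heads.shape)                           # torch.Size([2, 12])
```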

2019

Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Isabelle Augenstein | Spandana Gella | Sebastian Ruder | Katharina Kann | Burcu Can | Johannes Welbl | Alexis Conneau | Xiang Ren | Marek Rei
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

2018

Characters or Morphemes: How to Represent Words?
Ahmet Üstün | Murathan Kurfalı | Burcu Can
Proceedings of The Third Workshop on Representation Learning for NLP

In this paper, we investigate the effects of using subword information in representation learning. We argue that using syntactic subword units positively affects the quality of the word representations. We introduce a morpheme-based model and compare it against word-based, character-based, and character n-gram level models. Our model takes a list of candidate segmentations of a word and learns the representation of the word from the different segmentations, which are weighted by an attention mechanism. We performed experiments on Turkish, as a morphologically rich language, and on English, which has comparatively poorer morphology. The results show that morpheme-based models are better at learning word representations of morphologically complex languages than character-based and character n-gram level models, since the morphemes help to incorporate more syntactic knowledge during learning, which makes morpheme-based models better at syntactic tasks.
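A minimal PyTorch sketch of the attention-weighted composition described above: each candidate segmentation is represented by the sum of its morpheme embeddings, and the word vector is an attention-weighted mixture over the candidates. The dimensions, the linear scorer, and the toy segmentations are assumptions for illustration only.

```python
# Sketch: word representation as attention over candidate segmentations.
import torch
import torch.nn as nn

class SegmentationAttention(nn.Module):
    def __init__(self, n_morphemes, dim=64):
        super().__init__()
        self.morph_embed = nn.Embedding(n_morphemes, dim)
        self.attn = nn.Linear(dim, 1)                  # scores one segmentation

    def forward(self, segmentations):
        # segmentations: list of morpheme-id tensors, one per candidate split
        seg_vecs = torch.stack(
            [self.morph_embed(seg).sum(dim=0) for seg in segmentations])
        weights = torch.softmax(self.attn(seg_vecs).squeeze(-1), dim=0)
        return weights @ seg_vecs                      # attention-weighted word vector

# toy candidate segmentations of one word, e.g. "walk+ing" vs "walki+ng"
model = SegmentationAttention(n_morphemes=500)
word_vec = model([torch.tensor([3, 17]), torch.tensor([42, 9])])
print(word_vec.shape)                                  # torch.Size([64])
```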

Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation
Burcu Can | Suresh Manandhar
Computational Linguistics, Volume 44, Issue 2 - June 2018

This article presents a probabilistic hierarchical clustering model for morphological segmentation. In contrast to existing approaches to morphology learning, our method allows learning the hierarchical organization of word morphology as a collection of tree-structured paradigms. The model is fully unsupervised and based on the hierarchical Dirichlet process. Tree hierarchies and the corresponding morphological paradigms are learned simultaneously. Our model is evaluated on Morpho Challenge and shows competitive performance compared to state-of-the-art unsupervised morphological segmentation systems. Although we apply this model to morphological segmentation, the model itself can also be used for hierarchical clustering of other types of data.
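For readers unfamiliar with Dirichlet process clustering, the sketch below shows the Chinese-restaurant-process assignment rule that underlies such models: an item joins an existing cluster in proportion to its size, or opens a new one with probability proportional to a concentration parameter. This flat version ignores the paper's tree structure and morphological likelihoods; alpha and the counts are toy values.

```python
# Sketch: Chinese restaurant process assignment step of a Dirichlet process.
import random

def crp_assign(cluster_counts, alpha=1.0):
    """Sample a cluster index; index len(cluster_counts) means 'new cluster'."""
    total = sum(cluster_counts) + alpha
    probs = [c / total for c in cluster_counts] + [alpha / total]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

counts = [12, 5, 1]                 # sizes of existing paradigms (clusters)
choice = crp_assign(counts)
print("new cluster" if choice == len(counts) else f"joined cluster {choice}")
```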

2013

An Agglomerative Hierarchical Clustering Algorithm for Labelling Morphs
Burcu Can | Suresh Manandhar
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Dirichlet Processes for Joint Learning of Morphology and PoS Tags
Burcu Can | Suresh Manandhar
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Probabilistic Hierarchical Clustering of Morphological Paradigms
Burcu Can | Suresh Manandhar
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics