Teodor - George Marchitan

Also published as: Teodor-George Marchitan, Teodor-george Marchitan


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Team Unibuc - NLP at GenAI Detection Task 1: Qwen it detect machine-generated text?
Claudiu Creanga | Teodor-George Marchitan | Liviu P. Dinu
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

We explored both masked language models and causal models. For Subtask A, our best model achieved first-place out of 36 teams when looking at F1 Micro (Auxiliary Score) of 0.8333, and second-place when looking at F1 Macro (Main Score) of 0.8301. For causal models, our best model was a fine-tuned version of Qwen and for masked models, our best model was a fine-tuned version of XLM-Roberta-Base.

pdf bib
Team Unibuc - NLP at SemEval-2025 Task 11: Few-shot text-based emotion detection
Claudiu Creanga | Teodor - George Marchitan | Liviu Dinu
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes the approach of the Unibuc - NLP team in tackling the SemEval 2025 Workshop, Task 11: Bridging the Gap in Text-Based Emotion Detection. We mainly focused on experiments using large language models (Gemini, Qwen, DeepSeek) with either few-shot prompting or fine-tuning. Withour final system, for the multi-label emotion detection track (track A), we got an F1-macro of 0.7546 (26/96 teams) for the English subset, 0.1727 (35/36 teams) for the Portuguese (Mozambican) subset and 0.325 (1/31 teams) for the Emakhuwa subset.

2024

pdf bib
Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction
Liviu P Dinu | Ana Sabina Uban | Alina Maria Cristea | Ioan-Bogdan Iordache | Teodor-George Marchitan | Simona Georgescu | Laurentiu Zoicas
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We introduce a new database of cognate words and etymons for the five main Romance languages, the most comprehensive one to date. We propose a strong benchmark for the automatic reconstruction of protowords for Romance languages, by applying a set of machine learning models and features on these data. The best results reach 90% accuracy in predicting the protoword of a given cognate set, surpassing existing state-of-the-art results for this task and showing that computational methods can be very useful in assisting linguists with protoword reconstruction.

pdf bib
Team Unibuc - NLP at SemEval-2024 Task 8: Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection
Teodor-george Marchitan | Claudiu Creanga | Liviu P. Dinu
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes the approach of the UniBuc - NLP team in tackling the SemEval 2024 Task 8: Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection. We explored transformer-based and hybrid deep learning architectures. For subtask B, our transformer-based model achieved a strong second-place out of 77 teams with an accuracy of 86.95%, demonstrating the architecture’s suitability for this task. However, our models showed overfitting in subtask A which could potentially be fixed with less fine-tunning and increasing maximum sequence length. For subtask C (token-level classification), our hybrid model overfit during training, hindering its ability to detect transitions between human and machine-generated text.