Abstract
In this paper, I describe my submission to the SemEval-2024 contest. I tackled subtask 1, “Semantic Textual Relatedness for African and Asian Languages”. To estimate the semantic relatedness of sentence pairs, I built models for nine different languages. I vectorized the text data using a variety of embedding techniques, including doc2vec, tf-idf, Sentence-Transformers, BERT, RoBERTa, and more, and used 11 traditional machine-learning regression models for analysis and evaluation.
- Anthology ID: 2024.semeval-1.65
- Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue: SemEval
- SIG: SIGLEX
- Publisher: Association for Computational Linguistics
- Pages: 420–431
- URL: https://preview.aclanthology.org/icon-24-ingestion/2024.semeval-1.65/
- DOI: 10.18653/v1/2024.semeval-1.65
- Cite (ACL): Ron Keinan. 2024. Text Mining at SemEval-2024 Task 1: Evaluating Semantic Textual Relatedness in Low-resource Languages using Various Embedding Methods and Machine Learning Regression Models. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 420–431, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): Text Mining at SemEval-2024 Task 1: Evaluating Semantic Textual Relatedness in Low-resource Languages using Various Embedding Methods and Machine Learning Regression Models (Keinan, SemEval 2024)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2024.semeval-1.65.pdf
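The abstract describes a pipeline of embedding each sentence pair and then fitting a regression model that predicts a relatedness score. It does not say how the two sentence vectors of a pair are combined into regression input, so the sketch below is only an illustration under stated assumptions: it uses tf-idf as the embedding, a single hypothetical feature (cosine similarity of the two tf-idf vectors), and plain ordinary least-squares linear regression standing in for the 11 regressors. It is scored with Spearman's rank correlation, the metric SemEval-2024 Task 1 uses. All function names, the toy sentence pairs, and their gold scores are made up for illustration, and everything is written with the Python standard library so the sketch stays self-contained.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization (assumption); real systems need language-aware tokenizers.
    return text.lower().split()

def fit_idf(documents):
    # Smoothed IDF, same formula as scikit-learn's TfidfVectorizer default: ln((1+n)/(1+df)) + 1.
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(text, idf):
    # Sparse tf-idf vector stored as a dict; terms unseen when fitting the IDF are dropped.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    # Cosine similarity of two sparse vectors; returns 0.0 if either vector is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pair_feature(pair, idf):
    # Hypothetical single feature per sentence pair: cosine similarity of their tf-idf vectors.
    s1, s2 = pair
    return cosine(tfidf_vector(s1, idf), tfidf_vector(s2, idf))

def fit_linear(xs, ys):
    # Ordinary least squares for y = slope * x + intercept with one feature.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x

def ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up toy sentence pairs with made-up gold relatedness scores in [0, 1].
train_pairs = [
    ("the cat sat on the mat", "a cat sat on the mat"),
    ("the cat sat on the mat", "stock prices fell sharply today"),
    ("i love this movie", "i really love this film"),
    ("it is raining outside", "the weather is wet and rainy"),
]
train_scores = [0.95, 0.05, 0.80, 0.60]

idf = fit_idf([s for pair in train_pairs for s in pair])
features = [pair_feature(p, idf) for p in train_pairs]
slope, intercept = fit_linear(features, train_scores)
predictions = [slope * x + intercept for x in features]
# In-sample correlation on toy data, for illustration only; a real evaluation uses held-out pairs.
rho = spearman(predictions, train_scores)
print(f"slope={slope:.3f} intercept={intercept:.3f} spearman={rho:.3f}")
```

Swapping in another embedding (doc2vec, Sentence-Transformers, BERT, RoBERTa) would only change how each sentence is vectorized; the pair feature, the regressor, and the Spearman evaluation stay the same shape, which is what makes comparing many embedding and regressor combinations straightforward.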