Regional Information Center for Science and Technology
Journal of Information Systems and Telecommunication (JIST)
ISSN: 2322-1437
Volume 13, Issue 49
2025-05-25
Title: A Turkish Dataset and BERTurk-Contrastive Model for Semantic Textual Similarity
Pages: 24-32
DOI: 10.61186/jist.48127.13.49.24
Language: en
Authors: Somaiyeh Dehghan, Mehmet Fatih Amasyali
Yildiz Technical University
2024-09-27
<p>Semantic Textual Similarity (STS) is an important NLP task that measures the degree of semantic equivalence between two texts, even when they use different words. Although STS has been studied extensively for English, it has received limited attention for Turkish. This study introduces BERTurk-contrastive, a novel BERT-based model that uses contrastive learning to improve STS in Turkish. The model learns sentence representations by pulling similar sentences closer together in the embedding space while pushing dissimilar ones farther apart. To support this task, we release SICK-tr, a new Turkish STS dataset created by translating the English SICK dataset. We evaluate our model on STSb-tr and SICK-tr, achieving a significant improvement of 5.92 points over previous models. These results establish BERTurk-contrastive as a robust solution for STS in Turkish and provide a new benchmark for future research.</p>

http://jist.ir/en/Article/Download/48127