Abstract—The close relation between a word's stem (and hence its
meaning) and its part of speech tag makes part of speech tagging an
important preprocessing task in natural language processing and
understanding. For example, if the Turkish word “gelecek” is
labeled as a noun, its stem is “gelecek”, meaning “future”. If it
is labeled as a verb, the stem is “gel”, which means “come” in
English. In many languages, including Turkish, the part of speech
tagging problem is generally solved by rule-based approaches. In
this paper, the neural network architecture SENNA is employed
together with word embeddings. A combination of the Wikipedia 2016
and METU corpora is used to train the word embeddings; PARDER is
used for training and testing the part of speech tagger. The word
embeddings, obtained by different methods and with different vector
sizes, are evaluated intrinsically using analogy and semantic
similarity distances, and assessed extrinsically based on
performance on the part of speech tagging task.
Index Terms—Part of speech tagging, word embedding,
SENNA, deep learning.
Şevket Can is with the International Computer Institute, Ege University,
Izmir, Turkey (e-mail: sevketcann@gmail.com).
Bahar Karaoğlan is with the International Computer Institute, Ege
University, Izmir, Turkey (e-mail: bahar.karaoglan@ege.edu.tr).
Tarık Kışla is with the Department of Computer Education and
Instructional Technologies, Ege University, Izmir, Turkey (e-mail:
tarik.kisla@ege.edu.tr).
Senem Kumova Metin is with the Department of Software Engineering,
İzmir University of Economics, İzmir, Turkey (e-mail:
senem.kumova@ieu.edu.tr).
Cite: Şevket Can, Bahar Karaoğlan, Tarık Kışla, and Senem Kumova Metin, "Using Word Embeddings in Turkish Part of Speech Tagging," International Journal of Machine Learning and Computing, vol. 11, no. 5, pp. 367-372, 2021.
Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).