IJMLC 2021 Vol.11(5): 367-372 ISSN: 2010-3700
DOI: 10.18178/ijmlc.2021.11.5.1063

Using Word Embeddings in Turkish Part of Speech Tagging

Şevket Can, Bahar Karaoğlan, Tarık Kışla, and Senem Kumova Metin

Abstract—The close relation between the stem of a word (and thereby its meaning) and its part of speech tag makes part of speech tagging an important preprocessing task in natural language processing and understanding. For example, if the Turkish word “gelecek” is labeled as a noun, its stem is “gelecek”, meaning “future”. If it is labeled as a verb, the stem is “gel”, which means “come” in English. In many languages, including Turkish, the part of speech tagging problem is generally solved by rule-based approaches. In this paper, a setup is presented in which the neural network architecture SENNA is employed together with word embeddings. A combination of the Wikipedia 2016 and METU corpora is used to train the word embeddings; PARDER is used for part of speech training and testing. Word embeddings obtained by different methods and with different vector sizes are evaluated intrinsically, using analogy and semantic similarity distances, and extrinsically, based on their performance on the part of speech tagging task.

Index Terms—Part of speech tagging, word embedding, SENNA, deep learning.

Şevket Can is with the International Computer Institute, Ege University, Izmir, Turkey (e-mail: sevketcann@gmail.com).
Bahar Karaoğlan is with the International Computer Institute, Ege University, Izmir, Turkey (e-mail: bahar.karaoglan@ege.edu.tr).
Tarık Kışla is with the Department of Computer Education and Instructional Technologies, Ege University, Izmir, Turkey (e-mail: tarik.kisla@ege.edu.tr).
Senem Kumova Metin is with the Department of Software Engineering, İzmir University of Economics, İzmir, Turkey (e-mail: senem.kumova@ieu.edu.tr).


Cite: Şevket Can, Bahar Karaoğlan, Tarık Kışla, and Senem Kumova Metin, "Using Word Embeddings in Turkish Part of Speech Tagging," International Journal of Machine Learning and Computing vol. 11, no. 5, pp. 367-372, 2021.

Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

