IJMLC 2014 Vol.4(2): 177-182 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2014.V4.408

A Novel String Distance Function Based on Most Frequent K Characters

Sadi Evren Seker, Oguz Altun, Uğur Ayan, and Cihan Mert
Abstract—This study presents a novel similarity metric intended to speed up comparison operations. The new metric is also suitable for distance-based operations among strings.
   Simple measures, such as string length, are fast to calculate but do not represent the string well. Methods that keep a histogram over all characters in the string are slower, but they capture the string's characteristics well in some areas, such as natural language.
   We propose a new metric that is easy to calculate and satisfactory for string comparison. The method is built on a hash function that takes a string of any size and outputs its K most frequent characters together with their frequencies.
   The outputs can be compared directly, and our studies showed that the success rate is quite satisfactory for text mining operations.
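
As a rough illustration of the idea in the abstract, the Python sketch below extracts the K most frequent characters of a string with their frequencies and then compares two such outputs. The function names `most_frequent_k` and `shared_frequency`, the tie-breaking rule (first occurrence wins), and the comparison rule (summing the smaller frequency of each shared character) are assumptions made for this example; the abstract does not specify them, and the paper's own definitions may differ.

```python
from collections import Counter


def most_frequent_k(text: str, k: int = 2) -> list[tuple[str, int]]:
    """Return the k most frequent characters of `text` with their counts.

    Ties are broken by first occurrence in `text`, which is how
    Counter.most_common orders equal counts (an assumption of this sketch).
    """
    return Counter(text).most_common(k)


def shared_frequency(a: str, b: str, k: int = 2) -> int:
    """Hypothetical comparison rule, not taken from the paper.

    For every character present in both top-k outputs, add the smaller
    of its two frequencies. A larger result means more similar strings.
    """
    top_a = dict(most_frequent_k(a, k))
    top_b = dict(most_frequent_k(b, k))
    return sum(min(top_a[ch], top_b[ch]) for ch in top_a if ch in top_b)


if __name__ == "__main__":
    print(most_frequent_k("research", 2))
    print(most_frequent_k("seeking", 2))
    print(shared_frequency("research", "seeking", 2))
```

Only the two short top-K outputs are compared, never the full strings, which is where the speed advantage over a full character histogram would come from.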

Index Terms—String distance function, string similarity metric.

Sadi Evren Seker is with the Department of Business, Istanbul Medeniyet University, Istanbul, Turkey (e-mail: academic@sadievrenseker.com).
Oguz Altun is with the Department of Computer Science, Epoka University, 1039 Tirana, Albania (e-mail: oaltun@epoka.edu.al).
Uğur Ayan is with the Turkish National Science Foundation, Istanbul, Turkey (e-mail: ugur.ayan@tubitak.gov.tr).
Cihan Mert is with the Department of Informatics, International Black Sea University, 0131 Tbilisi, Georgia (e-mail: cmert@ibsu.edu.ge).


Cite: Sadi Evren Seker, Oguz Altun, Uğur Ayan, and Cihan Mert, "A Novel String Distance Function Based on Most Frequent K Characters," International Journal of Machine Learning and Computing, vol. 4, no. 2, pp. 177-182, 2014.
