• Mar 27, 2019 News! All papers from Volume 9, Number 1 have been indexed by Scopus!
  • May 07, 2019 News! Vol. 9, No. 3 has been published with its online version.
  • Mar 30, 2019 News! Vol. 9, No. 2 has been published with its online version.
General Information
    • ISSN: 2010-3700
    • Abbreviated Title: Int. J. Mach. Learn. Comput.
    • DOI: 10.18178/IJMLC
    • Editor-in-Chief: Dr. Lin Huang
    • Executive Editor: Ms. Cherry L. Chen
    • Abstracting/Indexing: Scopus (since 2017), EI (INSPEC, IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library.
    • E-mail: ijmlc@ejournal.net
Dr. Lin Huang
Metropolitan State University of Denver, USA
It is my honor to take on the position of Editor-in-Chief of IJMLC. We encourage authors to submit papers on any branch of machine learning and computing.
IJMLC 2014 Vol.4(4): 365-370 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2014.V4.438

A Feature Selection Method for Twitter News Classification

Inoshika Dilrukshi and Kasun de Zoysa
Abstract—This paper presence a new feature selection method which can be used for creating data set in order to classify Twitter short messages. The Twitter short messages contain only 140 characters. Thus, the number of words per sentence is almost equal for all sentences. Once you pool the all text messages together, there can be number of words in the pool but, for a given sentence, there will be only few words included from the pool. This causes to have a sparse matrix as the feature vector. By removing the unrelated words from the feature space, the dimension can be reduced and therefore, the sparseness can be reduced. The unrelated words can be define as the common words (high frequent words) and noise words (low frequent words). Even though by removing these unrelated words, still it may contain some unrelated words. Thus, a feature selection technique was needed to apply in order to select the best feature set. The suggested new feature selection method was based on the Information Theory. It was named as Ratio Method. The calculated value increase when the word occurs frequently in a particular group and it decrease when the word occur in all groups. The best features can be choose by using a proper threshold. Some popular text classifiers such as SVM, Naïve Bayes and Decision Trees are used to evaluate the performance of the new feature selection method and to compare the new method with existing methods.

Index Terms—Ratio method, information theory, term frequency, inverse document frequency.

The authors are with the University of Colombo School of Computing, Colombo 7, Sri Lanka (e-mail: inoshi@scorelab.org, kasun@ucsc.lk).


Cite: Inoshika Dilrukshi and Kasun de Zoysa, "A Feature Selection Method for Twitter News Classification," International Journal of Machine Learning and Computing, vol. 4, no. 4, pp. 365-370, 2014.

Copyright © 2008-2019. International Journal of Machine Learning and Computing. All rights reserved.
E-mail: ijmlc@ejournal.net