Abstract—Large amounts of data are now widely available
in information systems, and data mining has attracted considerable
attention from researchers seeking to turn such data into useful knowledge.
Such data often include low-quality, unreliable, redundant
and noisy records, which negatively affect the process of discovering
knowledge and useful patterns. Therefore, researchers need
to extract the relevant data from large record sets using feature selection
methods. Feature selection is the process of identifying the most
relevant attributes and removing the redundant and irrelevant
attributes. In this study, filter-based
feature selection methods were compared on a well-known dataset (the
hepatitis dataset), and four classification
algorithms were used to evaluate the performance of the
methods. Among the classifiers, Naïve Bayes and Decision
Table achieved higher accuracy rates on the hepatitis
dataset than the others after the application of feature selection
methods. The study revealed that feature selection methods are
capable of improving the performance of learning algorithms.
However, no single filter-based feature selection method is the
best. Overall, the Consistency Subset, Info Gain Attribute Eval,
One-R Attribute Eval and Relief Attribute Eval methods
performed better than the others.
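As an illustration of what a filter-based method does, the following is a minimal sketch of information-gain ranking, the idea behind a filter such as Info Gain Attribute Eval. It is not the implementation used in the study: the function names (`entropy`, `info_gain`, `rank_features`), the attribute names and the four toy records are all hypothetical and chosen only to make the example self-contained.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def info_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each attribute value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Score every attribute independently of any classifier and sort by score."""
    scores = {
        name: info_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records (NOT taken from the hepatitis dataset).
names = ["attr_a", "attr_b"]
rows = [("yes", "m"), ("yes", "f"), ("no", "m"), ("no", "f")]
labels = ["die", "die", "live", "live"]

ranked = rank_features(rows, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy example `attr_a` separates the two classes perfectly while `attr_b` carries no class information, so a filter would keep the former and discard the latter before any classifier is trained.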
Index Terms—Feature selection, hepatitis, J48, naïve Bayes,
IBK, decision table.
Pinar Yildirim is with Okan University, Istanbul, Turkey (e-mail:
pinar.yildirim@okan.edu.tr).
Cite: Pinar Yildirim, "Filter Based Feature Selection Methods for Prediction of Risks in Hepatitis Disease," International Journal of Machine Learning and Computing, vol. 5, no. 4, pp. 258-263, 2015.