Abstract—Malicious software, also known as malware, is a
serious problem that costs consumers billions of dollars each
year. To address this problem, a significant amount of research
has been devoted to detecting malware. In this paper,
we introduce a genetic and evolutionary feature selection
technique for the identification of HTML code associated with
malware. We hypothesize that there may be an association between
malware and the HTML code in which it is embedded. Our
results show that this technique outperforms previous
techniques in terms of both recognition accuracy and the total
number of features needed for recognition.
Index Terms—Authorship classification, biometrics, feature
extraction, genetic and evolutionary computation (GEC),
malware.
H. C. Williams, J. N. Carter, W. L. Campbell, K. Roy, and G. V. Dozier
are with the Computer Science Department, North Carolina Agricultural
and Technical State University, Greensboro, NC 27411 USA (e-mail:
hcwillia@aggies.ncat.edu, jncarte1@aggies.ncat.edu,
wlcampbe@aggies.ncat.edu, kroy@ncat.edu, gvdozier@ncat.edu).
Cite: Henry C. Williams, Joi N. Carter, Willie L. Campbell, Kaushik Roy, and Gerry V. Dozier, "Genetic & Evolutionary Feature Selection for Author Identification of HTML Associated with Malware," International Journal of Machine Learning and Computing, vol. 4, no. 3, pp. 250-255, 2014.