Abstract: In this work, we propose an unsupervised machine learning method for predicting chronic fatigue syndrome (CFS) from self-reported questionnaire responses, based on the k-means algorithm. We first suggest a method for determining the presence of a symptom from its frequency and severity using an unsupervised dynamic thresholding approach. This threshold is used to determine which of 54 CFS-related symptoms each subject has. Based on these symptom determinations, k-means is used to predict the presence of CFS. We find that the diagnostic accuracy of k-means is not significantly worse than that of commonly used CFS case definitions. After supervised feature selection is applied, k-means achieves significantly better diagnostic accuracy than any of the case definitions examined. We use these results to suggest the basis for an empirically founded CFS case definition.
Index Terms: Chronic fatigue syndrome, computer-aided diagnosis, k-means clustering, machine learning.
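The two-stage pipeline summarized in the abstract (symptom presence from frequency and severity, then k-means on the resulting symptom indicators) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the fixed cutoffs `freq_min` and `sev_min` stand in for the paper's dynamic threshold, whose derivation is not given in the abstract; the 0 to 4 rating scale, the three-symptom toy data (the paper uses 54 symptoms), and the farthest-point initialization are likewise hypothetical.

```python
def symptom_present(frequency, severity, freq_min=2, sev_min=2):
    """Return 1 if a symptom counts as present, else 0.

    ASSUMPTION: a fixed cutoff on both ratings. The paper instead derives
    its threshold dynamically; that procedure is not reproduced here.
    """
    return 1 if frequency >= freq_min and severity >= sev_min else 0


def binarize(responses, freq_min=2, sev_min=2):
    """Turn each subject's (frequency, severity) pairs into a 0/1 vector."""
    return [[symptom_present(f, s, freq_min, sev_min) for f, s in subject]
            for subject in responses]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k=2, max_iterations=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    centroids = init_centroids(points, k)
    labels = [None] * len(points)
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy questionnaire: six subjects, three symptoms, (frequency, severity) each.
responses = [
    [(4, 4), (3, 3), (4, 2)],
    [(3, 4), (4, 4), (1, 4)],
    [(4, 3), (2, 2), (3, 3)],
    [(0, 0), (1, 1), (0, 2)],
    [(1, 0), (0, 0), (2, 1)],
    [(0, 1), (2, 2), (0, 0)],
]
symptoms = binarize(responses)
labels, centroids = kmeans(symptoms, k=2)
print(symptoms)
print(labels)
```

Cluster indices produced by k-means are arbitrary, so deciding which cluster corresponds to CFS requires outside information, such as comparing cluster membership against known diagnoses; the sketch does not attempt that step.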
S. P. Watson is with Carleton College, Northfield, MN 55057 USA (e-mail: firstname.lastname@example.org).
A. S. Ruskin is with Pomona College, Claremont, CA 91711 USA (e-mail: email@example.com).
J. D. Furst and V. Simonis are with the College of Computing and Digital Media, DePaul University, Chicago, IL 60604 USA.
L. A. Jason and M. Sunnquist are with the College of Science and Health, DePaul University, Chicago, IL 60614 USA.
Cite: Samuel P. Watson, Amy S. Ruskin, Valerie Simonis, Leonard A. Jason, Madison Sunnquist, and Jacob D. Furst, "Identifying Defining Aspects of Chronic Fatigue Syndrome via Unsupervised Machine Learning and Feature Selection," International Journal of Machine Learning and Computing, vol. 4, no. 2, pp. 133-138, 2014.