Abstract—Sampling techniques for data mining applications can be broadly categorized into Random Sampling (RS), Active Learning (AL), and Progressive Sampling (PS). Progressive sampling techniques grow an initial sample up to the point beyond which model accuracy no longer improves significantly. These methods have been shown to be computationally efficient.
The choice of sampling schedule for progressive sampling remains an open research question: available schemes may either overshoot, producing a final sample larger than necessary, or grow the sample too slowly, requiring many iterations of the algorithm before convergence is reached. We demonstrate that using Batch Mode Uncertainty Sampling, a technique from active learning, to progressively grow the sample can significantly improve the performance of progressive sampling.
Through a series of trials on both simulated and real data, we show that our proposed Progressive Batch Mode Uncertainty Sampling (PBMUS) algorithm converges with a comparable or smaller number of data points, at higher accuracy, and in some cases in less computational time.
Index Terms—Active learning, uncertainty sampling,
progressive sampling, linear regression with local sampling,
random sampling, sampling, machine learning.
The authors are with George Mason University, Fairfax, VA 22030, USA
(e-mail: aelrafey@gmu.edu, jwojtusi@gmu.edu).
Cite: Amr ElRafey and Janusz Wojtusiak, "A Hybrid Active Learning and Progressive Sampling Algorithm," International Journal of Machine Learning and Computing vol. 8, no. 5, pp. 423-427, 2018.
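The loop the abstract describes can be sketched as follows: seed a small sample, fit a model, and stop once test accuracy plateaus; otherwise grow the sample by the batch of pool points the current model is least certain about. This is a minimal illustrative sketch, not the paper's algorithm — the logistic model, synthetic data, batch size, and stopping threshold are all hypothetical choices for demonstration.

```python
# Illustrative PBMUS-style loop: progressive sampling driven by
# batch-mode uncertainty sampling. All modeling choices here are
# hypothetical, not taken from the paper.
import math
import random

random.seed(0)

def make_data(n):
    # Two noisy features; label 1 when x + y > 0 (a linear boundary).
    data = []
    for _ in range(n):
        x, y = random.gauss(0, 1), random.gauss(0, 1)
        data.append(((x, y), 1 if x + y > 0 else 0))
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(samples, epochs=200, lr=0.1):
    # Plain gradient-descent logistic regression: 2 weights + bias.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x, y), label in samples:
            g = sigmoid(w[0] * x + w[1] * y + w[2]) - label
            w[0] -= lr * g * x
            w[1] -= lr * g * y
            w[2] -= lr * g
    return w

def predict_prob(w, point):
    x, y = point
    return sigmoid(w[0] * x + w[1] * y + w[2])

def accuracy(w, samples):
    hits = sum(1 for pt, lab in samples
               if (predict_prob(w, pt) >= 0.5) == (lab == 1))
    return hits / len(samples)

def pbmus(pool, test, batch=20, eps=0.005):
    # Seed with a small random sample, then grow the sample by the
    # batch of pool points whose predicted probability is closest to
    # 0.5 (i.e., the most uncertain), until accuracy stops improving.
    random.shuffle(pool)
    sample, pool = pool[:batch], pool[batch:]
    prev_acc = 0.0
    while pool:
        w = fit_logistic(sample)
        acc = accuracy(w, test)
        if acc - prev_acc < eps:  # accuracy plateau -> converged
            return w, len(sample), acc
        prev_acc = acc
        pool.sort(key=lambda s: abs(predict_prob(w, s[0]) - 0.5))
        sample += pool[:batch]
        pool = pool[batch:]
    return fit_logistic(sample), len(sample), prev_acc

pool, test = make_data(500), make_data(200)
w, n_used, acc = pbmus(pool, test)
print(n_used, round(acc, 3))
```

Because each new batch targets the region where the model is least certain, the sample typically converges with far fewer points than the full pool, which is the trade-off the abstract's trials measure against random and schedule-based growth.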