Abstract—This paper presents a semi-supervised learning
technique to classify video clips. Video clips are usually
categorized using deep learning techniques. However, given
the number of online videos today, this approach requires
high computing power.
The authors propose methods that use a Self-Organizing Map
(SOM) to create a feature space representing clusters of video
frames. The clustered frames are then classified using simple
voting, entropy calculation, neural networks, and Long
Short-Term Memory (LSTM). The authors also show how to find
the number of frames used for clustering, based on
accuracy and training time. The results of this approach are
presented based on testing 18 specific classes of real-world
datasets from TV programs containing 912 videos. The authors
evaluated the techniques using five-fold cross-validation, in
which the method achieved an average accuracy of 71.98%. The
average computing time was approximately 40 minutes.
Moreover, the authors compared the proposed method with
baseline models, including C3D and CNN-LSTM, and also used a
scene and action-recognition dataset, namely Hollywood2, to
evaluate the technique. The authors achieved an average
accuracy of 93.72%.
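The clustering and simple-voting steps summarized above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes each frame has already been reduced to a fixed-length feature vector, and the function names, grid size, learning-rate schedule, and neighbourhood schedule are all hypothetical choices. Only the simple-voting classifier is shown; the entropy, neural network, and LSTM variants are omitted.

```python
import numpy as np

def train_som(frames, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a small SOM to frame features; returns weights of shape (rows*cols, dim).

    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen frames.
    weights = frames[rng.integers(0, len(frames), rows * cols)].astype(float)
    n_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(frames)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
            x = frames[idx]
            # Best-matching unit: the node whose weights are closest to the frame.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_clusters(frames, weights):
    """Map each frame to the index of its nearest SOM node."""
    d = ((frames[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def fit_node_labels(cluster_ids, frame_labels, n_nodes, n_classes):
    """Label each node with the majority class of the training frames mapped to it."""
    counts = np.zeros((n_nodes, n_classes), dtype=int)
    np.add.at(counts, (cluster_ids, frame_labels), 1)
    return counts.argmax(axis=1)  # nodes that received no frames default to class 0

def classify_video(video_frames, weights, node_labels, n_classes):
    """Simple voting: each frame votes with the label of its SOM node."""
    votes = node_labels[assign_clusters(video_frames, weights)]
    return int(np.bincount(votes, minlength=n_classes).argmax())
```

In this sketch, training frames inherit the label of their video, `fit_node_labels` turns the SOM clusters into per-node class labels, and `classify_video` predicts the class that receives the most frame votes.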
Index Terms—Computer vision, unsupervised learning, self-organizing map, LSTM.
The authors are with Chulalongkorn University, Bangkok, 10500, Thailand (e-mail: firstname.lastname@example.org, email@example.com).
Cite: Itthisak Phueaksri and Sukree Sinthupinyo, "Using Clustered Frames to Classify Videos," International Journal of Machine Learning and Computing, vol. 10, no. 4, pp. 562-567, 2020.
Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).