Abstract—Pedestrian density estimation plays an important role in intelligent transportation systems. In this paper, we
present a novel weighting scheme for the bag-of-visual-words
model for pedestrian density estimation, one that characterizes
both the weight and the relative spatial arrangement of
all visual words when describing an image. We first analyze the
visual-word generation process. By counting the number of
images that contribute to each visual word's cluster and
computing each cluster's radius, we assign a weight to
every visual word. Specifically, the co-occurrences of
visual words are computed with respect to spatial predicates
over a hierarchical spatial partitioning of an image. The
representation captures both the absolute and relative spatial
arrangement of the words and, through the choice and
combination of the predicates, can characterize a variety of
spatial relationships. We validate this approach on a
challenging ground-truth pedestrian dataset. Our approach
achieves higher classification accuracy than a
non-weighted bag-of-visual-words approach. Moreover, our
approach generates visual words in only 1/20 to 1/30 of the
time required by traditional image-feature clustering.
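The word-weighting idea summarized above can be sketched as follows. The specific combination of per-cluster image count and cluster radius used here (count divided by one plus radius) is an illustrative assumption, not the exact formula from the paper:

```python
import numpy as np

def word_weights(descriptors, image_ids, centroids):
    """Toy weighting of visual words: a word's weight grows with the number
    of distinct images contributing to its cluster and shrinks as the
    cluster becomes less compact. The combination count / (1 + radius)
    is a hypothetical choice for illustration only."""
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    image_ids = np.asarray(image_ids)

    # Assign each descriptor to its nearest centroid (visual word).
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)

    weights = np.zeros(len(centroids))
    for k in range(len(centroids)):
        members = assignments == k
        if not members.any():
            continue
        n_images = len(set(image_ids[members]))  # distinct source images in cluster k
        radius = dists[members, k].max()         # cluster radius (farthest member)
        weights[k] = n_images / (1.0 + radius)   # hypothetical weighting rule
    return weights
```

A word seen across many images with a tight cluster thus receives a larger weight than a word drawn from few images or a diffuse cluster.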
Index Terms—Pedestrian detection, spatial relationship, visual words, SIFT.
The authors are with North China University of Technology, Beijing, 100144, China (e-mail: firstname.lastname@example.org).
Cite: Shilin Zhang and Xunyuan Zhang, "Pedestrian Density Estimation by a Weighted Bag of Visual Words Model," International Journal of Machine Learning and Computing vol. 5, no. 3, pp. 214-218, 2015.