Abstract—It is a common task in pattern recognition to evaluate the similarity of large data objects, which are often represented by high-dimensional vectors. A frequently used mathematical model for evaluating their similarity is to view them as points (vectors) in a high-dimensional space and compute their distances from each other. The “distance,” however, can be defined in a very complicated way; it may be much more complex than the well-known Euclidean distance. Therefore, the algorithmic bottleneck often becomes the number of distance computations that need to be carried out. We consider the case when we have to compute all the distances between n objects, where n is large. Without any shortcuts this takes n(n − 1)/2 = O(n²) distance computations. In those applications where the distances are complicated, being defined by sophisticated algorithms (such as in speech and image recognition), a quadratically growing number of distance computations becomes a severe bottleneck. We prove the following general result that can help eliminate the bottleneck: for a large and general class of distances it is possible to obtain a very close approximation of each of the O(n²) pairwise distances of n objects by doing only a linear number of distance computations, which is optimal with respect to the order of magnitude. Moreover, the approximation factor can be made arbitrarily close to 1, making the approximation error negligible. The side computations needed to achieve this reduction can also be done in polynomial time.
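To make the flavor of the result concrete, the following is a minimal sketch of one standard way to trade a quadratic number of exact distance evaluations for a linear one in a metric space: pick a fixed number k of pivot objects, evaluate the n·k exact distances to the pivots, and then bracket every pairwise distance via the triangle inequality. This is only an illustration of the general idea (the function name `pivot_estimates` and the toy data are ours); it is not the specific construction or approximation guarantee proved in the paper.

```python
import math
import random

def pivot_estimates(points, dist, k=3, seed=0):
    """Bracket all pairwise distances from a linear number of exact
    distance computations, using triangle-inequality pivot bounds.

    With k fixed pivots we perform n*k exact evaluations of dist
    (linear in n for fixed k), then bound each of the n(n-1)/2
    pairwise distances by
        max_p |d(x,p) - d(y,p)|  <=  d(x,y)  <=  min_p (d(x,p) + d(y,p)).
    Both bounds follow from the triangle inequality, so they hold
    for any metric dist.
    """
    rng = random.Random(seed)
    pivots = rng.sample(range(len(points)), k)
    # n*k exact distance computations -- the only calls to dist we make.
    to_piv = [[dist(x, points[p]) for p in pivots] for x in points]
    n = len(points)
    est = {}
    for i in range(n):
        for j in range(i + 1, n):
            lo = max(abs(a - b) for a, b in zip(to_piv[i], to_piv[j]))
            hi = min(a + b for a, b in zip(to_piv[i], to_piv[j]))
            est[(i, j)] = (lo, hi)  # bracket for dist(points[i], points[j])
    return est

# Usage: toy 2-D points under the Euclidean metric (any metric works).
pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
euclid = lambda u, v: math.hypot(u[0] - v[0], u[1] - v[1])
bounds = pivot_estimates(pts, euclid, k=2)
```

With k pivots this scheme uses nk exact distance computations instead of n(n − 1)/2; how tight the resulting brackets are depends on the metric and the pivot choice, which is precisely the kind of gap the paper's result addresses with a provable approximation factor.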
Index Terms—Pattern recognition, approximate distance computation, metric space, normed space.
Andras Farago is with the Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75080, USA (e-mail: firstname.lastname@example.org).
Cite: Andras Farago, "Optimizing Massive Distance Computations in Pattern Recognition," International Journal of Machine Learning and Computing, vol. 4, no. 1, pp. 114-119, 2014.