Fast Distance Metrics in Low-dimensional Space for Neighbor Search Problems


Guihong Wan, Crystal Maung, Chenxu Zhang, and Haim Schweitzer. 2020. “Fast Distance Metrics in Low-dimensional Space for Neighbor Search Problems.” IEEE International Conference on Data Mining (ICDM-20). Publisher's Version


We consider popular dimension reduction techniques that project data on a low dimensional subspace. They include Principal Component Analysis, Column Subset Selection, and Johnson-Lindenstrauss projections. These techniques have been classically used to efficiently compute various approximations. We propose the following three-step procedure for enhancing the accuracy of such approximations: 1. Unknown quantities in the approximation are replaced with random variables. 2. The Maximum Entropy method is applied to infer the most likely probability distribution. 3. Expected values of the random variables are used to compute the enhanced estimates. Our use of the Maximum Entropy method requires knowledge of vector norms that can be easily computed during the dimension reduction. We demonstrate significant enhancements in average accuracy for Euclidean distance and Mahalanobis distance, and improvements in evaluating k-nearest neighbors and k-furthest neighbors by using the enhanced Euclidean distance formula.
Last updated on 12/17/2021