Accelerated Combinatorial Search for Outlier Detection with Provable Bounds on Sub-Optimality


Guihong Wan and Haim Schweitzer. 2021. “Accelerated Combinatorial Search for Outlier Detection with Provable Bounds on Sub-Optimality.” Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-21). Publisher's Version


Outliers negatively affect the accuracy of data analysis. In this paper we are concerned with their influence on the accuracy of Principal Component Analysis (PCA). Algorithms that attempt to detect outliers and remove them from the data prior to applying PCA are sometimes called Robust PCA, or Robust Subspace Recovery algorithms. We propose a new algorithm for outlier detection that combines two ideas. The first is "chunk recursive elimination" that was used effectively to accelerate feature selection, and the second is combinatorial search, in a setting similar to A*. Our main result is showing how to combine these two ideas. One variant of our algorithm is guaranteed to compute an optimal solution according to some natural criteria, but its running time makes it impractical for large datasets. Other variants are much faster and come with provable bounds on sub-optimality. Experimental results show the effectiveness of the proposed approach.
Last updated on 12/17/2021