A common task in data analysis is to compute an approximate embedding of the data in a low-dimensional subspace. The standard algorithm for computing this subspace is the well-known Principal Component Analysis (PCA). PCA can be extended to the case where some data points are treated as "outliers" that can be ignored, allowing the remaining points ("inliers") to be embedded more tightly. We develop a new algorithm that detects outliers so that they can be removed before PCA is applied. The main idea is to rank each point by looking ahead: we evaluate how the global PCA error would change if that point were treated as an outlier. Our technical contribution is showing that this lookahead procedure can be implemented efficiently, yielding an accurate algorithm whose running time is only modestly higher than that of standard PCA algorithms.
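
The lookahead ranking can be illustrated with a deliberately naive sketch. It assumes that "PCA error" means the sum of squared residuals of the mean-centered points after projection onto the top-k principal subspace, and that every point is scored once against the full data set. Because it recomputes an SVD for each candidate point, it costs roughly n PCA computations; it is not the efficient procedure contributed by the paper, only a brute-force illustration of the scoring idea. The names `pca_error`, `lookahead_scores`, and `top_outliers` are hypothetical.

```python
import numpy as np


def pca_error(X, k):
    """Squared residual of the centered rows of X after projecting onto the
    top-k principal subspace (assumed definition of the global PCA error)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    # Discarded singular values carry exactly the residual energy.
    return float(np.sum(s[k:] ** 2))


def lookahead_scores(X, k):
    """Naive lookahead: for each point, the drop in PCA error obtained by
    treating that point as an outlier (one full SVD per point)."""
    n = X.shape[0]
    base = pca_error(X, k)
    scores = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False  # hypothetically remove point i
        scores[i] = base - pca_error(X[mask], k)
    return scores


def top_outliers(X, k, m):
    """Indices of the m points whose removal reduces the error the most."""
    scores = lookahead_scores(X, k)
    return np.argsort(-scores)[:m]


if __name__ == "__main__":
    # Synthetic data: 50 points near a line in R^3, plus one planted outlier.
    rng = np.random.default_rng(0)
    t = rng.normal(size=50)
    X = np.outer(t, [1.0, 2.0, 3.0]) + 0.01 * rng.normal(size=(50, 3))
    X[7] = [5.0, -5.0, 5.0]
    print(top_outliers(X, k=1, m=1))
```

Scoring every point against the full data set keeps the sketch simple; re-scoring after each removal would be a natural variant, but whether the paper does so is not stated here.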

IEEE International Conference on Data Mining (ICDM-21)