We present a novel edge sparsification approach for semi-supervised learning on undirected, attributed graphs. The main challenge is to retain only a few edges while minimizing the loss of node classification accuracy. The task can be formulated mathematically as a bi-level optimization problem. We propose to use meta-gradients, which have traditionally been used in meta-learning, to solve this optimization problem; specifically, we treat the entries of the graph adjacency matrix as hyperparameters to optimize. Experimental results show the effectiveness of the proposed approach. Remarkably, with the resulting sparse, lightweight graph, the classification accuracy in many cases improves significantly.
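To illustrate the flavor of the idea (this is our own toy reconstruction, not the paper's method), the sketch below treats every adjacency entry as a differentiable hyperparameter of a linear one-hop propagation model, which stands in for the trained classifier; it then keeps the edges whose deletion would, to first order, raise the loss the most. All variable names and the toy model are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 12, 5, 3
X = rng.normal(size=(n, d))                        # node attributes
Y = np.eye(c)[rng.integers(0, c, size=n)]          # one-hot labels
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = A + A.T                                        # undirected, no self-loops
W = 0.1 * rng.normal(size=(d, c))                  # toy "trained" weights

# gradient of the mean-squared loss ||A X W - Y||^2 / (n c) w.r.t. every
# adjacency entry (analytic for this linear stand-in; the paper's
# meta-gradients differentiate through the inner training instead)
G = 2.0 / (n * c) * ((A @ X @ W) - Y) @ (X @ W).T

# first-order effect of deleting edge (i, j): loss changes by -(G_ij + G_ji);
# keep the 10 edges whose deletion would raise the loss the most
imp = -(G + G.T) * A
iu = np.triu_indices(n, 1)
keep = np.argsort(imp[iu])[-10:]
A_sparse = np.zeros_like(A)
A_sparse[iu[0][keep], iu[1][keep]] = A[iu[0][keep], iu[1][keep]]
A_sparse = A_sparse + A_sparse.T                   # restore symmetry
```

In the actual bi-level setting the outer gradient is taken through the inner training loop rather than computed analytically as here.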
A common task in the analysis of data is to compute an approximate embedding of the data in a low-dimensional subspace. The standard algorithm for computing this subspace is the well-known Principal Component Analysis (PCA). PCA can be extended to the case where some data points are viewed as “outliers” that can be ignored, allowing the remaining data points (“inliers”) to be more tightly embedded. We develop a new algorithm that detects outliers so that they can be removed prior to applying PCA. The main idea is to rank each point by looking ahead and evaluating the change in the global PCA error if that point were treated as an outlier. Our technical contribution is showing that this lookahead procedure can be implemented efficiently, yielding an accurate algorithm whose running time is only slightly above that of standard PCA algorithms.
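A naive version of the lookahead ranking can be sketched as follows. This version simply recomputes a full PCA once per candidate point, whereas the paper's contribution is making the lookahead efficient; function names are ours:

```python
import numpy as np

def pca_error(X, k):
    # total squared reconstruction error of the best rank-k subspace (via SVD)
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return float((s[k:] ** 2).sum())

def lookahead_scores(X, k):
    # score of point i = drop in global PCA error if i is declared an outlier
    base = pca_error(X, k)
    return np.array([base - pca_error(np.delete(X, i, axis=0), k)
                     for i in range(X.shape[0])])

# demo: 20 near-collinear points plus one far off-line point (index 20)
rng = np.random.default_rng(1)
t = rng.normal(size=20)
X = np.vstack([np.column_stack([t, 2 * t + 0.01 * rng.normal(size=20)]),
               [[-5.0, 10.0]]])
scores = lookahead_scores(X, 1)
```

Points with the highest scores are the strongest outlier candidates; in the demo the off-line point ranks first.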
Outliers negatively affect the accuracy of data analysis. In this paper we are concerned with their influence on the accuracy of Principal Component Analysis (PCA). Algorithms that attempt to detect outliers and remove them from the data prior to applying PCA are sometimes called Robust PCA, or Robust Subspace Recovery, algorithms. We propose a new algorithm for outlier detection that combines two ideas. The first is “chunk recursive elimination,” which has been used effectively to accelerate feature selection; the second is combinatorial search, in a setting similar to A*. Our main result shows how to combine these two ideas. One variant of our algorithm is guaranteed to compute an optimal solution according to some natural criteria, but its running time makes it impractical for large datasets. Other variants are much faster and come with provable bounds on sub-optimality. Experimental results show the effectiveness of the proposed approach.
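A minimal illustration of the chunk-elimination half of the idea (without the A*-style combinatorial search, and with hypothetical names): repeatedly refit PCA and discard a chunk of the worst-fitting points until the outlier budget is exhausted:

```python
import numpy as np

def pca_basis(X, k):
    # mean and top-k principal directions of X
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return Vt[:k], mu

def chunk_eliminate(X, k, n_out, chunk=2):
    # greedily flag `chunk` worst-fitting points per round until
    # n_out points have been flagged as outliers
    idx = np.arange(len(X))
    out = []
    while len(out) < n_out:
        V, mu = pca_basis(X[idx], k)
        Xc = X[idx] - mu
        r = np.linalg.norm(Xc - (Xc @ V.T) @ V, axis=1)   # residual norms
        m = min(chunk, n_out - len(out))
        worst = idx[np.argsort(r)[-m:]]
        out.extend(worst.tolist())
        idx = np.setdiff1d(idx, worst)                    # refit without them
    return sorted(out), idx

# demo: 30 points near the line y = x, plus 3 off-line outliers (30, 31, 32)
t = np.linspace(-5, 5, 30)
inliers = np.column_stack([t, t + 0.01 * np.sin(t)])
outliers = np.array([[3.0, -3.0], [-2.5, 2.5], [2.0, -2.0]])
out, kept = chunk_eliminate(np.vstack([inliers, outliers]), 1, 3, chunk=2)
```

Eliminating a chunk at a time amortizes the PCA refits; the paper's combinatorial search layer, omitted here, is what provides the optimality guarantees.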
We study the approximation of a target matrix in terms of several selected columns of another matrix, sometimes called a “dictionary.” This approximation problem arises in various domains, such as signal processing, computer vision, and machine learning. An optimal column selection algorithm for the special case where the target matrix has only one column has been known since the 1970s, but most previously proposed column selection algorithms for the general case are greedy. We propose the first nontrivial optimal algorithm for the general case, using a heuristic search setting similar to the classical A* algorithm. We also propose practical suboptimal algorithms in a setting similar to the classical Weighted A* algorithm. Experimental results show that our suboptimal algorithms compare favorably with current state-of-the-art greedy algorithms, while also providing bounds on how close their solutions are to the optimal solution.
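A toy best-first search in the spirit described, offered as our own reconstruction rather than the paper's algorithm: for a single-column target, partial column subsets are expanded in order of an admissible bound obtained by pretending all remaining columns may still be used (adding columns never increases a least-squares residual, so the bound is optimistic):

```python
import heapq
import itertools
import numpy as np

def residual(D, cols, b):
    # squared residual of the least-squares fit of b on the chosen columns
    if len(cols) == 0:
        return float(b @ b)
    S = D[:, list(cols)]
    x, *_ = np.linalg.lstsq(S, b, rcond=None)
    r = b - S @ x
    return float(r @ r)

def astar_select(D, b, k):
    m = D.shape[1]
    def bound(cols, start):
        # optimistic completion: allow every remaining column; admissible
        return residual(D, cols + tuple(range(start, m)), b)
    heap = [(bound((), 0), (), 0)]
    while heap:
        f, cols, start = heapq.heappop(heap)
        if len(cols) == k:
            return cols, f            # first complete node popped is optimal
        for j in range(start, m):
            nc = cols + (j,)
            f2 = residual(D, nc, b) if len(nc) == k else bound(nc, j + 1)
            heapq.heappush(heap, (f2, nc, j + 1))

# demo on a small random instance
rng = np.random.default_rng(3)
D = rng.normal(size=(8, 5))
b = rng.normal(size=8)
cols, r = astar_select(D, b, 2)
```

Inflating the bound by a factor greater than 1 would turn this into a Weighted-A*-style variant: faster, with the inflation factor bounding the sub-optimality.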
We consider popular dimension reduction techniques that project data onto a low-dimensional subspace, including Principal Component Analysis, Column Subset Selection, and Johnson-Lindenstrauss projections. These techniques have classically been used to compute various approximations efficiently. We propose the following three-step procedure for enhancing the accuracy of such approximations: 1. Unknown quantities in the approximation are replaced with random variables. 2. The Maximum Entropy method is applied to infer the most likely probability distribution. 3. Expected values of the random variables are used to compute the enhanced estimates. Our use of the Maximum Entropy method requires knowledge of vector norms that can be computed easily during the dimension reduction. We demonstrate significant enhancements in average accuracy for Euclidean and Mahalanobis distances, and improvements in evaluating k-nearest and k-furthest neighbors by using the enhanced Euclidean distance formula.
Classical Principal Component Analysis (PCA) approximates data in terms of projections on a small number of orthogonal vectors. There are simple procedures to efficiently compute various functions of the data from the PCA approximation. The most important function is arguably the Euclidean distance between data items. This can be used, for example, to solve the approximate nearest neighbor problem. We use random variables to model the inherent uncertainty in such approximations, and apply the Maximum Entropy Method to infer the underlying probability distribution. We propose using the expected values of distances between these random variables as improved estimates of the distance. We show experimentally that in most cases results obtained by our method are more accurate than those obtained by the classical approach. This improves the accuracy of a classical technique that has been used with little change for over 100 years.
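The flavor of the enhanced Euclidean estimate can be sketched as follows. This is our reconstruction under a spherically symmetric residual model, not the papers' exact formulas: each point's unseen residual is modeled as a random vector of known norm (the norm is cheap to store during the reduction), and since cross terms vanish in expectation, the squared-distance estimate between two PCA-compressed points gains an additive correction from the two residual norms:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200, 10, 3
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))   # correlated data
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k]                                   # top-k principal directions
Z = Xc @ P.T                                 # k-dim PCA coordinates
res = np.linalg.norm(Xc - Z @ P, axis=1)     # residual norms, kept per point

def classical_sqdist(i, j):
    # squared distance estimated purely from the PCA coordinates
    return float(((Z[i] - Z[j]) ** 2).sum())

def enhanced_sqdist(i, j):
    # correction: expected squared distance when each residual is a random
    # vector of known norm with a spherically symmetric distribution
    return classical_sqdist(i, j) + res[i] ** 2 + res[j] ** 2

# compare average estimation error against the true squared distances
true = lambda i, j: float(((X[i] - X[j]) ** 2).sum())
pairs = [(i, j) for i in range(0, 50) for j in range(50, 100)]
err_c = np.mean([abs(classical_sqdist(i, j) - true(i, j)) for i, j in pairs])
err_e = np.mean([abs(enhanced_sqdist(i, j) - true(i, j)) for i, j in pairs])
```

The classical estimate is always an underestimate (it drops the residual part entirely), while the corrected estimate errs only by the residuals' cross term, which averages out to something much smaller.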
The following are two classical approaches to dimensionality reduction: 1. Approximating the data with a small number of features that exist in the data (feature selection). 2. Approximating the data with a small number of arbitrary features (feature extraction). We study a generalization that approximates the data with both selected and extracted features. We show that an optimal solution to this hybrid problem involves a combinatorial search and cannot be obtained trivially, even if one can solve the separate selection and extraction problems optimally. Our approach, which yields both optimal and approximate solutions, uses a “best-first” heuristic search. The algorithm comes with both a priori and a posteriori optimality guarantees similar to those obtainable for the classical weighted A* algorithm. Experimental results show the effectiveness of the proposed approach.
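As a sketch of why the hybrid problem is combinatorial (hypothetical helper names; for a fixed column choice, the best extracted features are taken to be the top principal directions of the residual, so only the column choice remains to be searched):

```python
import itertools
import numpy as np

def hybrid_error(X, cols, e):
    # squared error of approximating X by its projection onto the span of the
    # selected columns plus the best e extra directions for the residual
    S = X[:, list(cols)]
    Q, _ = np.linalg.qr(S)
    R = X - Q @ (Q.T @ X)                     # residual after the selected part
    sv = np.linalg.svd(R, compute_uv=False)
    return float((sv[e:] ** 2).sum())         # Eckart-Young on the residual

def best_hybrid(X, s, e):
    # exhaustive search over column subsets; the paper replaces this
    # enumeration with a best-first heuristic search
    m = X.shape[1]
    return min((hybrid_error(X, c, e), c)
               for c in itertools.combinations(range(m), s))

# demo: 2 selected columns + 1 extracted feature on a small random matrix
rng = np.random.default_rng(4)
X = rng.normal(size=(6, 5))
err, cols = best_hybrid(X, 2, 1)
```

Even on this toy scale the search is over all column subsets; the hybrid error can never beat unconstrained rank-(s+e) PCA, but it keeps s of the features interpretable.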