Guihong Wan and Schweitzer Haim. 2021. “
Edge Sparsification for Graphs via Meta-Learning.” IEEE 37th International Conference on Data Engineering (ICDE-21).
Publisher's VersionAbstractWe present a novel edge sparsification approach for semi-supervised learning on undirected and attributed graphs. The main challenge is to retain few edges while minimizing the loss of node classification accuracy. The task can be mathematically formulated as a bi-level optimization problem. We propose to use meta-gradients, which have traditionally been used in meta-learning, to solve the optimization problem, specifically, treating the graph adjacency matrix as hyperparameters to optimize. Experimental results show the effectiveness of the proposed approach. Remarkably, with the resulting sparse and light graph, in many cases the classification accuracy is significantly improved.
Guihong Wan and Schweitzer Haim. 2021. “
A Lookahead Algorithm for Robust Subspace Recovery.” IEEE International Conference on Data Mining (ICDM-21).
Abstract
A common task in the analysis of data is to compute an approximate embedding of the data in a low-dimensional subspace. The standard algorithm for computing this subspace is the well-known Principal Component Analysis (PCA). PCA can be extended to the case where some data points are viewed as "outliers" that can be ignored, allowing the remaining data points (“inliers”) to be more tightly embedded. We develop a new algorithm that detects outliers so that they can be removed prior to applying PCA. The main idea is to rank each point by looking ahead and evaluating the change in the global PCA error if that point is considered as an outlier. Our technical contribution is showing that this lookahead procedure can be implemented efficiently, producing an accurate algorithm with running time not much above the running time of standard PCA algorithms.
Guihong Wan and Haim Schweitzer. 2021. “
Accelerated Combinatorial Search for Outlier Detection with Provable Bounds on Sub-Optimality.” Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-21).
Publisher's VersionAbstractOutliers negatively affect the accuracy of data analysis. In this paper we are concerned with their influence on the accuracy of Principal Component Analysis (PCA). Algorithms that attempt to detect outliers and remove them from the data prior to applying PCA are sometimes called Robust PCA, or Robust Subspace Recovery algorithms. We propose a new algorithm for outlier detection that combines two ideas. The first is "chunk recursive elimination" that was used effectively to accelerate feature selection, and the second is combinatorial search, in a setting similar to A*. Our main result is showing how to combine these two ideas. One variant of our algorithm is guaranteed to compute an optimal solution according to some natural criteria, but its running time makes it impractical for large datasets. Other variants are much faster and come with provable bounds on sub-optimality. Experimental results show the effectiveness of the proposed approach.
Guihong Wan and Haim Schweitzer. 2021. “
Heuristic Search for Approximating One Matrix in Terms of Another Matrix.” Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21).
Publisher's VersionAbstract
We study the approximation of a target matrix in terms of several selected columns of another matrix, sometimes called "a dictionary". This approximation problem arises in various domains, such as signal processing, computer vision, and machine learning. An optimal column selection algorithm for the special case where the target matrix has only one column is known since the 1970’s, but most previously proposed column selection algorithms for the general case are greedy. We propose the first nontrivial optimal algorithm for the general case, using a heuristic search setting similar to the classical A* algorithm. We also propose practical suboptimal algorithms in a setting similar to the classical Weighted A* algorithm. Experimental results show that our suboptimal algorithms compare favorably with the current state-of-the-art greedy algorithms. They also provide bounds on how close their solutions are to the optimal solution.