Ph.D. Applied Physics, Stanford University, 2016
M.S. Applied Physics, Stanford University, 2013
B.S. Engineering Physics (Summa Cum Laude),
Cornell University, 2010
My primary research focus is understanding how and when large, highly parameterized models can be fit effectively with limited data, and developing learning algorithms that are effective in this regime.
During my Ph.D. research with Surya Ganguli in the Neural Dynamics and Computation Lab at Stanford University, I focused on the developing field of high-dimensional statistics, where inference accuracy is hindered by a high-dimensional parameter space (the curse of dimensionality). By treating high-dimensional inference with mathematics from disordered systems and random matrix theory, I derived a mathematical description of how classical regression methods such as maximum likelihood (ML) and maximum a posteriori (MAP) estimation can be optimally modified through a 'smoothing' procedure that yields the minimal mean-squared error in high-dimensional regression. Classical statistical estimates of error, which assume infinite data and a finite number of parameters, can be inaccurate or misleading even when the number of training samples is an order of magnitude larger than the number of parameters being learned, so the analysis of high-dimensional inference algorithms applies to a wide variety of 'big data' problems.
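The gap between the ML and MAP estimates in this regime can be seen in a small simulation. This is only an illustrative sketch, not code from my papers: it compares ordinary least squares (the ML estimate under Gaussian noise) with ridge regression (the MAP estimate under a Gaussian prior) when the number of samples barely exceeds the number of parameters; the specific sizes and the ridge strength are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 120, 100, 1.0          # samples barely exceed parameters

w_true = rng.normal(size=p)          # ground-truth coefficients
X = rng.normal(size=(n, p))
y = X @ w_true + sigma * rng.normal(size=n)

# ML estimate: ordinary least squares
w_ml = np.linalg.lstsq(X, y, rcond=None)[0]

# MAP estimate: ridge regression with a Gaussian prior on the weights
lam = sigma**2                       # illustrative prior-matched strength
w_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Compare test error on fresh data from the same model
X_test = rng.normal(size=(5000, p))
y_test = X_test @ w_true
mse = lambda w: np.mean((X_test @ w - y_test) ** 2)
print(f"ML test MSE:  {mse(w_ml):.2f}")
print(f"MAP test MSE: {mse(w_map):.2f}")
```

When n is close to p, the unregularized ML estimate overfits badly while even a crude prior sharply reduces the test error; the optimal 'smoothing' derived in my work goes beyond this simple ridge penalty.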
In my postdoctoral work, advised by Haim Sompolinsky at the Center for Brain Science at Harvard University, I am expanding on my Ph.D. research by investigating how and when more general inference algorithms, such as neural networks and support vector machines, are able to learn large numbers of model parameters from limited data and still generate accurate predictions. In particular, deep convolutional neural networks and ResNets routinely learn a set of parameters as much as two orders of magnitude larger than the number of samples they are trained on. In this over-parameterized setting, classical approaches to estimating or bounding generalization error, such as Rademacher complexity and VC dimension, are no longer effective, and alternative approaches that analyze average-case performance are required. To develop an understanding of these complicated inference algorithms, I have combined theoretical and numerical investigations centered on the student-teacher framework, in which one network (the teacher) generates data that another network (the student) trains on. A few of the topics I have studied in this context:
- Analysis of deep and shallow linear networks, showing how their learning dynamics lead to a surprising non-monotonic generalization error curve as network size grows, which helps explain why bigger neural networks often work better in practice even when a smaller network could fit the data.
- My work analyzing deep linear networks led to predictions of a speed-accuracy tradeoff, which I have investigated numerically in more complicated neural networks.
- An analytic approach to learning in two-layer non-linear networks (trained with stochastic gradient descent), to better understand how the generalization error is impacted by both the learning rate and the mismatch between student and teacher architectures.
- Investigating how measurements of mutual information in neural networks may be impacted by the neural network architecture and the distribution from which data is generated.
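The student-teacher setup underlying these studies can be sketched in a few lines. This is a minimal, hypothetical example (not code from any of the projects above): a fixed linear teacher labels random inputs, and a linear student of the same architecture is trained by full-batch gradient descent, with the generalization error measured against fresh teacher data.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, lr, steps = 50, 40, 0.01, 2000

# Teacher: a fixed linear map that labels random Gaussian inputs
w_teacher = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n_train, d))
y = X @ w_teacher                     # noiseless teacher labels

# Student: same architecture, trained by full-batch gradient descent
w_student = np.zeros(d)
for _ in range(steps):
    grad = X.T @ (X @ w_student - y) / n_train
    w_student -= lr * grad

# For linear networks with isotropic Gaussian inputs, the generalization
# error equals the squared weight-space distance to the teacher
gen_error = np.sum((w_student - w_teacher) ** 2)
print(f"generalization error: {gen_error:.3f}")
```

Because the student sees fewer samples than it has parameters (n_train < d), gradient descent from zero leaves the component of the teacher in the data's null space unlearned; the residual generalization error this produces is exactly the kind of quantity the theoretical analyses above compute in closed form.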
Research work and interests include: High dimensional statistics, deep neural networks, machine learning, compressed sensing, complex systems, cavity method, replica method, loopy belief propagation, approximate message passing, random matrix theory, non-equilibrium statistical physics, adversarial examples and networks, robotics, reinforcement learning.
Outside of research I enjoy recreational rock climbing, running, and playing guitar.