I am a fifth-year graduate student at Harvard University focusing on databases. I work in the Data Systems Laboratory and am advised by Stratos Idreos. I am most interested in how to best tailor data structures and algorithms to properties of the data itself. To better understand what this means, note that most classic data structures and algorithms assume nothing about the distribution of the data being stored or over which the algorithms run. By making some assumptions, often based on past uses of the data structure or algorithm, we can design better ones. Examples can be seen in 1) Column Sketches, which transform data to be uniformly distributed for faster comparisons in database scans, 2) Stacked Filters, which use query skew to achieve orders of magnitude better false positive rates in filters, and 3) The Data Calculator, which uses query workloads to design custom data structures for the workload at hand.
A second line of research during my PhD has been weighted sampling, specifically over data streams. This work shows that time-biased weighted sampling, wherein newer items are more likely to appear in a sample than older items, is a good way to keep machine learning models automatically up to date with shifting data distributions.
Attached is my CV.