These are two examples of current projects.
Multi-Study Machine Learning
In many areas of science, multiple datasets are now available for training prediction algorithms, yet replicating prediction performance across studies is proving challenging. With Prasad Patil and other colleagues, we are interested in making progress in this area by investigating the combination of two fundamental and underutilized opportunities: 1) to train on multiple studies; 2) to use ensembles of prediction models trained on different studies. We explore whether combining these two elements can provide novel insight into the replicability of predictions. We are also designing methods that incorporate replicability into the weighting of ensembles.
This figure summarizes the structure of a "cross-study learner". This is a type of learning algorithm that is trained to account for variation across studies. Its weights can be optimized so that it performs better than single-study approaches in future, as yet unobserved, studies. This PNAS article provides details.
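As a rough illustration of the idea, and not the method described in the PNAS article, the sketch below fits one linear model per simulated study and then weights each model by how well it predicts the studies it was not trained on. The simulated data, the function names (simulate_study, fit_linear, cross_study_weights, ensemble_predict) and the inverse-error weighting rule are all assumptions made for this example.

```python
import numpy as np


def simulate_study(rng, n, beta, noise=1.0):
    """Simulate one study: a linear outcome with study-specific coefficients."""
    X = rng.normal(size=(n, beta.size))
    y = X @ beta + rng.normal(scale=noise, size=n)
    return X, y


def fit_linear(X, y):
    """Ordinary least squares fit; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def cross_study_weights(models, studies):
    """Weight each single-study model by the inverse of its average
    mean squared error on the OTHER studies (an assumed weighting rule)."""
    scores = []
    for k, coef in enumerate(models):
        errors = [
            np.mean((y - X @ coef) ** 2)
            for j, (X, y) in enumerate(studies)
            if j != k  # never score a model on its own training study
        ]
        scores.append(1.0 / np.mean(errors))
    weights = np.array(scores)
    return weights / weights.sum()


def ensemble_predict(models, weights, X):
    """Weighted average of the single-study predictions."""
    predictions = np.column_stack([X @ coef for coef in models])
    return predictions @ weights


rng = np.random.default_rng(0)
shared_beta = np.array([1.0, -2.0, 0.5])

# Each study perturbs the shared coefficients to mimic between-study variation.
studies = [
    simulate_study(rng, 200, shared_beta + rng.normal(scale=0.3, size=3))
    for _ in range(4)
]
models = [fit_linear(X, y) for X, y in studies]
weights = cross_study_weights(models, studies)

# A new study that none of the models has seen.
X_new, y_new = simulate_study(
    rng, 200, shared_beta + rng.normal(scale=0.3, size=3)
)
ensemble_mse = np.mean((y_new - ensemble_predict(models, weights, X_new)) ** 2)
print("weights:", np.round(weights, 3))
print("ensemble MSE on the new study:", round(float(ensemble_mse), 3))
```

Scoring each model only on studies it did not see is one simple way to reward replicable models; other weighting schemes, such as stacking, could be substituted in cross_study_weights without changing the rest of the sketch.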
Decision Support for Cancer Susceptibility Gene Carriers
Following a rapid drop in the cost of DNA sequencing, multi-gene panel testing for inherited genetic susceptibility has become widely used. Yet aside from a very small number of well-known genes, clinicians lack simple and reliable tools to personalize prevention decisions for individuals who test positive. With Danielle Braun, Kevin Hughes and other members of the BayesMendel lab, we are developing a clinical decision support system called Ask2me (see also this article in the Journal of Genetic Counseling). It allows a clinician to enter important factors about a patient with a mutation and immediately receive the absolute risk of various cancers for that mutation, given the patient's current clinical situation. Our goal is to allow for personalized prevention decisions for individuals who test positive for uncommon genetic mutations.
This figure (made by Cathy Zhang) summarizes the results of a meta-analysis of papers on the risk of colorectal cancer for carriers of MLH1 mutations. One of the challenges in developing a reliable and current decision support system is the integration of diverse information on risk from the literature. Other challenges we are addressing are the automatic identification of relevant literature (in collaboration with Regina Barzilay at MIT), and risk counseling for carriers of variants of uncertain clinical significance (VUSs).
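To give a sense of what integrating risk estimates from several papers can involve, here is a minimal sketch of fixed-effect, inverse-variance pooling. It is a generic textbook technique and is not claimed to be the procedure used for Ask2me; the function name pool_fixed_effect and the three input values are hypothetical and chosen only for illustration, not taken from any study.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average of study-level estimates.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log relative risks and standard errors from three papers.
log_relative_risks = [1.2, 0.8, 1.0]
standard_errors = [0.2, 0.4, 0.2]

pooled, pooled_se = pool_fixed_effect(log_relative_risks, standard_errors)
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled relative risk: {math.exp(pooled):.2f} "
      f"(95% CI {lower:.2f} to {upper:.2f})")
```

More precise studies receive larger weights, so a single small study cannot dominate the pooled estimate. A random-effects model would be a natural alternative when the papers differ in design or population.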