Projects

These are two examples of current projects.

Multi-Study Machine Learning

In many areas of science, multiple datasets are now available for training prediction algorithms, yet replicating prediction performance across studies is proving challenging. With Prasad Patil and other colleagues, we are interested in making progress in this area by investigating the combination of two fundamental and underutilized opportunities: 1) to train on multiple studies; 2) to use ensembles of prediction models trained on different studies. We explore whether the combination of these two elements can provide novel insight into the replicability of predictions. We are also designing methods that incorporate replicability in weighting ensembles.

Figure: Architecture of Cross-Study Machine Learning Algorithms

This figure summarizes the structure of a "cross-study learner". This is a type of learning algorithm that is trained to account for variation across studies. Its weights can be optimized to make it perform better than single-study approaches in future, as yet unobserved, studies. This PNAS article provides details.
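As a rough illustration of the idea, the sketch below trains one linear model per study and then weights each model by how well it predicts the other studies. Everything here is an assumption made for illustration: the simulated data, the helper names (fit_linear, cross_study_weights, ensemble_predict), and the inverse-error weighting rule, which is a simple stand-in and not necessarily the weighting scheme described in the article.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_study(n, shift):
    """Simulate one study whose true coefficients are shifted (hypothetical data)."""
    X = rng.normal(size=(n, 3))
    beta = np.array([1.0, -2.0, 0.5]) + shift
    y = X @ beta + rng.normal(scale=0.5, size=n)
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def cross_study_weights(studies):
    """Fit one model per study; weight each by its accuracy on the other studies."""
    models = [fit_linear(X, y) for X, y in studies]
    errors = []
    for k, coef in enumerate(models):
        # Mean squared error of model k on every study except its own.
        mse = [
            np.mean((predict_linear(coef, X) - y) ** 2)
            for j, (X, y) in enumerate(studies)
            if j != k
        ]
        errors.append(np.mean(mse))
    inverse = 1.0 / np.asarray(errors)
    # Normalize so the weights are positive and sum to one.
    return models, inverse / inverse.sum()


def ensemble_predict(models, weights, X):
    """Weighted average of the single-study predictions."""
    preds = np.column_stack([predict_linear(coef, X) for coef in models])
    return preds @ weights


# Four simulated studies with study-specific coefficient shifts.
studies = [simulate_study(200, rng.normal(scale=0.3, size=3)) for _ in range(4)]
models, weights = cross_study_weights(studies)

# Apply the ensemble to a new, previously unseen study.
X_new, y_new = simulate_study(200, rng.normal(scale=0.3, size=3))
y_hat = ensemble_predict(models, weights, X_new)
print("weights:", np.round(weights, 3))
print("MSE on the new study:", np.mean((y_hat - y_new) ** 2))
```

Models that transfer poorly to the other studies receive smaller weights, so the ensemble leans on the predictors that appear most replicable. Other rules for choosing the weights could be dropped into cross_study_weights without changing the rest of the sketch.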

Decision Support for Genetic Testing and Early Detection

Following a rapid drop in the cost of DNA sequencing, multi-gene panel testing for inherited genetic susceptibility has become widely used, and multi-cancer early detection (MCED) assays are quickly emerging. With co-leader Danielle Braun and other members of the BayesMendel lab, we are developing machine learning applications for risk stratification and clinical decision support systems to increase the efficiency of panel testing and MCED testing. The cornerstone is a comprehensive pre- and post-testing risk stratification tool called PanelPRO. Ongoing projects range from innovative statistical and computational approaches to clinical implementation trials.

eLife panelPro