Welcome to my home page. I am a PhD student in the Bioinformatics and Integrative Genomics (BIG) Program at Harvard Medical School, in the Division of Medical Sciences under the aegis of the Graduate School of Arts and Sciences. I am a member of the Kostic Lab at the Joslin Diabetes Center, where I work on elucidating the host-gut microbiome interactions that lead to autoimmunity by integrating machine learning techniques with rapid experimental validation methods that leverage emerging sequencing technologies. My research interests are metagenomics, metatranscriptomics, host-microbe interactions, sequencing technologies, functional genomics, machine learning, ontology evaluation, genome structure, and personalized genomic medicine. My Erdős number is 4. My research is currently externally funded by both an NIH T32 grant and Amazon.

During the fall of 2016, I worked in the Gehlenborg Lab in the Department of Biomedical Informatics at HMS on HiGlass. During the summer of 2016, I rotated in the Huttenhower Lab, which sits jointly in the Department of Biostatistics at the T.H. Chan School of Public Health and at the Broad Institute of MIT and Harvard. There I worked on techniques and pipelines supporting putative protein function prediction algorithms that work across many metagenomic and metatranscriptomic contexts.

In 2016, I graduated cum laude with special departmental honors in computer science from Trinity University. My undergraduate thesis research was conducted under Matt Hibbs on osteoblast development and bone maintenance in Mus musculus, where I focused on methods for properly accounting for tissue context specificity when using machine learning to predict gene-gene functional relationships. Additionally, from 2015 to 2016, I worked in Carol Bult’s group on the Patient-Derived Xenograft (PDX) project at The Jackson Laboratory, where I built a data-mining pipeline that aims to better subtype Triple Negative Breast Cancer tumors and to computationally predict their response to chemotherapy drugs.

Featured examples of my past work are available on this site; full details about my previous scholarship can be found in my CV. 

Recent Posters & Talks

Data Mining Diverse Compendia of Triple Negative Breast Cancer Samples for Improved Tumor Subtyping. Presented in the Bioinformatics of Disease and Treatment session at the 24th Annual International Conference on Intelligent Systems for Molecular Biology (2016), Monday, July 11, 2016.

Program Website

This work was a collaboration between the Bult Group at The Jackson Laboratory and the Hibbs Group at Trinity University. 

Recent Publications

Jacob M. Luber and Aleksandar D. Kostic. 4/24/2017. “Gut Microbiota: Small Molecules Modulate Host Cellular Functions.” Current Biology, 27, 8, Pp. R307-R310.

The human gut metagenome was recently discovered to encode vast collections of biosynthetic gene clusters with diverse chemical potential, almost none of which are yet functionally validated. Recent work elucidates common microbiome-derived biosynthetic gene clusters encoding peptide aldehydes that inhibit human proteases.

Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Chuck McCallum, Kasper Dinkla, Hendrik Strobelt, Jacob M Luber, Scott Ouellette, Alaleh Azhir, Nikhil Kumar, Jeewon Hwang, Burak H. Alver, Hanspeter Pfister, Leonid A Mirny, Peter J. Park, and Nils Gehlenborg. Submitted. “HiGlass: Web-based Visual Comparison And Exploration Of Genome Interaction Maps.” bioRxiv, 121889.


We present HiGlass (http://higlass.io), a web-based viewer for genome interaction maps featuring synchronized navigation of multiple views as well as continuous zooming and panning for navigation across genomic loci and resolutions. We demonstrate how visual comparison of Hi-C and other genomic data from different experimental conditions can be used to efficiently identify salient outcomes of experimental perturbations, generate new hypotheses, and share the results with the community.


Jacob M. Luber, Braden T. Tierney, Evan M. Cofer, Chirag J. Patel, and Aleksandar D. Kostic. Submitted. “Aether: Leveraging Linear Programming For Optimal Cloud Computing In Genomics.” bioRxiv.

Across biology we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective, and scalable framework that uses linear programming (LP) to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis while maximizing its efficiency and speed. As a test, we used Aether to de novo assemble 1572 metagenomic samples, a task it completed in merely 13 hours with cost savings of approximately 80% relative to comparable methods.
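
The abstract does not spell out Aether's actual formulation. As a rough illustration of choosing a cheapest mix of cloud resources with a linear program, here is a minimal sketch that assumes a single core-hour requirement and a capacity limit per offer. The `Offer` class, the offer names, prices, and capacities are all invented for illustration. With only one covering constraint, the LP optimum can be reached by filling the cheapest cost-per-core offers first, so no solver is needed here; this shortcut is specific to the toy problem and is not Aether's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical cloud offer; every field is an illustrative assumption."""
    name: str              # made-up instance label
    cores: int             # vCPUs per instance
    price_per_hour: float  # assumed bid price per instance-hour (USD)
    max_hours: float       # instance-hours assumed obtainable at that price


def cheapest_allocation(offers, core_hours_needed):
    """Solve the toy LP:

        minimize    sum_i price_i * x_i
        subject to  sum_i cores_i * x_i >= core_hours_needed
                    0 <= x_i <= max_hours_i

    With a single covering constraint, taking offers in order of
    increasing price per core-hour is exactly optimal.
    """
    plan = {}
    remaining = core_hours_needed
    for offer in sorted(offers, key=lambda o: o.price_per_hour / o.cores):
        if remaining <= 0:
            break
        # Use as much of this offer as needed, up to its capacity.
        hours = min(offer.max_hours, remaining / offer.cores)
        plan[offer.name] = hours
        remaining -= hours * offer.cores
    if remaining > 1e-9:
        raise ValueError("offers cannot cover the requested core-hours")
    prices = {o.name: o.price_per_hour for o in offers}
    cost = sum(hours * prices[name] for name, hours in plan.items())
    return plan, cost


# Invented example data, not real cloud prices.
EXAMPLE_OFFERS = [
    Offer("small", cores=4, price_per_hour=0.10, max_hours=100),
    Offer("large", cores=16, price_per_hour=0.32, max_hours=50),
    Offer("huge", cores=64, price_per_hour=1.92, max_hours=10),
]
```

Calling `cheapest_allocation(EXAMPLE_OFFERS, 1000)` returns a dictionary of instance-hours per offer together with the total cost. A realistic version would add further constraints (memory, deadlines, changing bid prices), at which point a general LP solver becomes necessary.
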

Jacob M Luber. 2016. “Improved Prediction of Mouse Pathways Related to Bone Maintenance Through Machine Learning Utilizing Diverse Genomic Data.” Trinity University Computer Science Honors Undergraduate Thesis.

The genetic cause of osteoporosis is poorly understood, but a wealth of functional genomic data exist from which osteoporosis related pathways could be identified. A machine learning pipeline was created using Support Vector Machines and was first applied using as inputs all available gene expression data and a second time using only bone-related data. In both cases, models were trained using a manually curated training set of gene relationships known to support bone maintenance and development. Each model was used to predict novel pairwise gene relationships, and specific pathways were compared between models to identify relationships supported primarily by data collected in bone-related contexts as opposed to other cellular contexts. Our results indicate a more accurate result was achieved through biologically-motivated feature selection that considers mammalian cellular context. Our results reinforce the observation that if two genes are functionally associated in one context they may not be functionally associated in all contexts, necessitating careful consideration of training sets and input data into functional prediction methods.