Welcome to my home page. I am a PhD student in the Bioinformatics and Integrative Genomics (BIG) Program at Harvard Medical School in the Division of Medical Sciences under the aegis of the Graduate School of Arts and Sciences. I am a member of the Kostic Lab at the Joslin Diabetes Center, where I work on elucidating host-gut microbiome interactions that lead to autoimmunity by integrating machine learning techniques with rapid experimental validation methods that leverage emerging sequencing technologies. My research interests are metagenomics, metatranscriptomics, host-microbe interactions, sequencing technologies, functional genomics, machine learning, ontology evaluation, genome structure, and personalized genomic medicine. My Erdős number is 4. My research is currently funded externally by both an NIH T32 grant and Amazon.

During the fall of 2016 I worked on HiGlass in the Gehlenborg Lab in the Department of Biomedical Informatics at HMS. During the summer of 2016 I rotated in the Huttenhower Lab, which is based jointly in the Department of Biostatistics at the T.H. Chan School of Public Health and at the Broad Institute of MIT and Harvard. There I worked on techniques and pipelines to support putative protein function prediction algorithms that work across many metagenomic and metatranscriptomic contexts.

In 2016 I graduated cum laude with special departmental honors in computer science from Trinity University. I conducted my undergraduate thesis research under Matt Hibbs on osteoblast development and bone maintenance in Mus musculus, focusing on methods that properly account for tissue context specificity when using machine learning to predict gene-gene functional relationships. Additionally, from 2015 to 2016 I worked in Carol Bult’s group on the Patient Derived Xenograft (PDX) project at The Jackson Laboratory, where I built a data-mining pipeline that aims to better subtype Triple Negative Breast Cancer tumors and computationally predict their response to chemotherapy drugs.

Featured examples of my past work are available on this site; full details about my previous scholarship can be found in my CV. 

Recent Publications

Jacob M. Luber and Aleksandar D. Kostic. 4/24/2017. “Gut Microbiota: Small Molecules Modulate Host Cellular Functions.” Current Biology, 27, 8, Pp. R307-R310.

The human gut metagenome was recently discovered to encode vast collections of biosynthetic gene clusters with diverse chemical potential, almost none of which are yet functionally validated. Recent work elucidates common microbiome-derived biosynthetic gene clusters encoding peptide aldehydes that inhibit human proteases.

Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Chuck McCallum, Kasper Dinkla, Hendrik Strobelt, Jacob M Luber, Scott Ouellette, Alaleh Azhir, Nikhil Kumar, Jeewon Hwang, Burak H. Alver, Hanspeter Pfister, Leonid A Mirny, Peter J. Park, and Nils Gehlenborg. Submitted. “HiGlass: Web-based Visual Comparison And Exploration Of Genome Interaction Maps.” bioRxiv, 121889.


We present HiGlass (http://higlass.io), a web-based viewer for genome interaction maps featuring synchronized navigation of multiple views as well as continuous zooming and panning for navigation across genomic loci and resolutions. We demonstrate how visual comparison of Hi-C and other genomic data from different experimental conditions can be used to efficiently identify salient outcomes of experimental perturbations, generate new hypotheses, and share the results with the community.


Jacob M. Luber, Braden T. Tierney, Evan M. Cofer, Chirag J. Patel, and Aleksandar D. Kostic. 12/8/2017. “Aether: Leveraging Linear Programming For Optimal Cloud Computing In Genomics.” Bioinformatics, btx787.


Across biology we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities.


Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective, and scalable framework that uses linear programming (LP) to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines.


Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at https://github.com/kosticlab/aether. Examples, documentation, and a tutorial are available at http://aether.kosticlab.org.


Contact: chirag_patel@hms.harvard.edu and aleksandar.kostic@joslin.harvard.edu

Job Dekker, Andrew S. Belmont, Mitchell Guttman, Victor O. Leshyk, John T. Lis, Stavros Lomvardas, Leonid A. Mirny, Clodagh C. O’Shea, Peter J. Park, Bing Ren, Joan C. Ritland Politz, Jay Shendure, Sheng Zhong, and the 4D Nucleome Network. 9/2017. “The 4D nucleome project.” Nature, 549, 7671, Pp. 219-226.

The 4D Nucleome Network aims to develop and apply approaches to map the structure and dynamics of the human and mouse genomes in space and time with the goal of gaining deeper mechanistic insights into how the nucleus is organized and functions. The project will develop and benchmark experimental and computational approaches for measuring genome conformation and nuclear organization, and investigate how these contribute to gene regulation and other genome functions. Validated experimental technologies will be combined with biophysical approaches to generate quantitative models of spatial genome organization in different biological states, both in cell populations and in single cells.

Chao Fang, Huanzi Zhong, Yuxiang Lin, Bin Chen, Mo Han, Huahui Ren, Haorong Lu, Jacob Mayne Luber, Min Xia, Wangsheng Li, Shayna Stein, Xun Xu, Wenwei Zhang, Radoje Drmanac, Jian Wang, Huanming Yang, Lennart Hammarström, Aleksandar David Kostic, Karsten Kristiansen, and Junhua Li. 2017. “Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.” GigaScience.

Background: More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of two Illumina platforms. Findings: Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform versus the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the two Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high quality reads (96.06% of raw reads) per sample with 90.56% of bases scoring Q30 and above was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias towards genes with higher GC content being enriched on the HiSeq platforms. Conclusion: Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.

Jacob M Luber. 2016. “Improved Prediction of Mouse Pathways Related to Bone Maintenance Through Machine Learning Utilizing Diverse Genomic Data.” Trinity University Computer Science Honors Undergraduate Thesis.

The genetic cause of osteoporosis is poorly understood, but a wealth of functional genomic data exist from which osteoporosis-related pathways could be identified. A machine learning pipeline was created using Support Vector Machines and was applied twice: first using all available gene expression data as inputs, and then using only bone-related data. In both cases, models were trained using a manually curated training set of gene relationships known to support bone maintenance and development. Each model was used to predict novel pairwise gene relationships, and specific pathways were compared between models to identify relationships supported primarily by data collected in bone-related contexts as opposed to other cellular contexts. Our results indicate that higher accuracy was achieved through biologically motivated feature selection that considers mammalian cellular context. They also reinforce the observation that two genes functionally associated in one context may not be functionally associated in all contexts, necessitating careful consideration of the training sets and input data used in functional prediction methods.


Recent Posters & Talks

Exploring L1 and L2 Regularization Techniques For Combining Learning and Feature Selection on Cancer Imaging Data, at Massachusetts Institute of Technology, Monday, December 11, 2017

We developed a statistical learning approach to predict head and neck cancer response to radiation therapy using PET medical imaging data. PET imaging uses a tracer to measure glucose metabolism, which allows cancer cells to proliferate. Glucose metabolism is known to be highly indicative of treatment response and patient survival. We used publicly available clinical trial data from The Cancer Imaging Archive (http://doi.org/10.7937/K9/TCIA.2017.umz8dv6s) as our training data. Inherently, imaging data of this type has hundreds of potential features; however,...

Data Mining Diverse Compendia of Triple Negative Breast Cancer Samples for Improved Tumor Subtyping, at the Bioinformatics of Disease and Treatment Session, The 24th Annual International Conference on Intelligent Systems for Molecular Biology (2016), Monday, July 11, 2016

This work was a collaboration between the Bult Group at The Jackson Laboratory and the Hibbs Group at Trinity University. 
