My webpage has moved! I am currently an assistant professor in the Department of Computer Science and Engineering at the University of Texas at Arlington. I am also a faculty member in the Multi-Interprofessional Center for Health Informatics. My group focuses on 1) developing deep-learning-accelerated search tools for petabytes of cancer imaging data, 2) building algorithmic pipelines to determine how the microbiome augments cancer immunotherapy, and 3) building analysis tools for high-dimensional 'omics data that provide utility at the bench and in the clinic.

I am currently recruiting postdoctoral research fellows, software engineers, staff scientists, PhD students, master's students, and undergraduate students. If you are interested in doing a PhD with me, please apply through the UTA CSE department, specify interest in the bioinformatics track, and mention my name in your application. If you are interested in joining my lab as a postdoc, software engineer, or staff scientist, please email me directly. I currently recruit undergraduate and master's students only from those already in the CSE department at UTA.

I defended my PhD in August 2020 and was a postdoctoral researcher at the NCI Cancer Data Science Laboratory at the National Institutes of Health, working with Peng Jiang and Eytan Ruppin, from 2020 through 2021.

I was formerly an NSF Graduate Research Fellow and PhD candidate in the Bioinformatics and Integrative Genomics (BIG) Program at Harvard Medical School, in the Division of Medical Sciences under the aegis of the Graduate School of Arts and Sciences. My research interests are biomedical informatics, spatial 'omics, computer vision, functional genomics, machine learning, and personalized genomic medicine. My Erdős number is 4. My research was previously funded by an NSF GRFP fellowship (September 2018-August 2021), an NIH T32 grant (August 2016-August 2018), and Amazon (March 2017-March 2018).

In 2016 I graduated cum laude with special departmental honors in computer science from Trinity University. My undergraduate thesis research, conducted under Matt Hibbs, examined osteoblast development and bone maintenance in Mus musculus; I focused on methods that properly account for tissue context specificity when using machine learning to predict gene-gene functional relationships. Additionally, from 2015 to 2016 I worked in Carol Bult’s group on the Patient-Derived Xenograft (PDX) project at The Jackson Laboratory, where I built a data-mining pipeline that aims to better subtype triple-negative breast cancer tumors and computationally predict their response to chemotherapy drugs.

Featured examples of my past work are available on this site; full details about my previous scholarship can be found in my CV. 

Recent Publications

Jacob M. Luber and Aleksandar D. Kostic. 4/24/2017. “Gut Microbiota: Small Molecules Modulate Host Cellular Functions.” Current Biology, 27, 8, Pp. R307-R310.

The human gut metagenome was recently discovered to encode vast collections of biosynthetic gene clusters with diverse chemical potential, almost none of which are yet functionally validated. Recent work elucidates common microbiome-derived biosynthetic gene clusters encoding peptide aldehydes that inhibit human proteases.

Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Chuck McCallum, Kasper Dinkla, Hendrik Strobelt, Jacob M Luber, Scott Ouellette, Alaleh Azhir, Nikhil Kumar, Jeewon Hwang, Burak H. Alver, Hanspeter Pfister, Leonid A Mirny, Peter J. Park, and Nils Gehlenborg. Submitted. “HiGlass: Web-based Visual Comparison And Exploration Of Genome Interaction Maps.” bioRxiv, 121889.


We present HiGlass, a web-based viewer for genome interaction maps featuring synchronized navigation of multiple views as well as continuous zooming and panning for navigation across genomic loci and resolutions. We demonstrate how visual comparison of Hi-C and other genomic data from different experimental conditions can be used to efficiently identify salient outcomes of experimental perturbations, generate new hypotheses, and share the results with the community.


Chao Fang, Huanzi Zhong, Yuxiang Lin, Bin Chen, Mo Han, Huahui Ren, Haorong Lu, Jacob Mayne Luber, Min Xia, Wangsheng Li, Shayna Stein, Xun Xu, Wenwei Zhang, Radoje Drmanac, Jian Wang, Huanming Yang, Lennart Hammarström, Aleksandar David Kostic, Karsten Kristiansen, and Junhua Li. 2017. “Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.” Gigascience.

Background: More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of two Illumina platforms. Findings: Using fecal samples from 20 healthy individuals we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform versus the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the two Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high quality reads (96.06% of raw reads) per sample with 90.56% of bases scoring Q30 and above was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias towards genes with higher GC content being enriched on the HiSeq platforms. Conclusion: Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.

Job Dekker, Andrew S. Belmont, Mitchell Guttman, Victor O. Leshyk, John T. Lis, Stavros Lomvardas, Leonid A. Mirny, Clodagh C. O’Shea, Peter J. Park, Bing Ren, Joan C. Ritland Politz, Jay Shendure, Sheng Zhong, and the 4D Nucleome Network. 9/2017. “The 4D nucleome project.” Nature, 549, 7671, Pp. 219-226.

The 4D Nucleome Network aims to develop and apply approaches to map the structure and dynamics of the human and mouse genomes in space and time with the goal of gaining deeper mechanistic insights into how the nucleus is organized and functions. The project will develop and benchmark experimental and computational approaches for measuring genome conformation and nuclear organization, and investigate how these contribute to gene regulation and other genome functions. Validated experimental technologies will be combined with biophysical approaches to generate quantitative models of spatial genome organization in different biological states, both in cell populations and in single cells.

Jacob M. Luber, Braden T. Tierney, Evan M. Cofer, Chirag J. Patel, and Aleksandar D. Kostic. 12/8/2017. “Aether: Leveraging Linear Programming For Optimal Cloud Computing In Genomics.” Bioinformatics, btx787.


Across biology we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities.


Here, we present Aether, an intuitive, easy-to-use, cost-effective, and scalable framework that uses linear programming (LP) to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines.


Data utilized are available under EBI SRA accession ERP005989. Source code, examples, documentation, and a tutorial are also available.


Jacob M Luber. 2016. “Improved Prediction of Mouse Pathways Related to Bone Maintenance Through Machine Learning Utilizing Diverse Genomic Data.” Trinity University Computer Science Honors Undergraduate Thesis.

The genetic cause of osteoporosis is poorly understood, but a wealth of functional genomic data exist from which osteoporosis related pathways could be identified. A machine learning pipeline was created using Support Vector Machines and was first applied using as inputs all available gene expression data and a second time using only bone-related data. In both cases, models were trained using a manually curated training set of gene relationships known to support bone maintenance and development. Each model was used to predict novel pairwise gene relationships, and specific pathways were compared between models to identify relationships supported primarily by data collected in bone-related contexts as opposed to other cellular contexts. Our results indicate a more accurate result was achieved through biologically-motivated feature selection that considers mammalian cellular context. Our results reinforce the observation that if two genes are functionally associated in one context they may not be functionally associated in all contexts, necessitating careful consideration of training sets and input data into functional prediction methods. 


Recent Posters & Talks

Exploring L1 and L2 Regularization Techniques For Combining Learning and Feature Selection on Cancer Imaging Data, at Massachusetts Institute of Technology, Monday, December 11, 2017

We developed a statistical learning approach to predict head and neck cancer response to radiation therapy using PET medical imaging data. PET imaging looks at glucose metabolism, which allows cancer cells to proliferate, with the help of a tracer. Glucose metabolism is known to be highly indicative of treatment response and patient survival. We have utilized publicly available clinical trial data from The Cancer Imaging Archive as our training data. Inherently, imaging data of this type has hundreds of potential features; however,...
