Li Y, Ning S, Calvo SE, Mootha VK, Liu JS. Bayesian hidden Markov tree models for clustering genes with shared evolutionary history. The Annals of Applied Statistics. 2019;13 (1) :606-637.
Li Y, Liu JS. Robust Variable and Interaction Selection for Logistic Regression and General Index Models. Journal of the American Statistical Association. 2019;114 (525) :271-286.
Li T, Kim A, Rosenbluh J, et al. GeNets: a unified web platform for network-based genomic analyses. Nature Methods. 2018.
Gopal RK, Calvo SE, Shih AR, Chaves FL, McGuone D, Mick E, Pierce KA, Li Y, et al. Early loss of mitochondrial complex I and rewiring of glutathione metabolism in renal oncocytoma. Proceedings of the National Academy of Sciences. 2018.
Lin Q, Li Y, Liu JS. Inverse modeling: an alleviation of non-linearity. In: Handbook of Big Data Analytics. Springer; 2018.
Li Y, Jourdain AA, Calvo SE, Liu JS, Mootha VK. CLIC, a tool for expanding biological pathways based on co-expression across thousands of datasets. PLoS Computational Biology. 2017;13 (7) :e1005653.
Deng K, Li Y, Zhu W, Liu JS. Fast parameter estimation in loss tomography for networks of general topology. The Annals of Applied Statistics. 2016;10 (1) :144-164.
Li Y, Calvo SE, Gutman R, Liu JS, Mootha VK. Expansion of biological pathways based on evolutionary inference. Cell. 2014;158 (1) :213-225.

The availability of diverse genomes makes it possible to predict gene function based on shared evolutionary history. This approach can be challenging, however, for pathways whose components do not exhibit a shared history but rather consist of distinct “evolutionary modules.” We introduce a computational algorithm, clustering by inferred models of evolution (CLIME), which inputs a eukaryotic species tree, homology matrix, and pathway (gene set) of interest. CLIME partitions the gene set into disjoint evolutionary modules, simultaneously learning the number of modules and a tree-based evolutionary history that defines each module. CLIME then expands each module by scanning the genome for new components that likely arose under the inferred evolutionary model. Application of CLIME to ∼1,000 annotated human pathways and to the proteomes of yeast, red algae, and malaria reveals unanticipated evolutionary modularity and coevolving components. CLIME is freely available and should become increasingly powerful with the growing wealth of eukaryotic genomes.

Strittmatter L, Li Y, Nakatsuka NJ, Calvo SE, Grabarek Z, Mootha VK. CLYBL is a polymorphic human enzyme with malate synthase and β-methylmalate synthase activity. Human Molecular Genetics. 2013;23 (9) :2313-2323.

CLYBL is a human mitochondrial enzyme of unknown function that is found in multiple eukaryotic taxa and conserved to bacteria. The protein is expressed in the mitochondria of all mammalian organs, with highest expression in brown fat and kidney. Approximately 5% of all humans harbor a premature stop polymorphism in CLYBL that has been associated with reduced levels of circulating vitamin B12. Using comparative genomics, we now show that CLYBL is strongly co-expressed with and co-evolved specifically with other components of the mitochondrial B12 pathway. We confirm that the premature stop polymorphism in CLYBL leads to a loss of protein expression. To elucidate the molecular function of CLYBL, we used comparative operon analysis, structural modeling and enzyme kinetics. We report that CLYBL encodes a malate/β-methylmalate synthase, converting glyoxylate and acetyl-CoA to malate, or glyoxylate and propionyl-CoA to β-methylmalate. Malate synthases are best known for their established role in the glyoxylate shunt of plants and lower organisms and are traditionally described as not occurring in humans. The broader role of a malate/β-methylmalate synthase in human physiology and its mechanistic link to vitamin B12 metabolism remain unknown.

Deng K, Li Y, Zhu W, Geng Z, Liu JS. On delay tomography: fast algorithms and spatially dependent models. IEEE Transactions on Signal Processing. 2012;60 (11) :5685-5697.

As an active branch of network tomography, delay tomography has received considerable attention in recent years. However, most methods in the literature assume that the delays of different links are independent of each other, and pursue sub-optimal estimates instead of the maximum likelihood estimate (MLE) due to computational challenges. In this paper, we propose a novel method to implement the EM algorithm widely used in delay tomography analysis for multicast networks. The proposed method makes use of a “delay pattern database” to avoid all redundant computations in the E-step, and is much faster than the traditional implementation. With the help of this new implementation, finding the MLE for large networks, which was previously considered impractical, becomes an easy task. Taking advantage of this computational breakthrough, we further consider models for potential spatial dependence of links, and propose a novel adaptive spatially dependent model (ASDM) for delay tomography. In ASDM, Markov dependence among nearby links is allowed, and spatially dependent links (SDLs) can be automatically recognized via model selection. The superiority of the new methods is confirmed by simulation studies.