The availability of diverse genomes makes it possible to predict gene function based on shared evolutionary history. This approach can be challenging, however, for pathways whose components do not exhibit a shared history but rather consist of distinct “evolutionary modules.” We introduce a computational algorithm, clustering by inferred models of evolution (CLIME), which inputs a eukaryotic species tree, homology matrix, and pathway (gene set) of interest. CLIME partitions the gene set into disjoint evolutionary modules, simultaneously learning the number of modules and a tree-based evolutionary history that defines each module. CLIME then expands each module by scanning the genome for new components that likely arose under the inferred evolutionary model. Application of CLIME to ∼1,000 annotated human pathways and to the proteomes of yeast, red algae, and malaria reveals unanticipated evolutionary modularity and coevolving components. CLIME is freely available and should become increasingly powerful with the growing wealth of eukaryotic genomes.
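The expansion step described above can be illustrated with a deliberately simplified sketch. It is not CLIME's tree-based evolutionary model: it ignores the species tree and ranks candidate genes by Hamming distance between presence/absence profiles. The function names, the dictionary layout of the homology matrix, and the `max_mismatches` threshold are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of species in which two presence/absence profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(profiles):
    """Majority-vote profile across a module's genes (ties count as present)."""
    n = len(profiles)
    return tuple(int(2 * sum(col) >= n) for col in zip(*profiles))


def expand_module(module_genes, homology, max_mismatches=1):
    """Return genes outside the module whose profile lies within
    max_mismatches of the module consensus, closest first.

    homology maps gene name -> tuple of 0/1 flags, one per species
    (a hypothetical stand-in for the homology matrix).
    """
    target = consensus([homology[g] for g in module_genes])
    hits = [
        (hamming(profile, target), gene)
        for gene, profile in homology.items()
        if gene not in module_genes
    ]
    return [gene for dist, gene in sorted(hits) if dist <= max_mismatches]


# Toy homology matrix over four species (made-up values).
homology = {
    "A": (1, 1, 0, 0),
    "B": (1, 1, 0, 0),
    "C": (1, 1, 0, 1),
    "D": (0, 0, 1, 1),
}
candidates = expand_module({"A", "B"}, homology)
```

A profile-distance score of this kind treats every species as independent. A tree-based model would instead count gain and loss events along branches, so losses shared by a whole clade are not counted once per species.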
CLYBL is a human mitochondrial enzyme of unknown function that is found in multiple eukaryotic taxa and conserved to bacteria. The protein is expressed in the mitochondria of all mammalian organs, with highest expression in brown fat and kidney. Approximately 5% of all humans harbor a premature stop polymorphism in CLYBL that has been associated with reduced levels of circulating vitamin B12. Using comparative genomics, we now show that CLYBL is strongly co-expressed with and co-evolved specifically with other components of the mitochondrial B12 pathway. We confirm that the premature stop polymorphism in CLYBL leads to a loss of protein expression. To elucidate the molecular function of CLYBL, we used comparative operon analysis, structural modeling and enzyme kinetics. We report that CLYBL encodes a malate/β-methylmalate synthase, converting glyoxylate and acetyl-CoA to malate, or glyoxylate and propionyl-CoA to β-methylmalate. Malate synthases are best known for their established role in the glyoxylate shunt of plants and lower organisms and are traditionally described as not occurring in humans. The broader role of a malate/β-methylmalate synthase in human physiology and its mechanistic link to vitamin B12 metabolism remain unknown.
As an active branch of network tomography, delay tomography has received considerable attention in recent years. However, most methods in the literature assume that the delays of different links are independent of each other, and pursue sub-optimal estimates instead of the maximum likelihood estimate (MLE) because of computational challenges. In this paper, we propose a novel method to implement the EM algorithm widely used in delay tomography analysis for multicast networks. The proposed method makes use of a “delay pattern database” to avoid all redundant computations in the E-step, and is much faster than the traditional implementation. With the help of this new implementation, finding the MLE for large networks, which was previously considered impractical, becomes an easy task. Taking advantage of this computational breakthrough, we further consider models for potential spatial dependence of links, and propose a novel adaptive spatially dependent model (ASDM) for delay tomography. In ASDM, Markov dependence among nearby links is allowed, and spatially dependent links (SDLs) can be automatically recognized via model selection. The superiority of the new methods is confirmed by simulation studies.
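One plausible reading of the redundancy-avoiding idea is that many probes produce the same end-to-end delay pattern, so the E-step posterior needs to be computed only once per distinct pattern and then weighted by its count. The sketch below applies this to a toy case under strong assumptions: a two-leaf multicast tree, independent links, and delays discretized into `n_bins` levels. The function name and data layout are hypothetical, and the abstract does not describe the actual database structure.

```python
from collections import Counter


def em_two_leaf(observations, n_bins, n_iter=50):
    """EM for link-delay distributions on a two-leaf multicast tree.

    Link 0 is shared by both paths; links 1 and 2 lead to the leaves.
    Each observation is (y1, y2) with y1 = x0 + x1 and y2 = x0 + x2,
    where x0, x1, x2 are unobserved link delays in 0..n_bins-1.
    """
    # "Pattern database": distinct observed patterns with their counts.
    patterns = Counter(observations)
    p = [[1.0 / n_bins] * n_bins for _ in range(3)]  # uniform start

    for _ in range(n_iter):
        expected = [[0.0] * n_bins for _ in range(3)]
        # E-step: one posterior computation per distinct pattern.
        for (y1, y2), count in patterns.items():
            weights = {}
            for x0 in range(min(y1, y2, n_bins - 1) + 1):
                x1, x2 = y1 - x0, y2 - x0
                if x1 < n_bins and x2 < n_bins:
                    weights[x0] = p[0][x0] * p[1][x1] * p[2][x2]
            z = sum(weights.values())
            if z == 0.0:
                continue  # pattern has zero likelihood under current p
            for x0, w in weights.items():
                r = count * w / z  # expected number of probes with this x0
                expected[0][x0] += r
                expected[1][y1 - x0] += r
                expected[2][y2 - x0] += r
        # M-step: renormalize expected counts into probabilities.
        p = [[e / sum(row) for e in row] for row in expected]
    return p


# Toy data: every probe arrives with zero delay at both leaves.
p_zero = em_two_leaf([(0, 0)] * 10, n_bins=2)
# Toy data with mixed patterns.
p_mixed = em_two_leaf([(0, 0), (1, 1), (1, 0), (2, 1), (0, 0)], n_bins=2)
```

With this grouping, the cost of each E-step scales with the number of distinct patterns rather than the number of probes, which is where the saving comes from when patterns repeat often.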