Many pressing medical challenges – such as diagnosing disease, enhancing directed stem cell differentiation, and classifying cancers – have long been hindered by limitations in our ability to quantify proteins in single cells. Mass-spectrometry (MS) is poised to transcend these limitations by developing powerful methods to routinely quantify thousands of proteins and proteoforms across many thousands of single cells. We outline specific technological developments and ideas that can increase the sensitivity and throughput of single cell MS by orders of magnitude and usher in this new age. These advances will transform medicine and ultimately contribute to understanding biological systems on an entirely new level.
During in vitro differentiation, pluripotent stem cells undergo extensive remodeling of their gene expression profile. While studied extensively at the transcriptome level, much less is known about protein dynamics. Here, we measured mRNA and protein levels of 7459 genes during differentiation of embryonic stem cells (ESCs). This comprehensive data set revealed pervasive discordance between mRNA and protein. The high temporal resolution of the data made it possible to determine protein turnover rates genome-wide by fitting a kinetic model. This model further enabled us to systematically identify dynamic post-transcriptional regulation. Moreover, we linked different modes of regulation to the function of specific gene sets. Finally, we showed that the kinetic model can be applied to single-cell transcriptomics data to predict protein levels in differentiated cell types. In conclusion, our comprehensive data set, easily accessible through a web application, is a valuable resource for the discovery of post-transcriptional regulation in ESC differentiation.
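The abstract does not state the form of the kinetic model, so the following is only a minimal sketch under an assumption: the commonly used first-order form dP/dt = k_syn * M(t) - k_deg * P(t), where protein is synthesized in proportion to mRNA and degraded in proportion to its own level. The function names, parameter values, and the Euler integration scheme are illustrative choices, not the authors' implementation.

```python
import math


def simulate_protein(mrna, k_syn, k_deg, dt, p0=0.0):
    """Euler-integrate dP/dt = k_syn * M(t) - k_deg * P(t).

    mrna  : mRNA levels sampled every `dt` time units (hypothetical input)
    k_syn : synthesis rate per unit mRNA (assumed constant)
    k_deg : first-order degradation rate (assumed constant)
    Returns a list of protein levels with the same length as `mrna`.
    """
    protein = [p0]
    for m in mrna[:-1]:
        p = protein[-1]
        protein.append(p + dt * (k_syn * m - k_deg * p))
    return protein


def half_life(k_deg):
    """Protein half-life implied by a first-order degradation rate."""
    return math.log(2) / k_deg


# Illustrative use: constant mRNA drives protein toward k_syn * M / k_deg.
mrna_series = [2.0] * 5001
protein_series = simulate_protein(mrna_series, k_syn=3.0, k_deg=0.5, dt=0.01)
print(protein_series[-1], half_life(0.5))
```

Under this assumed model, fitting k_syn and k_deg to paired mRNA and protein time courses would yield a turnover estimate per gene, and genes whose protein trajectories deviate from the fit would be candidates for dynamic post-transcriptional regulation.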
Many proteoforms - arising from alternative splicing, post-translational modifications (PTMs), or paralogous genes - have distinct biological functions, such as histone PTM proteoforms. However, their quantification by existing bottom-up mass-spectrometry (MS) methods is undermined by peptide-specific biases. To avoid these biases, we developed and implemented a first-principles model (HIquant) for quantifying proteoform stoichiometries. We characterized when MS data allow inferring proteoform stoichiometries by HIquant, derived an algorithm for optimal inference, and demonstrated experimentally high accuracy in quantifying fractional PTM occupancy without using external standards, even in the challenging case of the histone modification code. HIquant server is implemented at: https://web.northeastern.edu/slavov/2014_HIquant/
Author Summary: The identity of human tissues depends on their protein levels. Are tissue protein levels set largely by corresponding mRNA levels or by other (post-transcriptional) regulatory mechanisms? We revisit this question based on statistical analysis of mRNA and protein levels measured across human tissues. We find that for any one gene, its protein levels across tissues are poorly predicted by its mRNA levels, suggesting tissue-specific post-transcriptional regulation. In contrast, the overall protein levels are well predicted by scaled mRNA levels. We show how these seemingly contradictory findings are consistent with each other and represent the two sides of Simpson’s paradox.
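How both findings can hold at once can be seen in a toy calculation. The sketch below uses invented log-scale numbers for three hypothetical genes in three hypothetical tissues. Within-gene agreement is made deliberately extreme (perfectly anti-correlated) to make the effect obvious, which is far stronger than the "poorly predicted" relationship described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented log10 levels: genes differ widely in mean abundance, while
# tissue-to-tissue changes are small and opposite for mRNA and protein.
gene_baseline = [1.0, 3.0, 5.0]
mrna_shift = [-0.2, 0.0, 0.2]
protein_shift = [0.2, 0.0, -0.2]

mrna = [[b + s for s in mrna_shift] for b in gene_baseline]
protein = [[b + s for s in protein_shift] for b in gene_baseline]

# Correlation across tissues, computed separately for each gene.
within_gene = [pearson(m, p) for m, p in zip(mrna, protein)]

# Correlation after pooling all gene-tissue pairs together.
pooled = pearson(
    [v for row in mrna for v in row],
    [v for row in protein for v in row],
)

print("within-gene:", within_gene)
print("pooled:", pooled)
```

The pooled correlation is dominated by the large differences in mean abundance between genes, so it stays high even though mRNA says nothing useful about how any single gene's protein level changes from tissue to tissue.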
Cellular heterogeneity is important to biological processes, including cancer and development. However, proteome heterogeneity is largely unexplored because of the limitations of existing methods for quantifying protein levels in single cells. To alleviate these limitations, we developed Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS), and validated its ability to identify distinct human cancer cell types based on their proteomes. We used SCoPE-MS to quantify over a thousand proteins in differentiating mouse embryonic stem (ES) cells. The single-cell proteomes enabled us to deconstruct cell populations and infer protein abundance relationships. Comparison between single-cell proteomes and transcriptomes indicated coordinated mRNA and protein covariation. Yet many genes exhibited functionally concerted and distinct regulatory patterns at the mRNA and the protein levels, suggesting that post-transcriptional regulatory mechanisms contribute to proteome remodeling during lineage specification, especially for developmental genes. SCoPE-MS is broadly applicable to measuring proteome configurations of single cells and linking them to functional phenotypes, such as cell type and differentiation potentials.
Understanding the regulation and structure of ribosomes is essential to understanding protein synthesis and its dysregulation in disease. While ribosomes are believed to have a fixed stoichiometry among their core ribosomal proteins (RPs), some experiments suggest a more variable composition. Testing such variability requires direct and precise quantification of RPs. We used mass-spectrometry to directly quantify RPs across monosomes and polysomes of mouse embryonic stem cells (ESC) and budding yeast. Our data show that the stoichiometry among core RPs in wild-type yeast cells and ESC depends both on the growth conditions and on the number of ribosomes bound per mRNA. Furthermore, we find that the fitness of cells with a deleted RP-gene is inversely proportional to the enrichment of the corresponding RP in polysomes. Together, our findings support the existence of ribosomes with distinct protein composition and physiological function.
Fermenting glucose in the presence of enough oxygen to support respiration, known as aerobic glycolysis, is believed to maximize growth rate. We observed increasing aerobic glycolysis during exponential growth, suggesting additional physiological roles for aerobic glycolysis. We investigated such roles in yeast batch cultures by quantifying O2 consumption, CO2 production, amino acids, mRNAs, proteins, posttranslational modifications, and stress sensitivity in the course of nine doublings at constant rate. During this course, the cells support a constant biomass-production rate with decreasing rates of respiration and ATP production but also decrease their stress resistance. As the respiration rate decreases, so do the levels of enzymes catalyzing rate-determining reactions of the tricarboxylic-acid cycle (providing NADH for respiration) and of mitochondrial folate-mediated NADPH production (required for oxidative defense). The findings demonstrate that exponential growth can represent not a single metabolic/physiological state but a continuum of changing states and that aerobic glycolysis can reduce the energy demands associated with respiratory metabolism and stress survival.
We study the total least squares (TLS) problem that generalizes least squares regression by allowing measurement errors in both dependent and independent variables. TLS is widely used in applied fields including computer vision, system identification and econometrics. The special case when all dependent and independent variables have the same level of uncorrelated Gaussian noise, known as ordinary TLS, can be solved by singular value decomposition (SVD). However, SVD cannot solve many important practical TLS problems with realistic noise structure, such as having varying measurement noise, known structure on the errors, or large outliers requiring robust error-norms. To solve such problems, we develop convex relaxation approaches for a general class of structured TLS (STLS). We show, both theoretically and experimentally, that while the plain nuclear norm relaxation incurs large approximation errors for STLS, the re-weighted nuclear norm approach is very effective and achieves better accuracy on challenging STLS problems than popular non-convex solvers. We describe a fast solution based on augmented Lagrangian formulation, and apply our approach to an important class of biological problems that use population average measurements to infer cell-type and physiological-state specific expression levels that are very hard to measure directly.
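As a concrete instance of the ordinary TLS special case only, consider fitting a line when x and y carry equal, uncorrelated noise. For one predictor, the SVD solution of the centered two-column data matrix reduces to the closed-form orthogonal-regression slope used below. This is a minimal sketch with hypothetical function names. It does not implement the structured TLS relaxations described above.

```python
import math


def tls_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary total least squares.

    Minimizes the sum of squared perpendicular distances to the line,
    assuming equal uncorrelated noise in x and y. Uses the closed form
    that is equivalent to taking the smallest right singular vector of
    the centered [x, y] matrix.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    diff = syy - sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def ols_slope(xs, ys):
    """Ordinary least squares slope, which treats x as error-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(tls_line(xs, ys), ols_slope(xs, ys))
```

Unlike ordinary least squares, this fit treats the two variables symmetrically: swapping x and y gives the reciprocal slope. The closed form covers only the equal-noise case. The varying noise, structured errors, and outliers mentioned above are what motivate the relaxation approaches.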
To survive and proliferate, cells need to coordinate their metabolism, gene expression, and cell division. To understand this coordination and the consequences of its failure, we uncoupled biomass synthesis from nutrient signaling by growing, in chemostats, yeast auxotrophs for histidine, lysine, or uracil in excess of natural nutrients (i.e., sources of carbon, nitrogen, sulfur, and phosphorus), such that their growth rates (GRs) were regulated by the availability of their auxotrophic requirements. The physiological and transcriptional responses to GR changes of these cultures differed markedly from the respective responses of prototrophs whose growth-rate is regulated by the availability of natural nutrients. The data for all auxotrophs at all GRs recapitulated the features of aerobic glycolysis, fermentation despite high oxygen levels in the growth media. In addition, we discovered wide bimodal distributions of cell sizes, indicating a decoupling between the cell division cycle (CDC) and biomass production. The aerobic glycolysis was reflected in a general signature of anaerobic growth, including substantial reduction in the expression levels of mitochondrial and tricarboxylic acid genes. We also found that the magnitude of the transcriptional growth-rate response (GRR) in the auxotrophs is only 40–50% of the magnitude in prototrophs. Furthermore, the auxotrophic cultures express autophagy genes at substantially lower levels, which likely contributes to their lower viability. Our observations suggest that a GR signal, which is a function of the abundance of essential natural nutrients, regulates fermentation/respiration, the GRR, and the CDC.
The respiratory metabolic cycle in budding yeast (Saccharomyces cerevisiae) consists of two phases that are most simply defined phenomenologically: low oxygen consumption (LOC) and high oxygen consumption (HOC). Each phase is associated with the periodic expression of thousands of genes, producing oscillating patterns of gene expression found in synchronized cultures and in single cells of slowly growing unsynchronized cultures. Systematic variation in the durations of the HOC and LOC phases can account quantitatively for well-studied transcriptional responses to growth rate differences. Here we show that a similar mechanism, transitions from the HOC phase to the LOC phase, can account for much of the common environmental stress response (ESR) and for the cross-protection by a preliminary heat stress (or slow growth rate) to subsequent lethal heat stress. We suggest that a metabolic cycle similar to that of budding yeast, and coupled in a similar way to the ESR, can explain gene expression and respiratory patterns observed in the distantly related fission yeast, Schizosaccharomyces pombe, and in humans. Although metabolic cycling is associated with the G0/G1 phase of the cell division cycle of slowly growing budding yeast, transcriptional cycling was detected in the G2 phase of the division cycle in fission yeast, consistent with the idea that respiratory metabolic cycling occurs during the phases of the cell division cycle associated with mass accumulation in these divergent eukaryotes.
I still remember very clearly the key reason behind my decision to attend MIT about a decade ago. It was a statement that set MIT apart from the other top schools. On one of the MIT webpages, I read that an MIT education is a calling, about understanding nature and not about building a career. Throughout my time at MIT, both as an undergrad and as a postdoc, I have seen many examples to support this mission statement that have always made MIT special for me.
Recently, however, I have been hearing more voices of an alternative culture, one that puts career first and science second. I often hear my colleagues being more concerned about “spinning” and “selling” a paper than about understanding nature. I hear MIT students and postdocs for whom the “impact factor” (IF) of the magazine/journal in which they publish is more important than the substance of what they publish. This worship of the IF, computed and published by the Thomson corporation, is particularly odd for scientists given the methods of computing the IF, and particularly out of place at MIT (see http://jcb.rupress.org/content/179/6/1091.full for more information). I still believe that MIT is a special place; I have met too many students and faculty passionate about science to think otherwise. Yet, I also think that we as a community should make a concerted effort to counteract the cancerous spread of the IF worship and preserve what makes MIT special. The personal example of the senior members of the community who put science first can be a particularly effective and inspiring part of such an effort. I know from personal experience because I have benefited tremendously from the example of my mentors.
This emphasis on the IF can be seen as a particular example of the general trend of decoupling merit from social reward. Such decoupling is rather widespread in all realms of life, whether actively fostered by specious advertising or passively allowed by hiring and promotional committees focusing excessively on the IF. The decoupling is perhaps more common in business than in science, perhaps more common in other academic institutions than at MIT. Yet, I find it particularly unacceptable in science and completely incongruous with MIT’s culture and mission.
We studied the steady-state responses to changes in growth rate of yeast when ethanol is the sole source of carbon and energy. Analysis of these data, together with data from studies where glucose was the carbon source, allowed us to distinguish a "universal" growth rate response (GRR) common to all media studied from a GRR specific to the carbon source. Genes with positive universal GRR include ribosomal, translation, and mitochondrial genes, and those with negative GRR include autophagy, vacuolar, and stress response genes. The carbon source-specific GRR genes control mitochondrial function, peroxisomes, and synthesis of vitamins and cofactors, suggesting this response may reflect the intensity of oxidative metabolism. All genes with universal GRR, which comprise 25% of the genome, are expressed periodically in the yeast metabolic cycle (YMC). We propose that the universal GRR may be accounted for by changes in the relative durations of the YMC phases. This idea is supported by oxygen consumption data from metabolically synchronized cultures with doubling times ranging from 5 to 14 h. We found that the high oxygen consumption phase of the YMC can coincide exactly with the S phase of the cell division cycle, suggesting that oxidative metabolism and DNA replication are not incompatible.
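The proposal that relative phase durations could account for the universal GRR can be illustrated with a time-weighted average. The sketch below is a toy calculation, not the authors' analysis. Its assumptions:

- The population is asynchronous, with cells spread uniformly over the cycle.
- The cycle has two phases, high and low oxygen consumption (labeled HOC and LOC here), with a constant expression level within each.
- Only the LOC phase shortens as growth speeds up, which is a hypothetical scenario.

All numbers are invented.

```python
def population_average(expr_hoc, expr_loc, t_hoc, t_loc):
    """Time-weighted mean expression over one two-phase metabolic cycle.

    expr_hoc, expr_loc : expression level during each phase (arbitrary units)
    t_hoc, t_loc       : phase durations in hours (hypothetical values)
    """
    return (expr_hoc * t_hoc + expr_loc * t_loc) / (t_hoc + t_loc)


# Hypothetical gene expressed mainly during the HOC phase. The LOC phase
# is shortened while the HOC phase is held at 2 h.
for t_loc in (12.0, 6.0, 3.0):
    avg = population_average(expr_hoc=10.0, expr_loc=1.0, t_hoc=2.0, t_loc=t_loc)
    print(f"LOC duration {t_loc:4.1f} h -> population average {avg:.2f}")
```

In this toy setting, the gene's population-average expression rises as the cycle shortens, even though its expression within each phase never changes. That is the sense in which changes in relative phase durations could produce a growth rate response.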
Despite rapid progress in characterizing the yeast metabolic cycle, its connection to the cell division cycle (CDC) has remained unclear. We discovered that a prototrophic batch culture of budding yeast, growing in a phosphate-limited ethanol medium, synchronizes spontaneously and goes through multiple metabolic cycles, whereas the fraction of cells in the G1/G0 phase of the CDC increases monotonically from 90 to 99%. This demonstrates that metabolic cycling does not require cell division cycling and that metabolic synchrony does not require carbon-source limitation. More than 3,000 genes, including most genes annotated to the CDC, were expressed periodically in our batch culture, although a mere 10% of the cells divided asynchronously; only a smaller subset of CDC genes correlated with cell division. These results suggest that the yeast metabolic cycle reflects a growth cycle during G1/G0 and explain our previous puzzling observation that genes annotated to the CDC increase in expression at slow growth.
Networks are becoming a unifying framework for modeling complex systems, and network inference problems are frequently encountered in many fields. Here, I develop and apply a generative approach to network inference (RCweb) for the case when the network is sparse and the latent (not observed) variables affect the observed ones. From all possible factor analysis (FA) decompositions explaining the variance in the data, RCweb selects the FA decomposition that is consistent with a sparse underlying network. The sparsity constraint is imposed by a novel method that significantly outperforms (in terms of accuracy, robustness to noise, complexity scaling and computational efficiency) methods using l1 norm relaxation such as K-SVD and l1-based sparse principal component analysis (PCA). Results from simulated models demonstrate that RCweb recovers exactly the model structures for sparsity as low (as non-sparse) as 50% and with ratio of unobserved to observed variables as high as 2. RCweb is robust to noise, with gradual decrease in the parameter ranges as the noise level increases.
slavovLab: @JohnRYatesIII @TheScientistLLC I agree, Academia gives the most family flexibility. Also, the things that I dislike about Academia seem to be a part of just about any other exciting and influential career option.
I am happy with my choices and not worried about myself.