Tumor-infiltrating B cells are an important component in the microenvironment but have unclear anti-tumor effects. We enhanced our previous computational algorithm TRUST to extract the B cell immunoglobulin hypervariable regions from bulk tumor RNA-sequencing data. TRUST assembled more than 30 million complementarity-determining region 3 sequences of the B cell heavy chain (IgH) from The Cancer Genome Atlas. Widespread B cell clonal expansions and immunoglobulin subclass switch events were observed in diverse human cancers. Prevalent somatic copy number alterations in the MICA and MICB genes related to antibody-dependent cell-mediated cytotoxicity were identified in tumors with elevated B cell activity. The IgG3-1 subclass switch interacts with B cell-receptor affinity maturation and defects in the antibody-dependent cell-mediated cytotoxicity pathway. Comprehensive pancancer analyses of tumor-infiltrating B cell-receptor repertoires identified novel tumor immune evasion mechanisms through genetic alterations. The IgH sequences identified here are potentially useful resources for future development of immunotherapies.
Cancer treatment by immune checkpoint blockade (ICB) can bring long-lasting clinical benefits, but only a fraction of patients respond to treatment. To predict ICB response, we developed TIDE, a computational method to model two primary mechanisms of tumor immune evasion: the induction of T cell dysfunction in tumors with high infiltration of cytotoxic T lymphocytes (CTL) and the prevention of T cell infiltration in tumors with low CTL level. We identified signatures of T cell dysfunction from large tumor cohorts by testing how the expression of each gene in tumors interacts with the CTL infiltration level to influence patient survival. We also modeled factors that exclude T cell infiltration into tumors using expression signatures from immunosuppressive cells. Using this framework and pre-treatment RNA-Seq or NanoString tumor expression profiles, TIDE predicted the outcome of melanoma patients treated with first-line anti-PD1 or anti-CTLA4 more accurately than other biomarkers such as PD-L1 level and mutation load. TIDE also revealed new candidate ICB resistance regulators, such as SERPINB9, demonstrating utility for immunotherapy research.
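The interaction test described above can be written down compactly. The formulation below is a sketch under stated assumptions, not a statement of the exact model in TIDE: it assumes a Cox proportional hazards model, and the symbols a, b, d, G, and h_0 are illustrative names that do not appear in the abstract.

```latex
% Assumed form: Cox proportional hazards model with an interaction term.
% CTL = estimated cytotoxic T lymphocyte level in a tumor
% G   = expression of the candidate gene in the same tumor
h(t \mid \mathrm{CTL}, G) = h_0(t)\,
  \exp\bigl(a\,\mathrm{CTL} + b\,G + d\,\mathrm{CTL}\cdot G\bigr)
```

Under this sketch, a coefficient d that differs significantly from zero means that the association between CTL level and survival depends on the expression of G, which is the kind of gene-by-CTL interaction the abstract describes.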
Despite significant progress in cancer research, current standard-of-care drugs fail to cure many types of cancers. Hence, there is an urgent need to identify better predictive biomarkers and treatment regimes. Conventionally, insights from hypothesis-driven studies are the primary force for cancer biology and therapeutic discoveries. Recently, the rapid growth of big data resources, catalyzed by breakthroughs in high-throughput technologies, has resulted in a paradigm shift in cancer therapeutic research. The combination of computational methods and genomics data has led to several successful clinical applications. In this review, we focus on recent advances in data-driven methods to model anticancer drug efficacy, and we present the challenges and opportunities for data science in cancer therapeutic research.
Estrogen receptor-positive (ER+) breast cancer is treated with endocrine therapies, although therapeutic resistance almost invariably develops in advanced disease. Using genome-wide CRISPR screens, we identified genes whose loss confers endocrine resistance, as well as synthetic lethal vulnerabilities to overcome such resistance. These findings reveal an estrogen-induced negative feedback loop that constrains the growth of ER+ tumors, thereby limiting the efficacy of therapies that inhibit ER, and suggest a previously unappreciated therapeutic route to overcoming endocrine resistance. Endocrine therapy resistance invariably develops in advanced estrogen receptor-positive (ER+) breast cancer, but the underlying mechanisms are largely unknown. We have identified C-terminal SRC kinase (CSK) as a critical node in a previously unappreciated negative feedback loop that limits the efficacy of current ER-targeted therapies. Estrogen directly drives CSK expression in ER+ breast cancer. At low CSK levels, as is the case in patients with ER+ breast cancer resistant to endocrine therapy and with the poorest outcomes, the p21 protein-activated kinase 2 (PAK2) becomes activated and drives estrogen-independent growth. PAK2 overexpression is also associated with endocrine therapy resistance and worse clinical outcome, and the combination of a PAK2 inhibitor with an ER antagonist synergistically suppressed breast tumor growth. Clinical approaches to endocrine therapy-resistant breast cancer must overcome the loss of this estrogen-induced negative feedback loop that normally constrains the growth of ER+ tumors.
Identifying reliable drug response biomarkers is a significant challenge in cancer research. We present computational analysis of resistance (CARE), a computational method focused on targeted therapies, to infer genome-wide transcriptomic signatures of drug efficacy from cell line compound screens. CARE outputs genome-scale scores to measure how the drug target gene interacts with other genes to affect the inhibitor efficacy in the compound screens. Such statistical interactions between drug targets and other genes were not considered in previous studies but are critical in identifying predictive biomarkers. When evaluated using transcriptome data from clinical studies, CARE can predict the therapy outcome better than signatures from other computational methods and genomics experiments. Moreover, the CARE signatures for the PLX4720 BRAF inhibitor are associated with an anti-programmed death 1 clinical response, suggesting a common efficacy signature between a targeted therapy and immunotherapy. When searching for genes related to lapatinib resistance, CARE identified PRKD3 as the top candidate. PRKD3 inhibition, by both small interfering RNA and compounds, significantly sensitized breast cancer cells to lapatinib. Thus, CARE should enable large-scale inference of response biomarkers and drug combinations for targeted therapies using compound screen data.
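One simple way to test a statistical interaction between a drug target and another gene is an ordinary least squares model with a product term. The sketch below assumes that form; it is not taken from the CARE implementation. The function names, the simulated cell lines, and the use of the interaction t-statistic as a score are all illustrative assumptions.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def interaction_t(efficacy, target, gene):
    """t-statistic of the product term in
    efficacy ~ 1 + target + gene + target*gene (ordinary least squares)."""
    rows = [[1.0, t, g, t * g] for t, g in zip(target, gene)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, efficacy)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    resid = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, efficacy)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - p)
    # Variance of the interaction coefficient: sigma2 * [(X'X)^-1] at index (3, 3).
    inv_last_col = solve_linear(xtx, [0.0, 0.0, 0.0, 1.0])
    return beta[3] / math.sqrt(sigma2 * inv_last_col[3])


# Hypothetical simulated screen: efficacy rises with target expression,
# but the "modifier" gene cancels that effect; the "bystander" gene does nothing.
rng = random.Random(0)
n_lines = 60
target = [rng.uniform(0, 1) for _ in range(n_lines)]
modifier = [rng.uniform(0, 1) for _ in range(n_lines)]
bystander = [rng.uniform(0, 1) for _ in range(n_lines)]
efficacy = [t - 2.0 * t * m + rng.gauss(0, 0.1) for t, m in zip(target, modifier)]

t_modifier = interaction_t(efficacy, target, modifier)
t_bystander = interaction_t(efficacy, target, bystander)
print(f"modifier t = {t_modifier:.2f}, bystander t = {t_bystander:.2f}")
```

In this toy setting the modifier gene receives a strongly negative interaction score, while the bystander gene does not. Repeating the fit for every gene would yield a genome-scale score vector of the kind the abstract describes.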
Many human cancers are resistant to immunotherapy, for reasons that are poorly understood. We used a genome-scale CRISPR-Cas9 screen to identify mechanisms of tumor cell resistance to killing by cytotoxic T cells, the central effectors of antitumor immunity. Inactivation of >100 genes (including Pbrm1, Arid2, and Brd7, which encode components of the PBAF form of the SWI/SNF chromatin remodeling complex) sensitized mouse B16F10 melanoma cells to killing by T cells. Loss of PBAF function increased tumor cell sensitivity to interferon-γ, resulting in enhanced secretion of chemokines that recruit effector T cells. Treatment-resistant tumors became responsive to immunotherapy when Pbrm1 was inactivated. In many human cancers, expression of PBRM1 and ARID2 inversely correlated with expression of T cell cytotoxicity genes, and Pbrm1-deficient murine melanomas were more strongly infiltrated by cytotoxic T cells.
Competing endogenous RNAs (ceRNAs) are RNA molecules that sequester shared microRNAs (miRNAs), thereby affecting the expression of other targets of the miRNAs. Whether genetic variants in a ceRNA can affect its biological function and disease development is still an open question. Here we identified a large number of genetic variants that are associated with ceRNA function using Geuvadis RNA-seq data for 462 individuals from the 1000 Genomes Project. We call these loci competing endogenous RNA expression quantitative trait loci or 'cerQTL', and found that a large number of them were unexplored in conventional eQTL mapping. We identified many cerQTLs that have undergone recent positive selection in different human populations, and showed that single nucleotide polymorphisms in gene 3'UTRs at the miRNA seed binding regions can simultaneously regulate gene expression changes in both cis and trans by the ceRNA mechanism. We also discovered that cerQTLs are significantly enriched in trait- and disease-associated variants reported from genome-wide association studies in the miRNA binding sites, suggesting that disease susceptibilities could be attributed to ceRNA regulation. Further in vitro functional experiments demonstrated that the cerQTL rs11540855 can regulate ceRNA function. These results provide a comprehensive catalog of functional non-coding regulatory variants that may be responsible for ceRNA crosstalk at the post-transcriptional level.
BACKGROUND: Understanding the interactions between tumor and the host immune system is critical to finding prognostic biomarkers, reducing drug resistance, and developing new therapies. Novel computational methods are needed to estimate tumor-infiltrating immune cells and understand tumor-immune interactions in cancers. RESULTS: We analyze tumor-infiltrating immune cells in over 10,000 RNA-seq samples across 23 cancer types from The Cancer Genome Atlas (TCGA). Our computationally inferred immune infiltrates associate much more strongly with patient clinical features, viral infection status, and cancer genetic alterations than those from other computational approaches. Analysis of cancer/testis antigen expression and CD8 T-cell abundance suggests that MAGEA3 is a potential immune target in melanoma, but not in non-small cell lung cancer, and implicates SPAG5 as an alternative cancer vaccine target in multiple cancers. We find that melanomas expressing high levels of CTLA4 separate into two distinct groups with respect to CD8 T-cell infiltration, which might influence clinical responses to anti-CTLA4 agents. We observe a similar dichotomy of TIM3 expression with respect to CD8 T cells in kidney cancer and validate it experimentally. The abundance of immune infiltration, together with our downstream analyses and findings, is accessible through TIMER, a public resource at http://cistrome.org/TIMER. CONCLUSIONS: We develop a computational approach to study tumor-infiltrating immune cells and their interactions with cancer cells. Our resource of immune-infiltrate levels, clinical associations, and predicted therapeutic markers may inform effective cancer vaccine and checkpoint blockade therapies.
Recent years have seen the rapid growth of large-scale biological data, but the effective mining and modeling of 'big data' for new biological discoveries remains a significant challenge. A new study reanalyzes expression profiles from the Gene Expression Omnibus to make novel discoveries about genes involved in DNA damage repair and genome instability in cancer.
Despite the rapid accumulation of tumor-profiling data and transcription factor (TF) ChIP-seq profiles, efforts integrating TF binding with the tumor-profiling data to understand how TFs regulate tumor gene expression are still limited. To systematically search for cancer-associated TFs, we comprehensively integrated 686 ENCODE ChIP-seq profiles representing 150 TFs with 7484 TCGA tumor profiles across 18 cancer types. For efficient and accurate inference on gene regulatory rules across a large number and variety of datasets, we developed an algorithm, RABIT (regression analysis with background integration). In each tumor sample, RABIT tests whether the TF target genes from ChIP-seq show strong differential regulation after controlling for background effects from copy number alteration and DNA methylation. When multiple ChIP-seq profiles are available for a TF, RABIT prioritizes the most relevant ChIP-seq profile in each tumor. In each cancer type, RABIT further tests whether the TF expression and somatic mutation variations are correlated with differential expression patterns of its target genes across tumors. Our predicted TF impact on tumor gene expression is highly consistent with the knowledge from cancer-related gene databases and reveals many previously unidentified aspects of transcriptional regulation in tumor progression. We also applied RABIT to RNA-binding protein motifs and found that some alternative splicing factors could affect tumor-specific gene expression by binding to target gene 3'UTR regions. Thus, RABIT (rabit.dfci.harvard.edu) is a general platform for predicting the oncogenic role of gene expression regulators.
Many genomic techniques have been developed to study gene essentiality genome-wide, such as CRISPR and shRNA screens. Our analyses of public CRISPR screens suggest that protein interaction networks, when integrated with gene expression or histone marks, are highly predictive of gene essentiality. Meanwhile, the quality of CRISPR and shRNA screen results can be significantly enhanced through network neighbor information. We also found network neighbor information to be very informative for prioritizing ChIP-seq target genes and survival indicator genes from tumor profiling. Thus, our study provides a general method for gene essentiality analysis in functional genomic experiments (http://nest.dfci.harvard.edu).
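The idea of using network neighbor information can be illustrated with a minimal sketch. It assumes an unweighted average of neighbor scores over an undirected protein interaction network. The published method may weight neighbors differently, for example by expression, and the function name, gene labels, and scores here are hypothetical.

```python
def neighbor_scores(scores, edges):
    """Average the screen score of each gene's direct network neighbors.

    scores: dict mapping gene -> screen score (e.g. a log fold change).
    edges:  iterable of (gene_a, gene_b) undirected interactions.
    Genes with no scored neighbor are omitted from the result.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    result = {}
    for gene, nbrs in neighbors.items():
        known = [scores[n] for n in nbrs if n in scores]
        if known:
            result[gene] = sum(known) / len(known)
    return result


# Hypothetical toy network and screen scores (more negative = more essential).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
scores = {"A": -2.0, "B": -1.0, "C": -3.0, "D": 0.0}
smoothed = neighbor_scores(scores, edges)
print(smoothed)
```

Gene D has a neutral score of its own, but its only neighbor C scores strongly, so the neighbor-based score flags D for a second look. This is the sense in which neighbor information can complement a noisy per-gene measurement.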
Combinatorial interplay among transcription factors (TFs) is an important mechanism by which transcriptional regulatory specificity is achieved. However, despite the increasing number of TFs for which either binding specificities or genome-wide occupancy data are known, knowledge about cooperativity between TFs remains limited. To address this, we developed a computational framework for predicting genome-wide co-binding between TFs (CCAT, Combinatorial Code Analysis Tool), and applied it to Drosophila melanogaster to uncover cooperativity among TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 19 to 58 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. We found that nearby binding sites for pairs of TFs predicted to cooperate were enriched in regions bound in relevant ChIP experiments, and were more evolutionarily conserved than other pairs. Further, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages. All generated data as well as source code for our front-to-end pipeline are available at http://cat.princeton.edu.
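To make the notion of significantly co-localized binding sites concrete, the sketch below counts sites of one TF that lie near a site of another and compares the count with a shuffled background. It is an illustration under stated assumptions, not the CCAT statistic, which the abstract does not spell out. The 50 bp window, the uniform shuffling null, and all names are assumptions. A realistic null would restrict shuffled positions to accessible chromatin.

```python
import bisect
import random


def count_colocalized(sites_a, sites_b, max_dist):
    """Number of sites in sites_a with at least one sites_b position within max_dist bp."""
    sorted_b = sorted(sites_b)
    count = 0
    for pos in sites_a:
        # First B site at or beyond the left edge of the window around pos.
        i = bisect.bisect_left(sorted_b, pos - max_dist)
        if i < len(sorted_b) and sorted_b[i] <= pos + max_dist:
            count += 1
    return count


def colocalization_pvalue(sites_a, sites_b, genome_len, max_dist=50, n_perm=1000, seed=0):
    """Empirical p-value from placing the B sites uniformly at random."""
    rng = random.Random(seed)
    observed = count_colocalized(sites_a, sites_b, max_dist)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_len) for _ in sites_b]
        if count_colocalized(sites_a, shuffled, max_dist) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical predicted sites: every TF-B site sits 20 bp from a TF-A site.
sites_a = list(range(1000, 100000, 2000))
sites_b = [p + 20 for p in sites_a]
observed, p_value = colocalization_pvalue(sites_a, sites_b, genome_len=1_000_000)
print(observed, p_value)
```

Running the same test for every TF pair in each developmental stage, followed by multiple-testing correction, would give per-stage lists of candidate co-binding pairs.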
We propose a statistical algorithm, MethylPurify, that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. From patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone, and revealed DMRs potentially missed by tumor-to-normal comparison due to tumor heterogeneity.
An important mechanism to achieve regulatory specificity in diverse biological processes is through the combinatorial interplay between different regulators, such as amongst transcription factors (TFs) during transcriptional regulation or between RNA binding proteins (RBPs) and microRNAs (miRNAs) during transcript degradation control. To advance our understanding of combinatorial regulation, we developed a computational pipeline called CCAT (Combinatorial Code Analysis Tool) for predicting genome-wide co-binding between biological regulators.
In the first part of this thesis, we applied CCAT to the D. melanogaster genome to uncover cooperativity amongst TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 20 to 60 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. Several of the co-binding pairs we found correspond to TFs that are known to work together. Further, pairs of binding sites predicted to cooperate were found to be consistently enriched in their evolutionary conservation and their tendency to be found in regions bound in relevant ChIP experiments. Finally, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages.
In the second part of this thesis, we applied CCAT to explore whether RBPs and miRNAs cooperate to promote transcript decay. We concentrated on five highly conserved RBP motifs in human 3'UTRs. A specific group of miRNA recognition sites were enriched within 50 nts from the RBP recognition sites for PUM and UAUUUAU. The presence of both a PUM recognition site and a recognition site for preferentially co-occurring miRNAs was associated with faster decay of the associated transcripts. For PUM and its co-occurring miRNAs, binding of the RBP to its recognition sites was predicted to release nearby miRNA recognition sites from RNA secondary structures. Overall, our CCAT analyses suggest that a specific set of RBPs and miRNAs work together to affect transcript decay, with the release of miRNA recognition sites via RBP binding as one possible model of cooperativity.
Our pipeline provides a general tool for identifying combinatorial cooperativity in biological regulation. All generated data as well as source code are available at: http://cat.princeton.edu.
Transcript degradation is a widespread and important mechanism for regulating protein abundance. Two major regulators of transcript degradation are RNA Binding Proteins (RBPs) and microRNAs (miRNAs). We computationally explored whether RBPs and miRNAs cooperate to promote transcript decay. We defined five RBP motifs based on the evolutionary conservation of their recognition sites in 3'UTRs as the binding motifs for Pumilio (PUM), U1A, Fox-1, Nova, and UAUUUAU. Recognition sites for some of these RBPs tended to localize at the end of long 3'UTRs. A specific group of miRNA recognition sites were enriched within 50 nts from the RBP recognition sites for PUM and UAUUUAU. The presence of both a PUM recognition site and a recognition site for preferentially co-occurring miRNAs was associated with faster decay of the associated transcripts. For PUM and its co-occurring miRNAs, binding of the RBP to its recognition sites was predicted to release nearby miRNA recognition sites from RNA secondary structures. The mammalian miRNAs that preferentially co-occur with PUM binding sites have recognition seeds that are reverse complements to the PUM recognition motif. Their binding sites have the potential to form hairpin secondary structures with proximal PUM binding sites that would normally limit RISC accessibility, but would be more accessible to miRNAs in response to the binding of PUM. In sum, our computational analyses suggest that a specific set of RBPs and miRNAs work together to affect transcript decay, with the rescue of miRNA recognition sites via RBP binding as one possible mechanism of cooperativity.
Ensuring the appropriate spatial-temporal control of protein abundance requires careful control of transcript levels. This process is regulated at many steps, including the rate at which transcripts decay. microRNAs (miRNAs) and RNA Binding Proteins (RBPs) represent two important regulators of transcript degradation. We review here recent literature that suggests these two regulators of transcript decay may functionally interact. Some studies have reported an excess of miRNA binding sites surrounding the positions at which RBPs bind. Experimental reports focusing on a particular transcript have identified instances in which RBPs and miRNAs compete for the same target sites, and instances in which the binding of an RBP makes a miRNA recognition site more accessible to RISC. Further, miRNAs and RBPs use similar enzymes for degradation of target transcripts, and the degradation of the target transcripts occurs in similar subcellular compartments. In addition to miRNA-RBP interactions involving transcript decay, RBPs have also been reported to facilitate the processing of pri-miRNAs to their final form. We summarize here several possible mechanisms through which miRNA-RBP interactions may occur.