Jing Zhang #, Donghoon Lee #, Vineet Dhiman #, Peng Jiang #, Xiaole Shirley Liu, Kevin White, Mark Gerstein, and ENCODE consortium. Submitted. “An integrative ENCODE resource for cancer: interpreting non-coding mutations and gene regulation.” Nature.
Xihao Hu, Jian Zhang, Jin Wang, Jingxin Fu, Taiwen Li, Xiaoqi Zheng, Binbin Wang, Shengqing Gu, Peng Jiang, Jingyu Fan, Xiaomin Ying, Jing Zhang, Michael C Carroll, Kai W Wucherpfennig, Nir Hacohen, Fan Zhang, Peng Zhang, Jun S Liu, Bo Li, and X. Shirley Liu. 2019. “Landscape of B cell immunity and related immune evasion in human cancers.” Nat Genet, 51, 3, Pp. 560-567.
Tumor-infiltrating B cells are an important component in the microenvironment but have unclear anti-tumor effects. We enhanced our previous computational algorithm TRUST to extract the B cell immunoglobulin hypervariable regions from bulk tumor RNA-sequencing data. TRUST assembled more than 30 million complementarity-determining region 3 sequences of the B cell heavy chain (IgH) from The Cancer Genome Atlas. Widespread B cell clonal expansions and immunoglobulin subclass switch events were observed in diverse human cancers. Prevalent somatic copy number alterations in the MICA and MICB genes related to antibody-dependent cell-mediated cytotoxicity were identified in tumors with elevated B cell activity. The IgG3-1 subclass switch interacts with B cell-receptor affinity maturation and defects in the antibody-dependent cell-mediated cytotoxicity pathway. Comprehensive pancancer analyses of tumor-infiltrating B cell-receptor repertoires identified novel tumor immune evasion mechanisms through genetic alterations. The IgH sequences identified here are potentially useful resources for future development of immunotherapies.
Peng Jiang #, Shengqing Gu #, Deng Pan #, Jingxin Fu, Avinash Sahu, Xihao Hu, Ziyi Li, Nicole Traugh, Xia Bu, Bo Li, Jun Liu, Gordon J Freeman, Myles A Brown, Kai W Wucherpfennig, and X. Shirley Liu. 8/20/2018. “Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response.” Nature Medicine, 24, 10, Pp. 1550-1558.
Cancer treatment by immune checkpoint blockade (ICB) can bring long-lasting clinical benefits, but only a fraction of patients respond to treatment. To predict ICB response, we developed TIDE, a computational method to model two primary mechanisms of tumor immune evasion: the induction of T cell dysfunction in tumors with high infiltration of cytotoxic T lymphocytes (CTL) and the prevention of T cell infiltration in tumors with low CTL level. We identified signatures of T cell dysfunction from large tumor cohorts by testing how the expression of each gene in tumors interacts with the CTL infiltration level to influence patient survival. We also modeled factors that exclude T cell infiltration into tumors using expression signatures from immunosuppressive cells. Using this framework and pre-treatment RNA-Seq or NanoString tumor expression profiles, TIDE predicted the outcome of melanoma patients treated with first-line anti-PD1 or anti-CTLA4 more accurately than other biomarkers such as PD-L1 level and mutation load. TIDE also revealed new candidate ICB resistance regulators, such as SERPINB9, demonstrating utility for immunotherapy research.
Peng Jiang, William R. Sellers, and X. Shirley Liu. 7/20/2018. “Big Data Approaches for Modeling Response and Resistance to Cancer Drugs.” Annual Review of Biomedical Data Science, 1, 1, Pp. 1-27.
Despite significant progress in cancer research, current standard-of-care drugs fail to cure many types of cancers. Hence, there is an urgent need to identify better predictive biomarkers and treatment regimes. Conventionally, insights from hypothesis-driven studies are the primary force for cancer biology and therapeutic discoveries. Recently, the rapid growth of big data resources, catalyzed by breakthroughs in high-throughput technologies, has resulted in a paradigm shift in cancer therapeutic research. The combination of computational methods and genomics data has led to several successful clinical applications. In this review, we focus on recent advances in data-driven methods to model anticancer drug efficacy, and we present the challenges and opportunities for data science in cancer therapeutic research.
Tengfei Xiao, Wei Li, Xiaoqing Wang, Han Xu, Jixin Yang, Qiu Wu, Ying Huang, Joseph Geradts, Peng Jiang, Teng Fei, David Chi, Chongzhi Zang, Qi Liao, Jonathan Rennhack, Eran Andrechek, Nanlin Li, Simone Detre, Mitchell Dowsett, Rinath M. Jeselsohn, X. Shirley Liu, and Myles Brown. 7/9/2018. “Estrogen-regulated feedback loop limits the efficacy of estrogen receptor–targeted breast cancer therapy.” Proceedings of the National Academy of Sciences.
Estrogen receptor-positive (ER+) breast cancer is treated with endocrine therapies, although therapeutic resistance almost invariably develops in advanced disease. Using genome-wide CRISPR screens, we identified genes whose loss confers endocrine resistance, as well as synthetic lethal vulnerabilities to overcome such resistance. These findings reveal an estrogen-induced negative feedback loop that constrains the growth of ER+ tumors, thereby limiting the efficacy of therapies that inhibit ER, and suggest a previously unappreciated therapeutic route to overcoming endocrine resistance.

Endocrine therapy resistance invariably develops in advanced estrogen receptor-positive (ER+) breast cancer, but the underlying mechanisms are largely unknown. We have identified C-terminal SRC kinase (CSK) as a critical node in a previously unappreciated negative feedback loop that limits the efficacy of current ER-targeted therapies. Estrogen directly drives CSK expression in ER+ breast cancer. At low CSK levels, as is the case in patients with ER+ breast cancer resistant to endocrine therapy and with the poorest outcomes, the p21 protein-activated kinase 2 (PAK2) becomes activated and drives estrogen-independent growth. PAK2 overexpression is also associated with endocrine therapy resistance and worse clinical outcome, and the combination of a PAK2 inhibitor with an ER antagonist synergistically suppressed breast tumor growth. Clinical approaches to endocrine therapy-resistant breast cancer must overcome the loss of this estrogen-induced negative feedback loop that normally constrains the growth of ER+ tumors.
Chen-Hao Chen, Tengfei Xiao, Han Xu, Peng Jiang, Clifford A Meyer, Wei Li, Myles Brown, and X. Shirley Liu. 6/1/2018. “Improved design and analysis of CRISPR knockout screens.” Bioinformatics, Pp. bty450.
Peng Jiang, Winston Lee, Xujuan Li, Carl Johnson, Jun S Liu, Myles Brown, Jon Christopher Aster, and X. Shirley Liu. 2018. “Genome-Scale Signatures of Gene Interaction from Compound Screens Predict Clinical Efficacy of Targeted Cancer Therapies.” Cell Systems, 6, 3, Pp. 343-354.
Identifying reliable drug response biomarkers is a significant challenge in cancer research. We present computational analysis of resistance (CARE), a computational method focused on targeted therapies, to infer genome-wide transcriptomic signatures of drug efficacy from cell line compound screens. CARE outputs genome-scale scores to measure how the drug target gene interacts with other genes to affect the inhibitor efficacy in the compound screens. Such statistical interactions between drug targets and other genes were not considered in previous studies but are critical in identifying predictive biomarkers. When evaluated using transcriptome data from clinical studies, CARE can predict the therapy outcome better than signatures from other computational methods and genomics experiments. Moreover, the CARE signatures for the PLX4720 BRAF inhibitor are associated with an anti-programmed death 1 clinical response, suggesting a common efficacy signature between a targeted therapy and immunotherapy. When searching for genes related to lapatinib resistance, CARE identified PRKD3 as the top candidate. PRKD3 inhibition, by both small interfering RNA and compounds, significantly sensitized breast cancer cells to lapatinib. Thus, CARE should enable large-scale inference of response biomarkers and drug combinations for targeted therapies using compound screen data.
Deng Pan #, Aya Kobayashi #, Peng Jiang #, Lucas Ferrari de Andrade, Rong En Tay, Adrienne M Luoma, Daphne Tsoucas, Xintao Qiu, Klothilda Lim, Prakash Rao, Henry W Long, Guo-Cheng Yuan, John Doench, Myles Brown, X. Shirley Liu, and Kai W Wucherpfennig. 2018. “A major chromatin regulator determines resistance of tumor cells to T cell-mediated killing.” Science, 359, 6377, Pp. 770-775.
Many human cancers are resistant to immunotherapy, for reasons that are poorly understood. We used a genome-scale CRISPR-Cas9 screen to identify mechanisms of tumor cell resistance to killing by cytotoxic T cells, the central effectors of antitumor immunity. Inactivation of >100 genes, including Pbrm1, Arid2, and Brd7, which encode components of the PBAF form of the SWI/SNF chromatin remodeling complex, sensitized mouse B16F10 melanoma cells to killing by T cells. Loss of PBAF function increased tumor cell sensitivity to interferon-γ, resulting in enhanced secretion of chemokines that recruit effector T cells. Treatment-resistant tumors became responsive to immunotherapy when Pbrm1 was inactivated. In many human cancers, expression of PBRM1 and ARID2 inversely correlated with expression of T cell cytotoxicity genes, and Pbrm1-deficient murine melanomas were more strongly infiltrated by cytotoxic T cells.
Shenglin Mei, Clifford A Meyer, Rongbin Zheng, Qian Qin, Qiu Wu, Peng Jiang, Bo Li, Xiaohui Shi, Binbin Wang, Jingyu Fan, Celina Shih, Myles Brown, Chongzhi Zang, and X. Shirley Liu. 2017. “Cistrome Cancer: A Web Resource for Integrative Gene Regulation Modeling in Cancer.” Cancer Res, 77, 21, Pp. e19-e22.
Cancer results from a breakdown of normal gene expression control, so the study of gene regulation is critical to cancer research. To gain insight into the transcriptional and epigenetic factors regulating abnormal gene expression patterns in cancers, we developed the Cistrome Cancer web resource. We conducted the systematic integration and modeling of over 10,000 tumor molecular profiles from The Cancer Genome Atlas (TCGA) with over 23,000 ChIP-seq and chromatin accessibility profiles from our Cistrome collection. The results include reconstruction of functional enhancer profiles, "super-enhancer" target genes, as well as predictions of active transcription factors and their target genes for each TCGA cancer type. Cistrome Cancer reveals novel insights from integrative analyses combining chromatin profiles with tumor molecular profiles and will be a useful resource to the cancer gene regulation community. Cancer Res; 77(21); e19-22. ©2017 AACR.
Mulin Jun Li, Jian Zhang, Qian Liang, Chenghao Xuan, Jiexing Wu, Peng Jiang, Wei Li, Yun Zhu, Panwen Wang, Daniel Fernandez, Yujun Shen, Yiwen Chen, Jean-Pierre A Kocher, Ying Yu, Pak Chung Sham, Junwen Wang, Jun S Liu, and X. Shirley Liu. 2017. “Exploring genetic associations with ceRNA regulation in the human genome.” Nucleic Acids Res, 45, 10, Pp. 5653-5665.
Competing endogenous RNAs (ceRNAs) are RNA molecules that sequester shared microRNAs (miRNAs), thereby affecting the expression of other targets of the miRNAs. Whether genetic variants in ceRNA can affect its biological function and disease development is still an open question. Here we identified a large number of genetic variants that are associated with ceRNA's function using Geuvadis RNA-seq data for 462 individuals from the 1000 Genomes Project. We call these loci competing endogenous RNA expression quantitative trait loci or 'cerQTL', and found that a large number of them were unexplored in conventional eQTL mapping. We identified many cerQTLs that have undergone recent positive selection in different human populations, and showed that single nucleotide polymorphisms in gene 3'UTRs at the miRNA seed binding regions can simultaneously regulate gene expression changes in both cis and trans by the ceRNA mechanism. We also discovered that cerQTLs are significantly enriched in traits/diseases associated variants reported from genome-wide association studies in the miRNA binding sites, suggesting that disease susceptibilities could be attributed to ceRNA regulation. Further in vitro functional experiments demonstrated that a cerQTL rs11540855 can regulate ceRNA function. These results provide a comprehensive catalog of functional non-coding regulatory variants that may be responsible for ceRNA crosstalk at the post-transcriptional level.
Bo Li, Eric Severson, Jean-Christophe Pignon, Haoquan Zhao, Taiwen Li, Jesse Novak, Peng Jiang, Hui Shen, Jon C Aster, Scott Rodig, Sabina Signoretti, Jun S Liu, and X. Shirley Liu. 2016. “Comprehensive analyses of tumor immunity: implications for cancer immunotherapy.” Genome Biol, 17, 1, Pp. 174.
BACKGROUND: Understanding the interactions between tumor and the host immune system is critical to finding prognostic biomarkers, reducing drug resistance, and developing new therapies. Novel computational methods are needed to estimate tumor-infiltrating immune cells and understand tumor-immune interactions in cancers. RESULTS: We analyze tumor-infiltrating immune cells in over 10,000 RNA-seq samples across 23 cancer types from The Cancer Genome Atlas (TCGA). Our computationally inferred immune infiltrates associate much more strongly with patient clinical features, viral infection status, and cancer genetic alterations than other computational approaches. Analysis of cancer/testis antigen expression and CD8 T-cell abundance suggests that MAGEA3 is a potential immune target in melanoma, but not in non-small cell lung cancer, and implicates SPAG5 as an alternative cancer vaccine target in multiple cancers. We find that melanomas expressing high levels of CTLA4 separate into two distinct groups with respect to CD8 T-cell infiltration, which might influence clinical responses to anti-CTLA4 agents. We observe similar dichotomy of TIM3 expression with respect to CD8 T cells in kidney cancer and validate it experimentally. The abundance of immune infiltration, together with our downstream analyses and findings, are accessible through TIMER, a public resource. CONCLUSIONS: We develop a computational approach to study tumor-infiltrating immune cells and their interactions with cancer cells. Our resource of immune-infiltrate levels, clinical associations, as well as predicted therapeutic markers may inform effective cancer vaccine and checkpoint blockade therapies.
Peng Jiang and X. Shirley Liu. 2015. “Big data mining yields novel insights on cancer.” Nat Genet, 47, 2, Pp. 103-4.
Recent years have seen the rapid growth of large-scale biological data, but the effective mining and modeling of 'big data' for new biological discoveries remains a significant challenge. A new study reanalyzes expression profiles from the Gene Expression Omnibus to make novel discoveries about genes involved in DNA damage repair and genome instability in cancer.
Peng Jiang, Matthew L Freedman, Jun S Liu, and Xiaole Shirley Liu. 2015. “Inference of transcriptional regulation in cancers.” Proc Natl Acad Sci U S A, 112, 25, Pp. 7731-6.
Despite the rapid accumulation of tumor-profiling data and transcription factor (TF) ChIP-seq profiles, efforts integrating TF binding with the tumor-profiling data to understand how TFs regulate tumor gene expression are still limited. To systematically search for cancer-associated TFs, we comprehensively integrated 686 ENCODE ChIP-seq profiles representing 150 TFs with 7484 TCGA tumor data in 18 cancer types. For efficient and accurate inference on gene regulatory rules across a large number and variety of datasets, we developed an algorithm, RABIT (regression analysis with background integration). In each tumor sample, RABIT tests whether the TF target genes from ChIP-seq show strong differential regulation after controlling for background effect from copy number alteration and DNA methylation. When multiple ChIP-seq profiles are available for a TF, RABIT prioritizes the most relevant ChIP-seq profile in each tumor. In each cancer type, RABIT further tests whether the TF expression and somatic mutation variations are correlated with differential expression patterns of its target genes across tumors. Our predicted TF impact on tumor gene expression is highly consistent with the knowledge from cancer-related gene databases and reveals many previously unidentified aspects of transcriptional regulation in tumor progression. We also applied RABIT on RNA-binding protein motifs and found that some alternative splicing factors could affect tumor-specific gene expression by binding to target gene 3'UTR regions. Thus, RABIT is a general platform for predicting the oncogenic role of gene expression regulators.
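The per-tumor test described in this abstract, in which TF target status is assessed after controlling for copy number alteration and DNA methylation, can be illustrated with a small ordinary least-squares sketch. This is not the RABIT implementation: the function names (`solve_linear`, `ols`, `tf_effect`), the single-TF design, and the absence of any significance test are assumptions made purely for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve_linear(xtx, xty)


def tf_effect(diff_expr, cna, methylation, tf_target):
    """Coefficient of the TF target indicator for one tumor (hypothetical helper).

    Each argument is a per-gene list; CNA and methylation act as background
    covariates, so the returned value is the TF term adjusted for them.
    """
    return ols([cna, methylation, tf_target], diff_expr)[3]
```

In a real analysis the coefficient would be accompanied by a standard error and a test statistic, and the regression would be repeated across tumors and ChIP-seq profiles; the sketch only shows the adjustment step.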
Peng Jiang, Hongfang Wang, Wei Li, Chongzhi Zang, Bo Li, Yinling J Wong, Cliff Meyer, Jun S Liu, Jon C Aster, and X. Shirley Liu. 2015. “Network analysis of gene essentiality in functional genomics experiments.” Genome Biol, 16, Pp. 239.
Many genomic techniques have been developed to study gene essentiality genome-wide, such as CRISPR and shRNA screens. Our analyses of public CRISPR screens suggest protein interaction networks, when integrated with gene expression or histone marks, are highly predictive of gene essentiality. Meanwhile, the quality of CRISPR and shRNA screen results can be significantly enhanced through network neighbor information. We also found network neighbor information to be very informative on prioritizing ChIP-seq target genes and survival indicator genes from tumor profiling. Thus, our study provides a general method for gene essentiality analysis in functional genomic experiments.
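One simple way to picture how network neighbor information could refine screen results is to blend each gene's score with the mean score of its interaction partners. The sketch below is an illustrative assumption, not the method from the paper: the function name `smooth_scores`, the mixing weight `alpha`, and the plain averaging rule are all hypothetical.

```python
def smooth_scores(scores, edges, alpha=0.5):
    """Blend each gene's screen score with the mean score of its neighbors.

    scores: dict mapping gene -> essentiality score from a screen
    edges:  iterable of (gene_a, gene_b) protein interaction pairs
    alpha:  weight on the neighbor mean (hypothetical tuning parameter)
    """
    neighbors = {gene: set() for gene in scores}
    for a, b in edges:
        # Ignore self-loops and edges touching genes without a score.
        if a in scores and b in scores and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    smoothed = {}
    for gene, own in scores.items():
        nbrs = neighbors[gene]
        if nbrs:
            nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs)
            smoothed[gene] = (1 - alpha) * own + alpha * nbr_mean
        else:
            smoothed[gene] = own  # isolated genes keep their original score
    return smoothed
```

With `alpha=0` the scores are returned unchanged, which makes it easy to compare a screen's hit ranking with and without the network term.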
Peng Jiang and Mona Singh. 2014. “CCAT: Combinatorial Code Analysis Tool for transcriptional regulation.” Nucleic Acids Res, 42, 5, Pp. 2833-47.
Combinatorial interplay among transcription factors (TFs) is an important mechanism by which transcriptional regulatory specificity is achieved. However, despite the increasing number of TFs for which either binding specificities or genome-wide occupancy data are known, knowledge about cooperativity between TFs remains limited. To address this, we developed a computational framework for predicting genome-wide co-binding between TFs (CCAT, Combinatorial Code Analysis Tool), and applied it to Drosophila melanogaster to uncover cooperativity among TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 19 to 58 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. We found that nearby binding sites for pairs of TFs predicted to cooperate were enriched in regions bound in relevant ChIP experiments, and were more evolutionarily conserved than other pairs. Further, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages. All generated data as well as source code for our front-to-end pipeline are publicly available.
Xiaoqi Zheng, Qian Zhao, Hua-Jun Wu, Wei Li, Haiyun Wang, Clifford A Meyer, Qian Alvin Qin, Han Xu, Chongzhi Zang, Peng Jiang, Fuqiang Li, Yong Hou, Jianxing He, Jun Wang, Jun Wang, Peng Zhang, Yong Zhang, and Xiaole Shirley Liu. 2014. “MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes.” Genome Biol, 15, 8, Pp. 419.
We propose a statistical algorithm MethylPurify that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. From patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone, and revealed potential missed DMRs by tumor to normal comparison due to tumor heterogeneity.
Peng Jiang. 9/2013. “Combinatorial code analysis for understanding biological regulation.” Ph.D. Dissertation.

An important mechanism to achieve regulatory specificity in diverse biological processes is through the combinatorial interplay between different regulators, such as amongst transcription factors (TFs) during transcriptional regulation or between RNA binding proteins (RBPs) and microRNAs (miRNAs) during transcript degradation control. To advance our understanding of combinatorial regulation, we developed a computational pipeline called CCAT (Combinatorial Code Analysis Tool) for predicting genome-wide co-binding between biological regulators.

In the first part of this thesis, we applied CCAT to the D. melanogaster genome to uncover cooperativity amongst TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 20 to 60 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. Several of the co-binding pairs we found correspond to TFs that are known to work together. Further, pairs of binding sites predicted to cooperate were found to be consistently enriched in their evolutionary conservation and their tendency to be found in regions bound in relevant ChIP experiments. Finally, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages.

In the second part of this thesis, we applied CCAT to explore whether RBPs and miRNAs cooperate to promote transcript decay. We concentrated on five highly conserved RBP motifs in human 3'UTRs. A specific group of miRNA recognition sites were enriched within 50 nts from the RBP recognition sites for PUM and UAUUUAU. The presence of both a PUM recognition site and a recognition site for preferentially co-occurring miRNAs was associated with faster decay of the associated transcripts. For PUM and its co-occurring miRNAs, binding of the RBP to its recognition sites was predicted to release nearby miRNA recognition sites from RNA secondary structures. Overall, our CCAT analyses suggest that a specific set of RBPs and miRNAs work together to affect transcript decay, with the release of miRNA recognition sites via RBP binding as one possible model of cooperativity.

Our pipeline provides a general tool for identifying combinatorial cooperativity in biological regulation. All generated data as well as source code are publicly available.

Peng Jiang, Mona Singh, and Hilary A Coller. 2013. “Computational assessment of the cooperativity between RNA binding proteins and MicroRNAs in Transcript Decay.” PLoS Comput Biol, 9, 5, Pp. e1003075.
Transcript degradation is a widespread and important mechanism for regulating protein abundance. Two major regulators of transcript degradation are RNA Binding Proteins (RBPs) and microRNAs (miRNAs). We computationally explored whether RBPs and miRNAs cooperate to promote transcript decay. We defined five RBP motifs based on the evolutionary conservation of their recognition sites in 3'UTRs as the binding motifs for Pumilio (PUM), U1A, Fox-1, Nova, and UAUUUAU. Recognition sites for some of these RBPs tended to localize at the end of long 3'UTRs. A specific group of miRNA recognition sites were enriched within 50 nts from the RBP recognition sites for PUM and UAUUUAU. The presence of both a PUM recognition site and a recognition site for preferentially co-occurring miRNAs was associated with faster decay of the associated transcripts. For PUM and its co-occurring miRNAs, binding of the RBP to its recognition sites was predicted to release nearby miRNA recognition sites from RNA secondary structures. The mammalian miRNAs that preferentially co-occur with PUM binding sites have recognition seeds that are reverse complements to the PUM recognition motif. Their binding sites have the potential to form hairpin secondary structures with proximal PUM binding sites that would normally limit RISC accessibility, but would be more accessible to miRNAs in response to the binding of PUM. In sum, our computational analyses suggest that a specific set of RBPs and miRNAs work together to affect transcript decay, with the rescue of miRNA recognition sites via RBP binding as one possible mechanism of cooperativity.
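Two operations in this abstract lend themselves to a short sketch: counting miRNA recognition sites that fall within 50 nt of an RBP recognition site, and checking whether a seed is the reverse complement of a motif. The code below is a minimal illustration under stated assumptions; the function names, the site coordinates, and the example 8-mer are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left, bisect_right

# Watson-Crick pairing for RNA; G:U wobble pairs are deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A/C/G/U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_near(mirna_sites, rbp_sites, window=50):
    """Count miRNA site positions lying within `window` nt of any RBP site.

    Positions are integer coordinates along the same 3'UTR (an assumption;
    real data would also need transcript identifiers and site lengths).
    """
    rbp_sorted = sorted(rbp_sites)
    near = 0
    for pos in mirna_sites:
        lo = bisect_left(rbp_sorted, pos - window)
        hi = bisect_right(rbp_sorted, pos + window)
        if hi > lo:  # at least one RBP site in [pos - window, pos + window]
            near += 1
    return near


# Illustrative 8-mer only; it stands in for a motif and is not from the source.
print(reverse_complement("UGUAAAUA"))             # UAUUUACA
print(count_near([100, 400, 900], [120, 880]))    # 2
```

An enrichment claim would additionally require a background expectation, for example the same count computed on shuffled or control site positions, which this sketch leaves out.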
Peng Jiang and Hilary Coller. 2012. “Functional interactions between microRNAs and RNA binding proteins.” Microrna, 1, 1, Pp. 70-9.
Ensuring the appropriate spatial-temporal control of protein abundance requires careful control of transcript levels. This process is regulated at many steps, including the rate at which transcripts decay. microRNAs (miRNAs) and RNA Binding Proteins (RBPs) represent two important regulators of transcript degradation. We review here recent literature that suggests these two regulators of transcript decay may functionally interact. Some studies have reported an excess of miRNA binding sites surrounding the positions at which RBPs bind. Experimental reports focusing on a particular transcript have identified instances in which RBPs and miRNAs compete for the same target sites, and instances in which the binding of a RBP makes a miRNA recognition site more accessible to the RISC complex. Further, miRNAs and RBPs use similar enzymes for degradation of target transcripts and the degradation of the target transcripts occurs in similar subcellular compartments. In addition to miRNA-RBP interactions involving transcript decay, RBPs have also been reported to facilitate the processing of pri-miRNAs to their final form. We summarize here several possible mechanisms through which miRNA-RBP interactions may occur.