Abhijeet R Sonawane, John Platig, Maud Fagny, Cho-Yi Chen, Joseph N Paulson, Camila M Lopes-Ramos, Dawn L DeMeo, John Quackenbush, Kimberly Glass, and Marieke L Kuijjer. 10/24/2017. “Understanding Tissue-specific Gene Regulation.” Cell Reports, 21, 4, Pp. 1077-1088.

Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes). Gene set enrichment analysis of network targeting also indicates that the regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue network. However, they do assume bottleneck positions due to variability in transcription factor targeting and the influence of non-canonical regulatory interactions. These results suggest that tissue specificity is driven by context-dependent regulatory paths, providing transcriptional control of tissue-specific processes.

Joseph Paulson, Cho-Yi Chen, Camila M Lopes-Ramos, Marieke L Kuijjer, John Platig, Abhijeet R Sonawane, Maud Fagny, Kimberly Glass, and John Quackenbush. 10/3/2017. “Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.” BMC Bioinformatics, 18, Pp. 437.

Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data – critical first steps for any subsequent analysis.


We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We developed the Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project.


An R package instantiating YARN is available at


Camila M Lopes-Ramos*, Joseph N Paulson*, Cho-Yi Chen, Marieke L Kuijjer, Maud Fagny, John Platig, Abhijeet R Sonawane, Dawn L DeMeo, John Quackenbush, and Kimberly Glass. 9/12/2017. “Regulatory network changes between cell lines and their tissues of origin.” BMC Genomics, 18, Pp. 723.

Cell lines are an indispensable tool in biomedical research and often used as surrogates for tissues. Although there are recognized important cellular and transcriptomic differences between cell lines and tissues, a systematic overview of the differences between the regulatory processes of a cell line and those of its tissue of origin has not been conducted. The RNA-Seq data generated by the GTEx project is the first available data resource in which it is possible to perform a large-scale transcriptional and regulatory network analysis comparing cell lines with their tissues of origin.

We compared 127 paired Epstein-Barr virus transformed lymphoblastoid cell lines (LCLs) and whole blood samples, and 244 paired primary fibroblast cell lines and skin samples. While gene expression analysis confirms that these cell lines carry the expression signatures of their primary tissues, albeit at reduced levels, network analysis indicates that expression changes are the cumulative result of many previously unreported alterations in transcription factor (TF) regulation. More specifically, cell cycle genes are over-expressed in cell lines compared to primary tissues, and this alteration in expression is a result of less repressive TF targeting. We confirmed these regulatory changes for four TFs, including SMAD5, using independent ChIP-seq data from ENCODE.

Our results provide novel insights into the regulatory mechanisms controlling the expression differences between cell lines and tissues. The strong changes in TF regulation that we observe suggest that network changes, in addition to transcriptional levels, should be considered when using cell lines as models for tissues.

Maud Fagny, Joseph N Paulson, Marieke L Kuijjer, Abhijeet R Sonawane, Cho-Yi Chen, Camila M Lopes-Ramos, Kimberly Glass, John Quackenbush, and John Platig. 2017. “Exploring regulation in tissues with eQTL networks.” Proc Natl Acad Sci U S A, 114, 37, Pp. E7841–E7850.

Characterizing the collective regulatory impact of genetic variants on complex phenotypes is a major challenge in developing a genotype to phenotype map. Using expression quantitative trait locus (eQTL) analyses, we constructed bipartite networks in which edges represent significant associations between genetic variants and gene expression levels and found that the network structure informs regulatory function. We show, in 13 tissues, that these eQTL networks are organized into dense, highly modular communities grouping genes often involved in coherent biological processes. We find communities representing shared processes across tissues, as well as communities associated with tissue-specific processes that coalesce around variants in tissue-specific active chromatin regions. Node centrality is also highly informative, with the global and community hubs differing in regulatory potential and likelihood of being disease associated.
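
The bipartite construction described in this abstract can be illustrated with a minimal Python sketch. It assumes the significant variant-gene associations are already available as pairs; the identifiers below are hypothetical placeholders, not data from the paper, and node degree is used only as the simplest possible stand-in for the centrality measures the authors actually analyze.

```python
from collections import defaultdict


def build_bipartite(associations):
    """Build two adjacency maps from (variant, gene) association pairs.

    Each pair is treated as one edge of a bipartite network whose two
    node sets are genetic variants and genes.
    """
    variant_to_genes = defaultdict(set)
    gene_to_variants = defaultdict(set)
    for variant, gene in associations:
        variant_to_genes[variant].add(gene)
        gene_to_variants[gene].add(variant)
    return dict(variant_to_genes), dict(gene_to_variants)


def degrees(adjacency):
    """Return the number of neighbors of each node in an adjacency map."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


# Hypothetical associations (placeholder names, not real eQTL results).
toy_associations = [
    ("snp1", "geneA"),
    ("snp1", "geneB"),
    ("snp2", "geneB"),
    ("snp3", "geneC"),
    ("snp1", "geneC"),
]

variant_to_genes, gene_to_variants = build_bipartite(toy_associations)
variant_degree = degrees(variant_to_genes)
gene_degree = degrees(gene_to_variants)
```

In this toy network, a variant linked to many genes would be a candidate hub; community detection on the same edge list would be a separate step that is not sketched here.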

Cho-Yi Chen, Ryan W Logan, Tianzhou Ma, David A Lewis, George C Tseng, Etienne Sibille, and Colleen A McClung. 2016. “Effects of aging on circadian patterns of gene expression in the human prefrontal cortex.” Proc Natl Acad Sci U S A, 113, 1, Pp. 206-11.

With aging, significant changes in circadian rhythms occur, including a shift in phase toward a "morning" chronotype and a loss of rhythmicity in circulating hormones. However, the effects of aging on molecular rhythms in the human brain have remained elusive. Here, we used a previously described time-of-death analysis to identify transcripts throughout the genome that have a significant circadian rhythm in expression in the human prefrontal cortex [Brodmann's area 11 (BA11) and BA47]. Expression levels were determined by microarray analysis in 146 individuals. Rhythmicity in expression was found in ∼ 10% of detected transcripts (P < 0.05). Using a metaanalysis across the two brain areas, we identified a core set of 235 genes (q < 0.05) with significant circadian rhythms of expression. These 235 genes showed 92% concordance in the phase of expression between the two areas. In addition to the canonical core circadian genes, a number of other genes were found to exhibit rhythmic expression in the brain. Notably, we identified more than 1,000 genes (1,186 in BA11; 1,591 in BA47) that exhibited age-dependent rhythmicity or alterations in rhythmicity patterns with aging. Interestingly, a set of transcripts gained rhythmicity in older individuals, which may represent a compensatory mechanism due to a loss of canonical clock function. Thus, we confirm that rhythmic gene expression can be reliably measured in human brain and identified for the first time (to our knowledge) significant changes in molecular rhythms with aging that may contribute to altered cognition, sleep, and mood in later life.
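
As a rough illustration of how a 24-hour rhythm can be detected from samples ordered by time of death, the sketch below fits a cosinor model (a sinusoid with a fixed period) by least squares. The abstract does not state which model the authors used, so the cosinor form, the function names, and the synthetic values are all assumptions made for illustration; significance testing is not included.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cosinor(times_h, values, period_h=24.0):
    """Fit y = mesor + amplitude * cos(2*pi*(t - peak_time) / period).

    The model is linear in (1, cos(wt), sin(wt)), so ordinary least
    squares via the normal equations is enough.
    """
    omega = 2 * math.pi / period_h
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in times_h]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, b_cos, b_sin = solve_3x3(xtx, xty)
    amplitude = math.hypot(b_cos, b_sin)
    peak_time = (math.atan2(b_sin, b_cos) / omega) % period_h
    return mesor, amplitude, peak_time


# Synthetic, noise-free example: one transcript peaking 6 h into the cycle.
# Times of death (hours) and expression values are made up for illustration.
times = [1.5, 4.0, 7.25, 9.0, 12.5, 15.0, 18.75, 21.0, 23.5]
expression = [5.0 + 2.0 * math.cos(2 * math.pi * (t - 6.0) / 24.0) for t in times]
mesor, amplitude, peak_time = fit_cosinor(times, expression)
```

Because each individual contributes a single time point, the fit pools individuals into one pseudo-time series; comparing fitted amplitude or peak time between age groups would be one way to look for the age-dependent changes the abstract reports.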

High Attention Paper (99th percentile) | NEWS | DATABASE

As Aging Brain's Internal Clock Fades, A New Timekeeper May Kick In.

—National Public Radio (NPR)

Seeking the Gears of Our Inner Clock.

The New York Times

People's Brain Chemistry May Reveal the Hour of Their Death.

Smithsonian magazine

Gene's Cycles Change with Age.

The Scientist

Backup internal clock may protect against Alzheimer's and other age-related diseases.

—CBC News

Cho-Yi Chen*, Li Zhu*, Ying Ding*, Lin Wang, Zhiguang Huo, SungHwan Kim, Christos Sotiriou, Steffi Oesterreich, and George C Tseng. 2016. “MetaDCN: meta-analysis framework for differential co-expression network detection with an application in breast cancer.” Bioinformatics, Published Advanced Online.

Motivation: Gene co-expression network analysis from transcriptomic studies can elucidate gene-gene interactions and regulatory mechanisms. Differential co-expression analysis helps further detect alterations of regulatory activities in case/control comparisons. Co-expression networks estimated from a single transcriptomic study are often unstable and not generalizable due to cohort bias and limited sample size. With the rapid accumulation of publicly available transcriptomic studies, co-expression analysis combining multiple transcriptomic studies can provide more accurate and robust results.

Results: In this paper, we propose a meta-analytic framework for detecting differentially co-expressed networks (MetaDCN). Differentially co-expressed seed modules are first detected by optimizing an energy function via simulated annealing. Basic modules sharing common pathways are merged into pathway-centric supermodules, and a Cytoscape plug-in (MetaDCNExplorer) is developed to visualize and explore the findings. We applied MetaDCN to two breast cancer applications: ER+/ER- comparison using five training and three testing studies, and ILC/IDC comparison with two training and two testing studies. We identified 20 and 4 supermodules for ER+/ER- and ILC/IDC comparisons, respectively. The top-ranked supermodules are “immune response pathway” and “complement cascades pathway” for the ER comparison, and “extracellular matrix pathway” for the ILC/IDC comparison. Without the need for prior information, the results from MetaDCN confirm existing disease mechanisms and uncover novel ones in a systems manner.

Cho-Yi Chen, Ryan Logan, Tianzhou Ma, David Lewis, George Tseng, Etienne Sibille, and Colleen McClung. 2015. “The Effects of Aging and Psychiatric Disease on Circadian Patterns of Gene Expression in the Human Prefrontal Cortex.” In Neuropsychopharmacology, 40: Pp. S199–S200. Hollywood, Florida, USA: Nature Publishing Group.

Background: With aging, significant changes in circadian rhythms occur, including a shift in phase toward a “morning” chronotype and a loss of rhythmicity in circulating hormones. There are also well-documented disruptions to circadian rhythms that are associated with several psychiatric disorders. However, the effects of aging and psychiatric conditions on molecular rhythms in the human brain have remained elusive.
Methods: Here we employed a previously described time-of-death analysis to identify transcripts throughout the genome that have a significant circadian rhythm in expression in the human prefrontal cortex (Brodmann’s areas (BA) 11 and 47). Expression levels were determined by microarray analysis in 146 individuals.
Results: Rhythmicity in expression was found in ∼10% of detected transcripts (p<0.05). Using a meta-analysis across the two brain areas, we identified a core set of 235 genes (q<0.05) with significant circadian rhythms of expression. These 235 genes showed 92% concordance in the phase of expression between the two areas. In addition to the canonical core circadian genes, a number of other genes were found to exhibit rhythmic expression in the brain. Notably, we identified more than one thousand genes (1,186 in BA11; 1,591 in BA47) that exhibited age-dependent rhythmicity or alterations in rhythmicity patterns with aging. Interestingly, a set of transcripts gained rhythmicity in older individuals, which may represent a compensatory mechanism due to a loss of canonical clock function. We are currently analyzing samples from subjects with either bipolar disorder or schizophrenia and these data will also be presented.
Conclusions: We confirm that rhythmic gene expression can be reliably measured in human brain and identified for the first time significant changes in molecular rhythms with aging that may contribute to altered cognition, sleep and mood in later life.

Cho-Yi Chen, Andy Ho, Hsin-Yuan Huang, Hsueh-Fen Juan, and Hsuan-Cheng Huang. 2014. “Dissecting the human protein-protein interaction network via phylogenetic decomposition.” Sci Rep, 4, Pp. 7153.

The protein-protein interaction (PPI) network offers a conceptual framework for better understanding the functional organization of the proteome. However, the intricacy of network complexity complicates comprehensive analysis. Here, we adopted a phylogenic grouping method combined with force-directed graph simulation to decompose the human PPI network in a multi-dimensional manner. This network model enabled us to associate the network topological properties with evolutionary and biological implications. First, we found that ancient proteins occupy the core of the network, whereas young proteins tend to reside on the periphery. Second, the presence of age homophily suggests a possible selection pressure may have acted on the duplication and divergence process during the PPI network evolution. Lastly, functional analysis revealed that each age group possesses high specificity of enriched biological processes and pathway engagements, which could correspond to their evolutionary roles in eukaryotic cells. More interestingly, the network landscape closely coincides with the subcellular localization of proteins. Together, these findings suggest the potential of using conceptual frameworks to mimic the true functional organization in a living cell.

Chen et al. (2014) adapted a phylogenic grouping method in conjunction with force-directed graph simulation to dissect the human interactome that enabled them to associate network properties with evolutionary and biological properties.

— Karagoz et al, Journal of Theoretical Biology (2016)

In humans, the human PPIN was decomposed according to the phylogenetic age groups of proteins (Chen et al. 2014) ... One important conclusion from these results is that the older, more central proteins tend to not develop new interactions over time... This avoidance of perturbation of aged proteins implies an overall network structure that may be conserved over time ... Understanding how networks evolve is important in characterizing cellular organization and evidence suggests that organizational schemes may be conserved among species.

— McCormack et al, Curr Plant Biol (2015)

Chen-Ching Lin*, Ya-Jen Chen*, Cho-Yi Chen, Yen-Jen Oyang, Hsueh-Fen Juan, and Hsuan-Cheng Huang. 2012. “Crosstalk between transcription factors and microRNAs in human protein interaction network.” BMC Syst Biol, 6, Pp. 18.

BACKGROUND: Gene regulatory networks control the global gene expression and the dynamics of protein output in living cells. In multicellular organisms, transcription factors and microRNAs are the major families of gene regulators. Recent studies have suggested that these two kinds of regulators share similar regulatory logics and participate in cooperative activities in the gene regulatory network; however, their combinational regulatory effects and preferences on the protein interaction network remain unclear.

METHODS: In this study, we constructed a global human gene regulatory network comprising both transcriptional and post-transcriptional regulatory relationships, and integrated the protein interactome into this network. We then screened the integrated network for four types of regulatory motifs: single-regulation, co-regulation, crosstalk, and independent, and investigated their topological properties in the protein interaction network.

RESULTS: Among the four types of network motifs, the crosstalk was found to have the most enriched protein-protein interactions in their downstream regulatory targets. The topological properties of these motifs also revealed that they target crucial proteins in the protein interaction network and may serve important roles of biological functions.

CONCLUSIONS: Altogether, these results reveal the combinatorial regulatory patterns of transcription factors and microRNAs on the protein interactome, and provide further evidence to suggest the connection between gene regulatory network and protein interaction network.

BMC Highly Accessed

Cho-Yi Chen, Shui-Tein Chen, Hsueh-Fen Juan, and Hsuan-Cheng Huang. 2012. “Lengthening of 3'UTR increases with morphological complexity in animal evolution.” Bioinformatics, 28, 24, Pp. 3178-81.

MOTIVATION: Evolutionary expansion of gene regulatory circuits seems to boost morphological complexity. However, the expansion patterns and the quantification relationships have not yet been identified. In this study, we focus on the regulatory circuits at the post-transcriptional level, investigating whether and how this principle may apply.

RESULTS: By analysing the structure of mRNA transcripts in multiple metazoan species, we observed a striking exponential correlation between the length of 3' untranslated regions (3'UTR) and morphological complexity as measured by the number of cell types in each organism. Cellular diversity was similarly associated with the accumulation of microRNA genes and their putative targets. We propose that the lengthening of 3'UTRs together with a commensurate exponential expansion in post-transcriptional regulatory circuits can contribute to the emergence of new cell types during animal evolution.

ESI Highly Cited Paper

Notably, increasing organismal complexity correlates with lengthening of the 3' UTR (Chen et al. 2012), emphasizing the importance of this region.

Turner et al, Nature Immuno (2014)

Other recent findings also support the view that UTRs may play a previously underappreciated role in establishing specific functions of tissues and organs: Throughout the evolution of animals, UTR length has increased along with morphological complexity (Chen et al. 2012).

Reyes et al, Proc Natl Acad Sci USA (2013)

Chen et al. (2012) ... has reported on the exponential correlation of miRNA gene number and 3'-UTR length—but not 5'-UTR or coding sequence length—with morphological complexity in animals.

Carroll et al, J Mol Cell Biol (2013)


Cho-Yi Chen, Shui-Tein Chen, Chiou-Shann Fuh, Hsueh-Fen Juan, and Hsuan-Cheng Huang. 2011. “Coregulation of transcription factors and microRNAs in human transcriptional regulatory network.” BMC Bioinformatics, 12 Suppl 1, Pp. S41.

BACKGROUND: MicroRNAs (miRNAs) are small RNA molecules that regulate gene expression at the post-transcriptional level. Recent studies have suggested that miRNAs and transcription factors are primary metazoan gene regulators; however, the crosstalk between them still remains unclear.

METHODS: We proposed a novel model utilizing functional annotation information to identify significant coregulation between transcriptional and post-transcriptional layers. Based on this model, function-enriched coregulation relationships were discovered and combined into different kinds of functional coregulation networks.

RESULTS: We found that miRNAs may engage in a wider diversity of biological processes by coordinating with transcription factors, and this kind of cross-layer coregulation may have higher specificity than intra-layer coregulation. In addition, the coregulation networks reveal several types of network motifs, including feed-forward loops and massive upstream crosstalk. Finally, the expression patterns of these coregulation pairs in normal and tumour tissues were analyzed. Different coregulation types show unique expression correlation trends. More importantly, the disruption of coregulation may be associated with cancers.

CONCLUSION: Our findings elucidate the combinatorial and cooperative properties of transcription factor and miRNA regulation, and we propose that the coordinated regulation may play an important role in many biological processes.

ESI Highly Cited Paper

Chen et al. (2011) used gene functional enrichment analysis (second filter in Figure 3) to explore the co-regulatory relationships.... It was found that some biological processes emerged only in co-regulation and that the disruption of co-regulation might be closely related to cancers, suggesting the importance of the co-regulation of miRNAs and TFs.

Le et al, Brief Bioinform (2014)

One of the most notable features is that the same transcription factor can activate or repress gene expression and even change binding specificities according to its dynamic interactions with other transcription factors and coactivators (Chen et al. 2011).

Gennarino et al, Genome Res (2012)

Our results also indicate that miR-145 interferes with transcription factors responsible for the initiation of melanogenesis. Interestingly, it has been proposed that co-regulation of miRNAs and transcription factors is of particular importance in pigment pathways (Chen et al. 2011).

Dynoodt et al, J Invest Dermatol (2012)