Qi Lv, Yiheng Lan, Yan Shi, Huan Wang, Xia Pan, Peng Li, and Tieliu Shi. 2017. “AtPID: a genome-scale resource for genotype-phenotype associations in Arabidopsis.” Nucleic Acids Res, 45, D1, Pp. D1060-D1063.
AtPID (Arabidopsis thaliana Protein Interactome Database, available at is an integrated database resource for protein interaction networks and functional annotation. In the past few years, we collected 5564 mutants with significant morphological alterations and manually curated them into 167 plant ontology (PO) morphology categories. These single- and multiple-gene mutants were indexed and linked to 3919 genes. After integrating these genotype-phenotype associations with the comprehensive protein interaction network in AtPID, we developed a Naïve Bayes method and predicted 4457 novel high-confidence gene-PO pairs involving 1369 genes as a complement. Together with newly accumulated protein interaction and functional annotation data and updated visualization toolkits, we present a genome-scale resource for genotype-phenotype associations in Arabidopsis in AtPID 5.0. On the updated website, the new genotype-phenotype associations from mutants, the protein network, and the protein annotation information can all be displayed in a comprehensive network view, which will greatly enhance systematic studies of plant protein function and genotype-phenotype associations.
Jinwen Feng, Chen Ding, Naiqi Qiu, Xiaotian Ni, Dongdong Zhan, Wanlin Liu, Xia Xia, Peng Li, Bingxin Lu, Qi Zhao, Peng Nie, Lei Song, Quan Zhou, Mi Lai, Gaigai Guo, Weimin Zhu, Jian Ren, Tieliu Shi, and Jun Qin. 2017. “Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis.” Nat Biotechnol, 35, 5, Pp. 409-412.
Peng Li, Ronald G Tompkins, and Wenzhong Xiao. 2017. “KERIS: kaleidoscope of gene responses to inflammation between species.” Nucleic Acids Res, 45, D1, Pp. D908-D914.
A cornerstone of modern biomedical research is the use of animal models to study disease mechanisms and to develop new therapeutic approaches. To help the research community better explore the similarities and differences in genomic response between human inflammatory diseases and murine models, we developed KERIS: kaleidoscope of gene responses to inflammation between species (available at As of June 2016, KERIS includes comparisons of the genomic responses in six human inflammatory diseases (burns, trauma, infection, sepsis, endotoxin and acute respiratory distress syndrome) and matched mouse models, using 2257 curated samples from the Inflammation and the Host Response to Injury Glue Grant studies and other representative studies in Gene Expression Omnibus. Researchers can browse, query, visualize and compare the response patterns of genes, pathways and functional modules across different diseases and the corresponding murine models. The database is expected to help biologists choose models when studying the mechanisms of particular genes and pathways in a disease, and prioritize the translation of findings from disease models into clinical studies.
Junfeng Xu, Lihui Li, Guangyang Yu, Wantao Ying, Qiang Gao, Wenjuan Zhang, Xianyu Li, Chen Ding, Yanan Jiang, Dongping Wei, Shengzhong Duan, Qunying Lei, Peng Li, Tieliu Shi, Xiaohong Qian, Jun Qin, and Lijun Jia. 2015. “The neddylation-cullin 2-RBX1 E3 ligase axis targets tumor suppressor RhoB for degradation in liver cancer.” Mol Cell Proteomics, 14, 3, Pp. 499-509.
The neddylation-cullin-RING E3 ligase (CRL) pathway has recently been identified as a potential oncogenic event and an attractive anticancer target; however, its underlying mechanisms have not been well elucidated. In this study, RhoB, a well-known tumor suppressor, was identified and validated with an iTRAQ-based quantitative proteomic approach as a new target of this pathway in liver cancer cells. Specifically, cullin 2-RBX1 E3 ligase, which requires NEDD8 conjugation for its activation, interacted with RhoB and promoted its ubiquitination and degradation. In human liver cancer tissues, the neddylation-CRL pathway was overactivated and inversely correlated with RhoB levels. Moreover, RhoB accumulation upon inhibition of the neddylation-CRL pathway for anticancer therapy contributed to the induction of tumor suppressors p21 and p27, apoptosis, and growth suppression. Our findings highlight the degradation of RhoB via the neddylation-CRL pathway as an important molecular event that drives liver carcinogenesis, and RhoB itself as a pivotal effector for anticancer therapy targeting this oncogenic pathway.
Peng Li, Yongrui Liu, Huan Wang, Yuan He, Xue Wang, Yundong He, Fang Lv, Huaqing Chen, Xiufeng Pang, Mingyao Liu, Tieliu Shi, and Zhengfang Yi. 2015. “PubAngioGen: a database and knowledge for angiogenesis and related diseases.” Nucleic Acids Res, 43, Database issue, Pp. D963-7.
Angiogenesis is the process of generating new blood vessels from existing ones; it is involved in many diseases, including cancers, cardiovascular diseases and diabetes mellitus. Recently, great efforts have been made to explore the mechanisms of angiogenesis in various diseases, and many angiogenic factors have been identified as therapeutic targets in anti- or pro-angiogenic drug development. However, the resulting information is scattered, and no systematic summary has been made. To integrate these results and facilitate research in the community, we manually text-mined the published literature and built a database named PubAngioGen ( Our online application displays a comprehensive network for exploring the connections between angiogenesis and diseases at multiple levels, including protein-protein interactions, drug-target relationships, disease-gene associations and signaling pathways among various cells and animal models recorded through text-mining. To broaden the scope of the PubAngioGen application, the database also links to other common resources, including STRING, DrugBank and OMIM, which will facilitate understanding of the molecular mechanisms underlying angiogenesis and drug development for clinical therapy.
2014. “A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium.” Nat Biotechnol, 32, 9, Pp. 903-14.
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
Wenjuan Wu, Ying Zhu, Zhaoxue Ma, Yi Sun, Quan Quan, Peng Li, Pengzhan Hu, Tieliu Shi, Clive Lo, Ivan K Chu, and Jirong Huang. 2013. “Proteomic evidence for genetic epistasis: ClpR4 mutations switch leaf variegation to virescence in Arabidopsis.” Plant J, 76, 6, Pp. 943-56.
Chloroplast development in plants is regulated by a series of coordinated biological processes. In this work, a genetic suppressor screen for the leaf variegation phenotype of the thylakoid formation 1 (thf1) mutant was combined with a proteomic assay to elucidate this complicated network. We identified a mutation in ClpR4, named clpR4-3, which leads to leaf virescence and also rescues var2 variegation. Proteomic analysis showed that the chloroplast proteome of clpR4-3 thf1 is dominantly controlled by clpR4-3, providing molecular mechanisms that cause genetic epistasis of clpR4-3 to thf1. Classification of the proteins significantly misregulated in the mutants revealed that those functioning in the expression of plastid genes are oppositely regulated, whereas proteins functioning in antioxidative stress, protein folding, and starch metabolism change in the same direction in thf1 and clpR4-3. The levels of FtsHs, including FtsH2/VAR2, FtsH8, and FtsH5/VAR1, are greatly reduced in thf1 compared with the wild type, but are higher in clpR4-3 thf1 than in thf1. Quantitative PCR analysis revealed that FtsH expression in clpR4-3 thf1 is regulated post-transcriptionally. In addition, a number of ribosomal proteins are expressed at lower levels in the clpR4-3 proteome, in line with the reduced levels of rRNAs in clpR4-3. Furthermore, knocking out PRPL11, one of the most downregulated proteins in the clpR4-3 thf1 proteome, rescues the leaf variegation phenotype of the thf1 and var2 mutants. These results provide insights into the molecular mechanisms by which the virescent clpR4-3 mutation suppresses leaf variegation of thf1 and var2.
Peng Li, Weidong Zang, Yuhua Li, Feng Xu, Jigang Wang, and Tieliu Shi. 2011. “AtPID: the overall hierarchical functional protein interaction network interface and analytic platform for Arabidopsis.” Nucleic Acids Res, 39, Database issue, Pp. D1130-3.
Protein interactions are involved in important cellular functions and biological processes that are fundamental to all life. With improvements in experimental techniques and progress in research, overall protein interaction network frameworks have been created for several model organisms through data collection and integration. However, most of these networks show only simple relationships without boundaries, weights or directions, and therefore do not truly reflect biological reality. In vivo, different types of protein interactions, such as protein complex assembly or phosphorylation, often have their own specific functions and qualifications. Ignoring these features introduces substantial bias into network analysis and application. Therefore, we annotate the Arabidopsis proteins in the AtPID database with further information (e.g. functional annotation, subcellular localization, tissue-specific expression, phosphorylation information, SNP phenotype and mutant phenotype) and interaction qualifications (e.g. transcriptional regulation, complex assembly and functional collaboration) through further literature text mining and integration of other resources. The related information is displayed to users through comprehensive, newly developed display and analytical tools. The system allows the construction of tissue-specific interaction networks with display of canonical pathways. The latest updated AtPID database is available at
Geng Chen, Kangping Yin, Leming Shi, Yuanzhang Fang, Ya Qi, Peng Li, Jian Luo, Bing He, Mingyao Liu, and Tieliu Shi. 2011. “Comparative analysis of human protein-coding and noncoding RNAs between brain and 10 mixed cell lines by RNA-Seq.” PLoS One, 6, 11, Pp. e28318.
During gene expression, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, using two transcriptome sequencing datasets from human brain tissue and 10 mixed cell lines, we investigated the protein-coding capacity of known human genes and the expression levels of their isoforms, as well as the conservation and disease association of long noncoding RNAs (ncRNAs). Comparative analysis revealed that about two-thirds of the genes expressed in brain and in the cell lines are shared, but fewer than one-third of their isoforms are identical. Besides the genes specifically expressed in brain or in the cell lines, about 66% of the genes expressed in common encoded different isoforms. Moreover, most genes predominantly expressed one isoform, and some genes generated only protein-coding (or only noncoding) RNAs in one sample but not in the other. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of these were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were significantly correlated with certain breast cancer- or lung cancer-related genes, indicating the biological relevance of long ncRNAs to human cancers. Our findings reveal that differences in gene expression profiles between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.
Feng Xu, Guang Li, Chen Zhao, Yuhua Li, Peng Li, Jian Cui, Youping Deng, and Tieliu Shi. 2010. “Global protein interactome exploration through mining genome-scale data in Arabidopsis thaliana.” BMC Genomics, 11 Suppl 2, Pp. S2.
BACKGROUND: Many essential cellular processes, such as cellular metabolism, transport and most regulatory mechanisms, rely on physical interactions between proteins. Genome-wide protein interactome networks have already been established for yeast, human and several other animal organisms, but such a network remains to be established for plants. RESULTS: We first predicted protein-protein interactions in Arabidopsis thaliana using several methods, including ortholog, SSBP, gene fusion, gene neighbor, phylogenetic profile, coexpression and protein domain methods, and then used a Naïve Bayesian approach to integrate the results of these methods with text-mining data to build a genome-wide protein interactome network. Furthermore, we used GO enrichment analysis, pathway data and published literature to validate the network; this validation shows the feasibility of using the network to predict protein function and for other applications. CONCLUSIONS: Our interactome is a comprehensive genome-wide network for the plant Arabidopsis thaliana, and provides a rich resource for researchers in related fields to study protein function, molecular interactions and potential mechanisms under different conditions.
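The Naïve Bayesian integration step described above can be illustrated with the standard likelihood-ratio formulation: each evidence source contributes a ratio P(evidence | interacting) / P(evidence | not interacting), and, assuming the sources are conditionally independent, the posterior odds equal the prior odds multiplied by all the ratios. The Python sketch below is a hypothetical illustration under that assumption, not the authors' implementation; the function names `posterior_odds` and `odds_to_probability`, the prior odds and the per-source likelihood ratios are all invented for illustration.

```python
"""Naive Bayes integration of evidence for one protein pair (illustrative sketch)."""
import math
from typing import Dict


def posterior_odds(prior_odds: float, likelihood_ratios: Dict[str, float]) -> float:
    """Multiply prior odds by each source's likelihood ratio.

    Assumes the evidence sources are conditionally independent (the "naive"
    assumption): O_post = O_prior * prod_i LR_i. Logs are summed to avoid
    overflow/underflow when many sources are combined.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    log_odds = math.log(prior_odds)
    for source, lr in likelihood_ratios.items():
        if lr <= 0:
            raise ValueError(f"likelihood ratio for {source!r} must be positive")
        log_odds += math.log(lr)
    return math.exp(log_odds)


def odds_to_probability(odds: float) -> float:
    """Convert odds O to a probability O / (1 + O)."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical values for a single protein pair; not taken from the paper.
    prior = 1 / 600  # assumed prior odds that a random pair interacts
    evidence = {
        "ortholog": 20.0,
        "coexpression": 5.0,
        "protein_domain": 3.0,
        "text_mining": 10.0,
    }
    odds = posterior_odds(prior, evidence)
    # 20 * 5 * 3 * 10 / 600 = 5, i.e. probability 5/6
    print(f"posterior odds = {odds:.2f}, probability = {odds_to_probability(odds):.3f}")
```

Working in log space keeps the product numerically stable when many sources are combined. Each likelihood ratio would typically be estimated from reference sets of known interacting and non-interacting pairs, which this sketch does not attempt.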
Bing He, Xiaojie Qiu, Peng Li, Lishan Wang, Qi Lv, and Tieliu Shi. 2010. “HCCNet: an integrated network database of hepatocellular carcinoma.” Cell Res, 20, 6, Pp. 732-4.
Junwei Wang, Meiwen Jia, Liping Zhu, Zengjin Yuan, Peng Li, Chang Chang, Jian Luo, Mingyao Liu, and Tieliu Shi. 2010. “Systematical detection of significant genes in microarray data by incorporating gene interaction relationship in biological systems.” PLoS One, 5, 10, Pp. e13721.
Many methods, including parametric, nonparametric and Bayesian methods, have been used to detect differentially expressed genes under the assumption that biological systems are linear, which ignores the nonlinear characteristics of most biological systems. More importantly, these methods do not simultaneously consider means, variances and higher moments, resulting in a relatively high false positive rate. To overcome these limitations, we propose the SWang test, which determines differentially expressed genes according to the equality of distributions between case and control. Our method not only implicitly incorporates functional relationships among genes to account for nonlinear biological systems but also considers the mean, variance, skewness and kurtosis of expression profiles simultaneously. To illustrate the biological significance of higher moments, we construct a nonlinear gene interaction model, demonstrating that skewness and kurtosis can contain useful information about functional associations among genes in microarrays. Simulations and real microarray results show that SWang has a lower false positive rate than currently popular methods (t-test, F-test, SAM and fold change) with much higher statistical power. Additionally, SWang can uniquely detect significant genes in real microarray data that show imperceptible differential expression but greater variation in kurtosis and skewness. These genes were confirmed by previously published literature or by RT-PCR experiments performed in our lab.
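The four moments named in the SWang abstract can be computed per gene for case and control groups, as in the hypothetical Python sketch below. The abstract does not give the SWang statistic, so this is not the SWang test: as a plainly swapped-in stand-in for a test of distribution equality, it computes the two-sample Kolmogorov-Smirnov statistic. The function names `moments` and `ks_statistic` and the example expression values are assumptions made for illustration.

```python
"""Illustrative sketch: the four moments the SWang abstract mentions, plus a
stand-in distribution-equality statistic (two-sample Kolmogorov-Smirnov).
This is NOT the SWang test; its statistic is not described in the abstract."""
import math
from typing import Dict, List, Sequence


def moments(values: Sequence[float]) -> Dict[str, float]:
    """Return mean, population variance, skewness and excess kurtosis."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("values are constant; higher moments are undefined")
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs: List[float] = sorted(a)
    ys: List[float] = sorted(b)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        x = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < len(xs) and xs[i] == x:
            i += 1
        while j < len(ys) and ys[j] == x:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink.
    return d


if __name__ == "__main__":
    # Hypothetical expression values for one gene; not data from the paper.
    control = [-1.0, -1.0, 1.0, 1.0]
    case = [-math.sqrt(2.0), 0.0, 0.0, math.sqrt(2.0)]
    for label, group in (("control", control), ("case", case)):
        print(label, {k: round(v, 3) for k, v in moments(group).items()})
    print("KS statistic:", ks_statistic(case, control))
```

In this example the two hypothetical groups share the same mean (0), variance (1) and skewness (0), yet their excess kurtosis differs (-2 vs about -1), so a comparison of means alone would see no difference even though the distributions are not identical.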
Jian Cui, Peng Li, Guang Li, Feng Xu, Chen Zhao, Yuhua Li, Zhongnan Yang, Guang Wang, Qingbo Yu, Yixue Li, and Tieliu Shi. 2008. “AtPID: Arabidopsis thaliana protein interactome database--an integrative platform for plant systems biology.” Nucleic Acids Res, 36, Database issue, Pp. D999-1008.
Arabidopsis thaliana Protein Interactome Database (AtPID) is an object database that integrates data from several bioinformatics prediction methods with information manually collected from the literature. It contains data on protein-protein interactions, protein subcellular locations, ortholog maps, domain attributes and gene regulation. The predicted protein interaction data were derived from the ortholog interactome, microarray profiles, GO annotation, and conserved domain and genome contexts. The database holds 28,062 protein-protein interaction pairs, 23,396 of which were generated by prediction methods. Of the remaining 4666 pairs, 3866 pairs involving 1875 proteins were manually curated from the literature and 800 pairs were from enzyme complexes in KEGG. In addition, subcellular location information is available for 5562 proteins. AtPID provides an intuitive query interface that gives easy access to the important features of proteins. Through the incorporation of both experimental and computational methods, AtPID is a rich source of information for system-level understanding of gene function and biological processes in A. thaliana. Public access to the AtPID database is available at