Steve Rodriguez, Clemens Hug, Petar Todorov, Nienke Moret, Sarah A Boswell, Kyle Evans, George Zhou, Nathan T Johnson, Bradley T Hyman, Peter K Sorger, Mark W Albers, and Artem Sokolov. 2021. “Machine learning identifies candidates for drug repurposing in Alzheimer's disease.” Nat Commun, 12, 1, Pp. 1033.
Clinical trials of novel therapeutics for Alzheimer's Disease (AD) have consumed a large amount of time and resources with largely negative results. Repurposing drugs already approved by the Food and Drug Administration (FDA) for another indication is a more rapid and less expensive option. We present DRIAD (Drug Repurposing In AD), a machine learning framework that quantifies potential associations between the pathology of AD severity (the Braak stage) and molecular mechanisms as encoded in lists of gene names. DRIAD is applied to lists of genes arising from perturbations in differentiated human neural cell cultures by 80 FDA-approved and clinically tested drugs, producing a ranked list of possible repurposing candidates. Top-scoring drugs are inspected for common trends among their targets. We propose that the DRIAD method can be used to nominate drugs that, after additional validation and identification of relevant pharmacodynamic biomarker(s), could be readily evaluated in a clinical trial.
Thais Sabedot, Tathiane Malta, James Snyder, Kevin Nelson, Michael Wells, Ana deCarvalho, Abir Mukherjee, Dhan Chitale, Maritza Mosella, Artem Sokolov, Karam Asmaro, Adam Robin, Michael Rosenblum, Tom Mikkelsen, Jack Rock, Laila Poisson, Ian Lee, Tobias Walbert, Steven Kalkanis, Antonio Iavarone, Ana Valeria Castro, and Houtan Noushmehr. 2021. “A serum-based DNA methylation assay provides accurate detection of glioma.” Neuro Oncol.
BACKGROUND: The detection of somatic mutations in cell-free DNA (cfDNA) from liquid biopsy has emerged as a non-invasive tool to monitor the follow-up of cancer patients. However, the clinical utility of cfDNA remains uncertain in patients with brain tumors, primarily because of the limited sensitivity cfDNA has to detect real tumor-specific somatic mutations. This unresolved challenge has prevented accurate follow-up of glioma patients with non-invasive approaches. METHODS: Genome-wide DNA methylation profiling of tumor tissue and serum cell-free DNA of glioma patients. RESULTS: Here, we developed a non-invasive approach to profile the DNA methylation status in the serum of patients with gliomas and identified a cfDNA-derived methylation signature that is associated with the presence of gliomas and related immune features. By testing the signature in independent discovery and validation cohorts, we developed and verified a score metric (the "glioma epigenetic liquid biopsy score" or GeLB) that optimally distinguished patients with or without glioma (sensitivity: 100%, specificity: 97.78%). Furthermore, we found that changes in GeLB score reflected clinicopathological changes during surveillance (e.g., progression, pseudoprogression or response to standard or experimental treatment). CONCLUSIONS: Our results suggest that the GeLB score can be used as a complementary approach to diagnose and follow up patients with glioma.
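The sensitivity and specificity quoted for the GeLB score are standard confusion-matrix ratios. A minimal Python sketch of how such figures are computed is given here; the function name and the counts are hypothetical illustrations, not the study's data or code.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true cases that were detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly ruled out
    return sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the paper).
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=45, fp=5)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```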
Mi Yang, Francesca Petralia, Zhi Li, Hongyang Li, Weiping Ma, Xiaoyu Song, Sunkyu Kim, Heewon Lee, Han Yu, Bora Lee, Seohui Bae, Eunji Heo, Jan Kaczmarczyk, Piotr Stępniak, Michał Warchoł, Thomas Yu, Anna P Calinawan, Paul C Boutros, Samuel H Payne, Boris Reva, Emily Boja, Henry Rodriguez, Gustavo Stolovitzky, Yuanfang Guan, Jaewoo Kang, Pei Wang, David Fenyö, Julio Saez-Rodriguez, and NCI-CPTAC DREAM Consortium. 2020. “Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics.” Cell Syst, 11, 2, Pp. 186-195.e9.
Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information.
Orit Rozenblatt-Rosen, Aviv Regev, Philipp Oberdoerffer, Tal Nawy, Anna Hupalowska, Jennifer E Rood, Orr Ashenberg, Ethan Cerami, Robert J Coffey, Emek Demir, Li Ding, Edward D Esplin, James M Ford, Jeremy Goecks, Sharmistha Ghosh, Joe W Gray, Justin Guinney, Sean E Hanlon, Shannon K Hughes, Shelley E Hwang, Christine A Iacobuzio-Donahue, Judit Jané-Valbuena, Bruce E Johnson, Ken S Lau, Tracy Lively, Sarah A Mazzilli, Dana Pe'er, Sandro Santagata, Alex K Shalek, Denis Schapiro, Michael P Snyder, Peter K Sorger, Avrum E Spira, Sudhir Srivastava, Kai Tan, Robert B West, Elizabeth H Williams, and Human Tumor Atlas Network. 2020. “The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution.” Cell, 181, 2, Pp. 236-249.
Crucial transitions in cancer, including tumor initiation, local expansion, metastasis, and therapeutic resistance, involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.
Artem Sokolov, Stephanie Ashenden, Nil Sahin, Richard Lewis, Nurdan Erdem, Elif Ozaltan, Andreas Bender, Frederick P Roth, and Murat Cokol. 2019. “Characterizing ABC-Transporter Substrate-Likeness Using a Clean-Slate Genetic Background.” Front Pharmacol, 10, Pp. 448.
Mutations in ATP Binding Cassette (ABC)-transporter genes can have major effects on the bioavailability and toxicity of the drugs that are ABC-transporter substrates. Consequently, methods to predict if a drug is an ABC-transporter substrate are useful for drug development. Such methods traditionally relied on literature curated collections of ABC-transporter dependent membrane transfer assays. Here, we used a single large-scale dataset of 376 drugs with relative efficacy on an engineered yeast strain with all ABC-transporter genes deleted (ABC-16), to explore the relationship between a drug's chemical structure and ABC-transporter substrate-likeness. We represented a drug's chemical structure by an array of substructure keys and explored several machine learning methods to predict the drug's efficacy in an ABC-16 yeast strain. Gradient-Boosted Random Forest models outperformed all other methods with an AUC of 0.723. We prospectively validated the model using new experimental data and found significant agreement with predictions. Our analysis expands the previously reported chemical substructures associated with ABC-transporter substrates and provides an alternative means to investigate ABC-transporter substrate-likeness.
Robert Krueger, Johanna Beyer, Won-Dong Jang, Nam Wook Kim, Artem Sokolov, Peter K Sorger, and Hanspeter Pfister. 2019. “Facetto: Combining Unsupervised and Supervised Learning for Hierarchical Phenotype Analysis in Multi-Channel Image Data.” IEEE Trans Vis Comput Graph.
Facetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 10^9 or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use-cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.
Rumana Rashid, Giorgio Gaglia, Yu-An Chen, Jia-Ren Lin, Ziming Du, Zoltan Maliga, Denis Schapiro, Clarence Yapp, Jeremy Muhlich, Artem Sokolov, Peter Sorger, and Sandro Santagata. 2019. “Highly multiplexed immunofluorescence images and single-cell data of immune markers in tonsil and lung cancer.” Sci Data, 6, 1, Pp. 323.
In this data descriptor, we document a dataset of multiplexed immunofluorescence images and derived single-cell measurements of immune lineage and other markers in formaldehyde-fixed and paraffin-embedded (FFPE) human tonsil and lung cancer tissue. We used tissue cyclic immunofluorescence (t-CyCIF) to generate fluorescence images which we artifact corrected using the BaSiC tool, stitched and registered using the ASHLAR algorithm, and segmented using ilastik software and MATLAB. We extracted single-cell features from these images using HistoCAT software. The resulting dataset can be visualized using image browsers and analyzed using high-dimensional, single-cell methods. This dataset is a valuable resource for biological discovery of the immune system in normal and diseased states as well as for the development of multiplexed image analysis and viewing tools.
Camila Ferreira de Souza, Thais S Sabedot, Tathiane M Malta, Lindsay Stetson, Olena Morozova, Artem Sokolov, Peter W Laird, Maciej Wiznerowicz, Antonio Iavarone, James Snyder, Ana deCarvalho, Zachary Sanborn, Kerrie L McDonald, William A Friedman, Daniela Tirapelli, Laila Poisson, Tom Mikkelsen, Carlos G Carlotti, Steven Kalkanis, Jean Zenklusen, Sofie R Salama, Jill S Barnholtz-Sloan, and Houtan Noushmehr. 2018. “A Distinct DNA Methylation Shift in a Subset of Glioma CpG Island Methylator Phenotypes during Tumor Recurrence.” Cell Rep, 23, 2, Pp. 637-651.
Glioma diagnosis is based on histomorphology and grading; however, such classification does not have predictive clinical outcome after glioblastomas have developed. To date, no bona fide biomarkers that significantly translate into a survival benefit to glioblastoma patients have been identified. We previously reported that the IDH mutant G-CIMP-high subtype would be a predecessor to the G-CIMP-low subtype. Here, we performed a comprehensive DNA methylation longitudinal analysis of diffuse gliomas from 77 patients (200 tumors) to enlighten the epigenome-based malignant transformation of initially lower-grade gliomas. Intra-subtype heterogeneity among G-CIMP-high primary tumors allowed us to identify predictive biomarkers for assessing the risk of malignant recurrence at early stages of disease. G-CIMP-low recurrence appeared in 9.5% of all gliomas, and these resembled IDH-wild-type primary glioblastoma. G-CIMP-low recurrence can be characterized by distinct epigenetic changes at candidate functional tissue enhancers with AP-1/SOX binding elements, mesenchymal stem cell-like epigenomic phenotype, and genomic instability. Molecular abnormalities of longitudinal G-CIMP offer possibilities to defy glioblastoma progression.
Tathiane M Malta, Artem Sokolov, Andrew J Gentles, Tomasz Burzykowski, Laila Poisson, John N Weinstein, Bożena Kamińska, Joerg Huelsken, Larsson Omberg, Olivier Gevaert, Antonio Colaprico, Patrycja Czerwińska, Sylwia Mazurek, Lopa Mishra, Holger Heyn, Alex Krasnitz, Andrew K Godwin, Alexander J Lazar, Joshua M Stuart, Katherine A Hoadley, Peter W Laird, Houtan Noushmehr, and Maciej Wiznerowicz. 2018. “Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation.” Cell, 173, 2, Pp. 338-354.e15.
Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.
Rahul Aggarwal, Jiaoti Huang, Joshi J Alumkal, Li Zhang, Felix Y Feng, George V Thomas, Alana S Weinstein, Verena Friedl, Can Zhang, Owen N Witte, Paul Lloyd, Martin Gleave, Christopher P Evans, Jack Youngren, Tomasz M Beer, Matthew Rettig, Christopher K Wong, Lawrence True, Adam Foye, Denise Playdle, Charles J Ryan, Primo Lara, Kim N Chi, Vlado Uzunangelov, Artem Sokolov, Yulia Newton, Himisha Beltran, Francesca Demichelis, Mark A Rubin, Joshua M Stuart, and Eric J Small. 2018. “Clinical and Genomic Characterization of Treatment-Emergent Small-Cell Neuroendocrine Prostate Cancer: A Multi-institutional Prospective Study.” J Clin Oncol, 9.
PURPOSE: The prevalence and features of treatment-emergent small-cell neuroendocrine prostate cancer (t-SCNC) are not well characterized in the era of modern androgen receptor (AR)-targeting therapy. We sought to characterize the clinical and genomic features of t-SCNC in a multi-institutional prospective study. METHODS: Patients with progressive, metastatic castration-resistant prostate cancer (mCRPC) underwent metastatic tumor biopsy and were followed for survival. Metastatic biopsy specimens underwent independent, blinded pathology review along with RNA/DNA sequencing. RESULTS: A total of 202 consecutive patients were enrolled. One hundred forty-eight (73%) had prior disease progression on abiraterone and/or enzalutamide. The biopsy evaluable rate was 79%. The overall incidence of t-SCNC detection was 17%. AR amplification and protein expression were present in 67% and 75%, respectively, of t-SCNC biopsy specimens. t-SCNC was detected at similar proportions in bone, node, and visceral organ biopsy specimens. Genomic alterations in the DNA repair pathway were nearly mutually exclusive with t-SCNC differentiation (P = .035). Detection of t-SCNC was associated with shortened overall survival among patients with prior AR-targeting therapy for mCRPC (hazard ratio, 2.02; 95% CI, 1.07 to 3.82). Unsupervised hierarchical clustering of the transcriptome identified a small-cell-like cluster that further enriched for adverse survival outcomes (hazard ratio, 3.00; 95% CI, 1.25 to 7.19). A t-SCNC transcriptional signature was developed and validated in multiple external data sets with >90% accuracy. Multiple transcriptional regulators of t-SCNC were identified, including the pancreatic neuroendocrine marker PDX1. CONCLUSION: t-SCNC is present in nearly one fifth of patients with mCRPC and is associated with shortened survival. The near-mutual exclusivity with DNA repair alterations suggests t-SCNC may be a distinct subset of mCRPC. Transcriptional profiling facilitates the identification of t-SCNC and novel therapeutic targets.
Mehmet Gönen, Barbara A Weir, Glenn S Cowley, Francisca Vazquez, Yuanfang Guan, Alok Jaiswal, Masayuki Karasuyama, Vladislav Uzunangelov, Tao Wang, Aviad Tsherniak, Sara Howell, Daniel Marbach, Bruce Hoff, Thea C Norman, Antti Airola, Adrian Bivol, Kerstin Bunte, Daniel Carlin, Sahil Chopra, Alden Deran, Kyle Ellrott, Peddinti Gopalacharyulu, Kiley Graim, Samuel Kaski, Suleiman A Khan, Yulia Newton, Sam Ng, Tapio Pahikkala, Evan Paull, Artem Sokolov, Hao Tang, Jing Tang, Krister Wennerberg, Yang Xie, Xiaowei Zhan, Fan Zhu, Tero Aittokallio, Hiroshi Mamitsuka, Joshua M Stuart, Jesse S Boehm, David E Root, Guanghua Xiao, Gustavo Stolovitzky, William C Hahn, and Adam A Margolin. 2017. “A Community Challenge for Inferring Genetic Predictors of Gene Essentialities through Analysis of a Functional Screen of Cancer Cell Lines.” Cell Syst, 5, 5, Pp. 485-497.e3.
We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison with this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.
Daniel E Carlin, Evan O Paull, Kiley Graim, Christopher K Wong, Adrian Bivol, Peter Ryabinin, Kyle Ellrott, Artem Sokolov, and Joshua M Stuart. 2017. “Prophetic Granger Causality to infer gene regulatory networks.” PLoS One, 12, 12, Pp. e0170340.
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides complementary information to other approaches, raising the performance of ensemble learners, while on its own achieves moderate performance. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring.
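The core idea of an L1-penalized, Granger-style regression can be illustrated with a small Python sketch. It is only a simplified illustration under stated assumptions: it regresses one protein on the previous time point of every protein, using a plain coordinate-descent lasso, and it omits the stimuli, the network prior, and whatever distinguishes the "prophetic" variant in the paper. All function names and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for an L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


def lagged_weights(series, target, lam=0.01, lag=1):
    """Regress target[t] on every series at t - lag; return name -> weight."""
    names = sorted(series)
    T = len(series[target])
    X = [[series[name][t - lag] for name in names] for t in range(lag, T)]
    y = [series[target][t] for t in range(lag, T)]
    return dict(zip(names, lasso_cd(X, y, lam)))


# Toy data (hypothetical): protein b follows protein a with a one-step delay.
a = [1, -1, 2, 0, -2, 1, 3, -1, 0, 2, -3, 1]
b = [0.0] + [0.8 * v for v in a[:-1]]
weights = lagged_weights({"a": a, "b": b}, target="b")
print(weights)
```

In this toy case a nonzero weight on `a` and a zero weight on `b` would be read as a candidate edge from a to b; the L1 penalty is what drives uninformative lagged predictors to exactly zero.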
Justin Guinney, Tao Wang, Teemu D Laajala, Kimberly Kanigel Winner, Christopher J Bare, Elias Chaibub Neto, Suleiman A Khan, Gopal Peddinti, Antti Airola, Tapio Pahikkala, Tuomas Mirtti, Thomas Yu, Brian M Bot, Liji Shen, Kald Abdallah, Thea Norman, Stephen Friend, Gustavo Stolovitzky, Howard Soule, Christopher J Sweeney, Charles J Ryan, Howard I Scher, Oliver Sartor, Yang Xie, Tero Aittokallio, Fang Liz Zhou, James C Costello, and Prostate Cancer Challenge Community. 2017. “Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data.” Lancet Oncol, 18, 1, Pp. 132-142.

BACKGROUND: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. METHODS: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. FINDINGS: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0·791; Bayes factor >5) and surpassed the reference model (iAUC 0·743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3·32, 95% CI 2·39-4·62, p<0·0001; reference model: 2·56, 1·85-3·53, p<0·0001). The new model was validated further on the ENTHUSE M1 cohort with similarly high performance (iAUC 0·768). Meta-analysis across all methods confirmed previously identified predictive clinical variables and revealed aspartate aminotransferase as an important, albeit previously under-reported, prognostic biomarker. INTERPRETATION: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. FUNDING: Sanofi US Services, Project Data Sphere.

Steven M Hill, Laura M Heiser, Thomas Cokelaer, Michael Unger, Nicole K Nesser, Daniel E Carlin, Yang Zhang, Artem Sokolov, Evan O Paull, Chris K Wong, Kiley Graim, Adrian Bivol, Haizhou Wang, Fan Zhu, Bahman Afsari, Ludmila V Danilova, Alexander V Favorov, Wai Shing Lee, Dane Taylor, Chenyue W Hu, Byron L Long, David P Noren, Alexander J Bisberg, Gordon B Mills, Joe W Gray, Michael Kellen, Thea Norman, Stephen Friend, Amina A Qutub, Elana J Fertig, Yuanfang Guan, Mingzhou Song, Joshua M Stuart, Paul T Spellman, Heinz Koeppl, Gustavo Stolovitzky, Julio Saez-Rodriguez, and Sach Mukherjee. 2016. “Inferring causal molecular networks: empirical assessment through a community-based effort.” Nat Methods, 13, 4, Pp. 310-8.
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
John K Lee, John W Phillips, Bryan A Smith, Jung Wook Park, Tanya Stoyanova, Erin F McCaffrey, Robert Baertsch, Artem Sokolov, Justin G Meyerowitz, Colleen Mathis, Donghui Cheng, Joshua M Stuart, Kevan M Shokat, Clay W Gustafson, Jiaoti Huang, and Owen N Witte. 2016. “N-Myc Drives Neuroendocrine Prostate Cancer Initiated from Human Prostate Epithelial Cells.” Cancer Cell, 29, 4, Pp. 536-47.
MYCN amplification and overexpression are common in neuroendocrine prostate cancer (NEPC). However, the impact of aberrant N-Myc expression in prostate tumorigenesis and the cellular origin of NEPC have not been established. We define N-Myc and activated AKT1 as oncogenic components sufficient to transform human prostate epithelial cells to prostate adenocarcinoma and NEPC with phenotypic and molecular features of aggressive, late-stage human disease. We directly show that prostate adenocarcinoma and NEPC can arise from a common epithelial clone. Further, N-Myc is required for tumor maintenance, and destabilization of N-Myc through Aurora A kinase inhibition reduces tumor burden. Our findings establish N-Myc as a driver of NEPC and a target for therapeutic intervention.
Artem Sokolov, Evan O Paull, and Joshua M Stuart. 2016. “One-Class Detection of Cell States in Tumor Subtypes.” Pac Symp Biocomput, 21, Pp. 405-16.
The cellular composition of a tumor greatly influences the growth, spread, immune activity, drug response, and other aspects of the disease. Tumor cells are usually comprised of a heterogeneous mixture of subclones, each of which could contain their own distinct character. The presence of minor subclones poses a serious health risk for patients as any one of them could harbor a fitness advantage with respect to the current treatment regimen, fueling resistance. It is therefore vital to accurately assess the make-up of cell states within a tumor biopsy. Transcriptome-wide assays from RNA sequencing provide key data from which cell state signatures can be detected. However, the challenge is to find them within samples containing mixtures of cell types of unknown proportions. We propose a novel one-class method based on logistic regression and show that its performance is competitive to two established SVM-based methods for this detection task. We demonstrate that one-class models are able to identify specific cell types in heterogeneous cell populations better than their binary predictor counterparts. We derive one-class predictors for the major breast and bladder subtypes and reaffirm the connection between these two tissues. In addition, we use a one-class predictor to quantitatively associate an embryonic stem cell signature with an aggressive breast cancer subtype that reveals shared stemness pathways potentially important for treatment.
Artem Sokolov, Daniel E Carlin, Evan O Paull, Robert Baertsch, and Joshua M Stuart. 2016. “Pathway-Based Genomics Prediction using Generalized Elastic Net.” PLoS Comput Biol, 12, 3, Pp. e1004790.
We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.
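To make the idea of a pathway-aware penalty concrete, the Python sketch below evaluates a regularizer of the form lam1 * sum_j d_j * |w_j| + (lam2 / 2) * w^T P w. The abstract does not state this formula, so the form, the function name, and the choice of a graph Laplacian for P are assumptions made for illustration rather than a reproduction of the authors' implementation.

```python
def gelnet_style_penalty(w, d, P, lam1, lam2):
    """Weighted L1 term plus a quadratic term that couples features via P."""
    l1_term = sum(dj * abs(wj) for dj, wj in zip(d, w))
    p = len(w)
    quad_term = sum(w[i] * P[i][j] * w[j] for i in range(p) for j in range(p))
    return lam1 * l1_term + 0.5 * lam2 * quad_term


# Hypothetical 3-gene network with one edge between gene 0 and gene 1.
# P is the graph Laplacian of that network (an assumed choice for P).
P = [[1, -1, 0],
     [-1, 1, 0],
     [0, 0, 0]]
d = [1.0, 1.0, 1.0]  # uniform per-gene L1 weights (assumed)

agree = gelnet_style_penalty([1.0, 1.0, 0.0], d, P, lam1=0.1, lam2=1.0)
disagree = gelnet_style_penalty([1.0, -1.0, 0.0], d, P, lam1=0.1, lam2=1.0)
print(agree, disagree)
```

With a Laplacian-style P, linked genes that receive similar weights incur no quadratic cost, while linked genes with opposing weights are penalized, which is one way a regularizer can steer solutions toward interlinked gene sets.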
Bryan A Smith, Artem Sokolov, Vladislav Uzunangelov, Robert Baertsch, Yulia Newton, Kiley Graim, Colleen Mathis, Donghui Cheng, Joshua M Stuart, and Owen N Witte. 2015. “A basal stem cell signature identifies aggressive prostate cancer phenotypes.” Proc Natl Acad Sci U S A, 112, 47, Pp. E6544-52.
Evidence from numerous cancers suggests that increased aggressiveness is accompanied by up-regulation of signaling pathways and acquisition of properties common to stem cells. It is unclear if different subtypes of late-stage cancer vary in stemness properties and whether or not these subtypes are transcriptionally similar to normal tissue stem cells. We report a gene signature specific for human prostate basal cells that is differentially enriched in various phenotypes of late-stage metastatic prostate cancer. We FACS-purified and transcriptionally profiled basal and luminal epithelial populations from the benign and cancerous regions of primary human prostates. High-throughput RNA sequencing showed the basal population to be defined by genes associated with stem cell signaling programs and invasiveness. Application of a 91-gene basal signature to gene expression datasets from patients with organ-confined or hormone-refractory metastatic prostate cancer revealed that metastatic small cell neuroendocrine carcinoma was molecularly more stem-like than either metastatic adenocarcinoma or organ-confined adenocarcinoma. Bioinformatic analysis of the basal cell and two human small cell gene signatures identified a set of E2F target genes common between prostate small cell neuroendocrine carcinoma and primary prostate basal cells. Taken together, our data suggest that aggressive prostate cancer shares a conserved transcriptional program with normal adult prostate basal stem cells.
Yuan Yuan, Eliezer M Van Allen, Larsson Omberg, Nikhil Wagle, Ali Amin-Mansour, Artem Sokolov, Lauren A Byers, Yanxun Xu, Kenneth R Hess, Lixia Diao, Leng Han, Xuelin Huang, Michael S Lawrence, John N Weinstein, Josh M Stuart, Gordon B Mills, Levi A Garraway, Adam A Margolin, Gad Getz, and Han Liang. 2014. “Assessing the clinical utility of cancer genomic and proteomic data across tumor types.” Nat Biotechnol, 32, 7, Pp. 644-52.
Molecular profiling of tumors promises to advance the clinical management of cancer, but the benefits of integrating molecular data with traditional clinical variables have not been systematically studied. Here we retrospectively predict patient survival using diverse molecular data (somatic copy-number alteration, DNA methylation and mRNA, microRNA and protein expression) from 953 samples of four cancer types from The Cancer Genome Atlas project. We find that incorporating molecular data with clinical variables yields statistically significantly improved predictions (FDR < 0.05) for three cancers but those quantitative gains were limited (2.2-23.9%). Additional analyses revealed little predictive power across tumor types except for one case. In clinically relevant genes, we identified 10,281 somatic alterations across 12 cancer types in 2,928 of 3,277 patients (89.4%), many of which would not be revealed in single-tumor analyses. Our study provides a starting point and resources, including an open-access model evaluation platform, for building reliable prognostic and therapeutic strategies that incorporate molecular data.
Katherine A Hoadley, Christina Yau, Denise M Wolf, Andrew D Cherniack, David Tamborero, Sam Ng, Max DM Leiserson, Beifang Niu, Michael D McLellan, Vladislav Uzunangelov, Jiashan Zhang, Cyriac Kandoth, Rehan Akbani, Hui Shen, Larsson Omberg, Andy Chu, Adam A Margolin, Laura J Van't Veer, Nuria Lopez-Bigas, Peter W Laird, Benjamin J Raphael, Li Ding, Gordon A Robertson, Lauren A Byers, Gordon B Mills, John N Weinstein, Carter Van Waes, Zhong Chen, Eric A Collisson, TCGA Network, Christopher C Benz, Charles M Perou, and Joshua M Stuart. 2014. “Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.” Cell, 158, 4, Pp. 929-44.
Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.