Analysis of recently available microarray expression datasets obtained from the immortalized cell lines of the individuals represented in the HapMap project has led to inconclusive comparisons across cohorts with different Ancestral Continent of Origin (ACOO). To address this apparent inconsistency, we applied a novel approach to accentuate population-specific gene expression signatures for the CEU and YRI trios. In this report, we describe how four independent datasets point to the differential expression across ACOO of gene networks implicated in transforming the normal lymphoblast into immortalized lymphoblastoid cells. In particular, Werner Syndrome helicase (WRN) and related genes are differentially expressed between the YRI and CEU cohorts. We further demonstrate that these differences correlate with viral titer and that both the titer and expression differences are associated with ACOO. We use the 14 genes most differentially expressed to construct an ACOO-specific "immortalization network" comprising 40 genes, one of which shows significant correlation with genomic variation (eQTL). The extent to which these measured group differences are due to differences in the immortalization procedures used for each group or reflect ACOO-specific biological differences remains to be determined. That the ACOO group differences in gene expression patterns may depend strongly on the process of transforming cells to establish immortalized lines should be considered in such comparisons.
ABSTRACT: BACKGROUND: In recent years, the molecular underpinnings of the long-observed resemblance between neoplastic and immature tissue have begun to emerge. Genome-wide transcriptional profiling has revealed similar gene expression signatures in several tumor types and early developmental stages of their tissue of origin. However, it remains unclear whether such a relationship is a universal feature of malignancy, whether heterogeneities exist in the developmental component of different tumor types and to which degree the resemblance between cancer and development is a tissue-specific phenomenon. RESULTS: We defined a developmental landscape by summarizing the main features of ten developmental time courses and projected gene expression from a variety of human tumor types onto this landscape. This comparison demonstrates a clear imprint of developmental gene expression in a wide range of tumors and with respect to different, even non-cognate developmental backgrounds. Our analysis reveals three classes of cancers with developmentally distinct transcriptional patterns. We characterize the biological processes dominating these classes and validate the class distinction with respect to a new time series of murine embryonic lung development. Finally, we identify a set of genes that are upregulated in most cancers and we show that this signature is active in early development. CONCLUSION: This systematic and quantitative overview of the relationship between the neoplastic and developmental transcriptome spanning dozens of tissues provides a reliable outline of global trends in cancer gene expression, reveals potentially clinically relevant differences in the gene expression of different cancer types and represents a reference framework for interpretation of smaller-scale functional studies.
Despite the growing understanding of PDGF signaling, studies of PDGF function have encountered two major obstacles: the functional redundancy of PDGFRalpha and PDGFRbeta in vitro and their distinct roles in vivo. Here we used wild-type mouse embryonic fibroblasts (MEF) and MEF null for PDGFRalpha, PDGFRbeta, or both to dissect PDGF-PDGFR signaling pathways. These four genetically defined PDGFR cell types provided a platform to study the relative contributions of the pathways triggered by the two PDGF receptors. They were treated with PDGF-BB and analyzed for differential gene expression, in vitro proliferation and differential response to pharmacological effects. No genes were differentially expressed in the double null cells, suggesting minimal receptor-independent signaling. Protean differentiation and proliferation pathways are commonly regulated by PDGFRalpha, PDGFRbeta and PDGFRalpha/beta, while each receptor is also responsible for regulating unique signaling pathways. Furthermore, some signaling is solely modulated through heterodimeric PDGFRalpha/beta.
Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular characteristics in human cancer and to provide the data rapidly to the research community. Here we report the interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas (the most common type of adult brain cancer) and nucleotide sequence aberrations in 91 of the 206 glioblastomas. This analysis provides new insights into the roles of ERBB2, NF1 and TP53, uncovers frequent mutations of the phosphatidylinositol-3-OH kinase regulatory subunit gene PIK3R1, and provides a network view of the pathways altered in the development of glioblastoma. Furthermore, integration of mutation, DNA methylation and clinical treatment data reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, an observation with potential clinical implications. Together, these findings establish the feasibility and power of TCGA, demonstrating that it can rapidly expand knowledge of the molecular basis of cancer.
PURPOSE: The aim of the study is to dissect the cytotoxic mechanisms of 1-(4-hydroxy-3-methoxyphenyl)-7-(3,4-dihydroxyphenyl)-4E-en-3-heptanone (compound 1) in SH-SY5Y cells and thereby to provide new insight into neuroblastoma chemotherapy. METHODS: Nine diarylheptanoids were isolated from Alpinia officinarum by chromatography and their cytotoxicity was evaluated by an MTS assay. Flow cytometry, BrdU incorporation assay and fluorescence staining were employed to investigate cytostatic and apoptotic effects induced by compound 1. In addition, Western blot, qPCR and siRNA techniques were used to elucidate the molecular mechanisms of the cytotoxicity. RESULTS: Studies of compound 1, the most potent diarylheptanoid, showed that cell cycle-related proteins (cyclins, CDKs and CDKIs) as well as two main apoptosis-related families (caspases and Bcl-2) were involved in S phase arrest and apoptosis in the neuroblastoma cell line SH-SY5Y. Furthermore, following the drug treatment, the protein expression of p53, phospho-p53 (Ser20) and the p53 transcriptionally activated genes ATF3, puma and Apaf-1 increased dramatically; MDM2 and Aurora A, the two p53 negative regulators, decreased; the p53 protein stability was enhanced, whereas the p53 mRNA expression level slightly decreased and ATF3 mRNA expression apparently increased. In addition, the knockdown of the ATF3 gene by siRNA partially suppressed p53, caspase 3, S phase arrest and apoptosis triggered by compound 1. CONCLUSION: These results suggest that compound 1 induces S phase arrest and apoptosis via upregulation of ATF3 and stabilization of p53 in the SH-SY5Y cell line. Therefore, compound 1 might be a promising lead structure for neuroblastoma therapy.
Natural products derived from plants provide a rich source for development of new anticancer drugs. Dulxanthone A was found to be an active cytotoxic component in Garcinia cowa by bioactivity-directed isolation. Studies to elucidate the cytotoxic mechanisms of dulxanthone A showed that it consistently induced S phase arrest and apoptosis in the most sensitive cell line, HepG2. Furthermore, p53 was dramatically up-regulated, leading to altered expression of downstream proteins upon dulxanthone A treatment. Cell cycle-related proteins, such as cyclin A, cyclin B, cyclin E, cdc-2, p21 and p27, were down-regulated. Some apoptosis-related proteins were also altered following the drug treatment. Among Bcl-2 family members, PUMA was up-regulated while Bcl-2 and Bax were down-regulated. However, the expression ratio of Bax/Bcl-2 was increased. This resulted in the release of cytochrome C from the mitochondria to the cytosol. Concurrently, dulxanthone A stimulated Apaf-1 along with p53. As a result, cytochrome C, Apaf-1 and procaspase-9 formed an apoptosome, which in turn triggered the activation of caspase-9, caspase-3 and downstream caspase substrates. Lamin A/C and PARP were down-regulated or cleaved, respectively. Moreover, cell cycle arrest and apoptosis in HepG2 cells induced by dulxanthone A were markedly inhibited by siRNA knockdown of p53. In summary, dulxanthone A is an active cytotoxic component of G. cowa. It induces cell cycle arrest at lower concentrations and triggers apoptosis at higher concentrations via up-regulation of p53 through the intrinsic mitochondrial pathway in HepG2 cells. Dulxanthone A is therefore likely a promising preventive and/or therapeutic agent against hepatoma.
The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, providing a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results. There were 12 system runs with microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", Social History, etc.). Many of the effective smoking status identifiers benefit from these features.
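The difference between the microaveraged and macroaveraged metrics mentioned above can be illustrated with a minimal sketch. It assumes single-label classification and the balanced F-measure (F1); the function name and the three category labels are hypothetical and are not taken from the challenge itself.

```python
from collections import Counter


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute one score per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical labels, chosen only for illustration.
GOLD = ["UNKNOWN"] * 6 + ["SMOKER"] * 2 + ["NON-SMOKER"] * 2
PRED = ["UNKNOWN"] * 6 + ["SMOKER", "NON-SMOKER"] + ["NON-SMOKER"] * 2

if __name__ == "__main__":
    print(micro_macro_f1(GOLD, PRED))
```

Microaveraging weights every record equally, so frequent classes dominate the score; macroaveraging weights every class equally, so errors on rare classes lower it more.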
Informatics for Integrating Biology and the Bedside (i2b2) is one of the sponsored initiatives of the NIH Roadmap National Centers for Biomedical Computing (http://www.bisti.nih.gov/ncbc/). A major goal of i2b2 is to provide clinical investigators broadly with the software tools necessary to collect and manage project-related clinical research data in the genomics age as a cohesive entity: a software suite to construct and manage the modern clinical research chart.
Dr. Schleyer asks a number of important questions. These might be summarized as asking why, as a discipline, are we not focusing on improving the acquisition of structured data rather than going through computational acrobatics to extract codified representation from narrative text? Should we not be focusing our efforts to ensure a fully-structured record?
The increasing availability of electronic medical records offers opportunities to better characterize patient populations and create predictive tools to individualize health care. We determined which asthma patients suffer exacerbations using data extracted from electronic medical records of the Partners Healthcare System using Natural Language Processing tools from the Informatics for Integrating Biology to the Bedside center (i2b2). Univariable and multivariable analysis of data for 11,356 patients (1,394 cases, 9,962 controls) found that race, BMI, smoking history, and age at initial observation are predictors of asthma exacerbations. The area under the receiver operating characteristic curve (AUROC) corresponding to prediction of exacerbations in an independent group of 1,436 asthma patients (106 cases, 1,330 controls) is 0.67. Our findings are consistent with previous characterizations of asthma patients in epidemiological studies, and demonstrate that data extracted by natural language processing from electronic medical records are suitable for the characterization of patient populations.
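For readers unfamiliar with the AUROC reported above, the following minimal sketch computes it as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The function name and the toy scores are assumptions for illustration; this is not the study's model or data.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of cases (label 1) and controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Toy predicted risks: two cases followed by four controls.
TOY_SCORES = [0.9, 0.4, 0.35, 0.8, 0.2, 0.4]
TOY_LABELS = [1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(auroc(TOY_SCORES, TOY_LABELS))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in the number of patients, so a rank-based formulation would be preferable for cohorts of the size described here.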
A greater understanding of the regulatory processes contributing to lung development could help ameliorate morbidity and mortality in premature infants and identify individuals at risk for congenital and/or chronic lung diseases. Genomics technologies have provided rich gene expression datasets for the developing lung that enable systems biology approaches for identifying large-scale molecular signatures within this complex phenomenon. Here, we applied unsupervised principal component analysis on two developing lung datasets and identified common dominant transcriptomic signatures. Of particular interest, we identify an overlying biological program we term 'time-to-birth', which describes the distance in age from the day of birth. We identify groups of genes contributing to the time-to-birth molecular signature. Statistically overrepresented are genes involved in oxygen and gas transport activity, as expected for a transition to air breathing, as well as host defense function. Additionally, we identify genes with expression patterns associated with the initiation of alveolar formation. Finally, we present validation of gene expression patterns across the two datasets, and independent validation of select genes by qPCR and immunohistochemistry. These data contribute to our understanding of genetic components contributing to large-scale biological processes and may be useful, particularly in animal models of abnormal lung development, to predict the state of organ development or preparation for birth.
The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. 
We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.
ABSTRACT: BACKGROUND: Genomic sequencing of SNPs is increasingly prevalent, though the amount of familial information these data contain has not been quantified. METHODS: We provide a framework for measuring the risk to siblings of a patient's SNP genotype disclosure, and demonstrate that sibling SNP genotypes can be inferred with substantial accuracy. RESULTS: Extending this inference technique, we determine that a very low number of matches at commonly varying SNPs is sufficient to confirm sib-ship, demonstrating that published sequence data can reliably be used to derive sibling identities. Using HapMap trio data, at SNPs where one child is homozygotic major, with a minor allele frequency
Informatics for Integrating Biology and the Bedside (i2b2) is one of the sponsored initiatives of the NIH Roadmap National Centers for Biomedical Computing (http://www.bisti.nih.gov/ncbc/). One of the goals of i2b2 is to provide clinical investigators broadly with the software tools necessary to collect and manage project-related clinical research data in the genomics age as a cohesive entity, a software suite to construct and manage the modern clinical research chart. The i2b2 "hive" is a set of software modules called "cells" that share a common messaging protocol, which allows them to interact using web services and XML messages. Each cell can be developed by independent investigators to achieve specific analytic goals, and then be integrated into the hive to enhance the functionality available in the i2b2 hive. We have applied this architecture in several ongoing clinical studies and found it to be of high value. The current version of this software has been released into the public domain and is available at http://www.i2b2.org.
BACKGROUND: Shared Pathology Informatics Network (SPIN) is a tissue resource initiative that utilizes clinical reports of the vast amount of paraffin-embedded tissues routinely stored by medical centers. SPIN has an informatics component (sending tissue-related queries to multiple institutions via the internet) and a service component (providing histopathologically annotated tissue specimens for medical research). This paper examines whether tissue blocks, identified by localized computer searches at participating institutions, can be retrieved in adequate quantity and quality to support medical researchers. METHODS: Four centers evaluated pathology reports (1990-2005) for common and rare tumors to determine the percentage of cases where suitable tissue blocks with tumor were available. Each site generated a list of 100 common tumor cases (25 cases each of breast adenocarcinoma, colonic adenocarcinoma, lung squamous carcinoma, and prostate adenocarcinoma) and 100 rare tumor cases (25 cases each of adrenal cortical carcinoma, gastro-intestinal stromal tumor [GIST], adenoid cystic carcinoma, and mycosis fungoides) using a combination of Tumor Registry, laboratory information system (LIS) and/or SPIN-related tools. Pathologists identified the slides/blocks with tumor and noted the first 3 slides with the largest tumor and the availability of the corresponding block. RESULTS: For common tumor cases (n = 400), the institutional retrieval rates (all blocks) were 83% (A), 95% (B), 80% (C), and 98% (D). Retrieval rate (tumor blocks) from all centers for common tumors was 73% with a mean largest tumor size of 1.49 cm; retrieval (tumor blocks) was highest for lung (84%) and lowest for prostate (54%). For rare tumor cases (n = 400), the institutional retrieval rates (all blocks) were 78% (A), 73% (B), 67% (C), and 84% (D). 
Retrieval rate (tumor blocks) from all centers for rare tumors was 66% with a mean largest tumor size of 1.56 cm; retrieval (tumor blocks) was highest for GIST (72%) and lowest for adenoid cystic carcinoma (58%). CONCLUSION: Assessment shows availability and quality of archival tissue blocks that are retrievable and associated electronic data that can be of value for researchers. This study serves to complement the data from which uniform use of the SPIN query tools by all four centers will be measured to assure and highlight the usefulness of archival material for obtaining tumor tissues for research.
Non-adherence to physician recommendations is common and is thought to lead to poor clinical outcomes. However, no techniques exist for a large-scale assessment of this phenomenon. We evaluated a computational approach that quantifies patient non-adherence from an analysis of the text of physician notes. Index of non-adherence (INA) was computed based on the number of non-adherence word tags detected in physician notes. INA was evaluated by comparing the results to a manual patient record review at the individual sentence and patient level. The relationship between INA and frequency of Emergency Department visits was determined. The positive predictive value of identification of individual non-adherence word tags was 93.3%. The Pearson correlation coefficient between the INA and the number of documented instances of non-adherence identified by manual review was 0.62. The frequency of ED visits was more than twice as high for patients with INA in the highest quartile (least adherent) than for patients with INA in the lowest (most adherent) quartile (p < 0.0001). We have described the design and evaluation of a novel approach that allows quantification of patient non-adherence with physician recommendations through an analysis of physician notes. This approach has been validated at several levels and demonstrated to correlate with clinical outcomes.
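The tag-counting idea behind the INA, and its comparison with manual review by Pearson correlation, can be sketched as follows. This is only an illustration under stated assumptions: the tag patterns are hypothetical (the study's lexicon is not given here), the index is taken to be a raw count with no normalisation, and negation or context handling is omitted.

```python
import re
from math import sqrt

# Hypothetical non-adherence word tags, for illustration only.
TAG_PATTERNS = [
    r"non-?adheren\w*",
    r"non-?complian\w*",
    r"did not take",
    r"missed (?:appointment|dose)s?",
]
TAG_RE = re.compile("|".join(TAG_PATTERNS), re.IGNORECASE)


def count_tags(note):
    """Number of non-adherence word tags found in one physician note."""
    return len(TAG_RE.findall(note))


def index_of_non_adherence(notes):
    """Toy INA: total tag count over all of a patient's notes (assumed form)."""
    return sum(count_tags(note) for note in notes)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


EXAMPLE_NOTES = [
    "Patient non-compliant with insulin; missed appointments twice.",
    "Reports good adherence to diet.",
]

if __name__ == "__main__":
    print(index_of_non_adherence(EXAMPLE_NOTES))
```

In a validation like the one described, `pearson` would be applied to the per-patient index values and the per-patient counts from manual review.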
BACKGROUND: Advanced disease-surveillance systems have been deployed worldwide to provide early detection of infectious disease outbreaks and bioterrorist attacks. New methods that improve the overall detection capabilities of these systems can have a broad practical impact. Furthermore, most current generation surveillance systems are vulnerable to dramatic and unpredictable shifts in the health-care data that they monitor. These shifts can occur during major public events, such as the Olympics, as a result of population surges and public closures. Shifts can also occur during epidemics and pandemics as a result of quarantines, the worried-well flooding emergency departments or, conversely, the public staying away from hospitals for fear of nosocomial infection. Most surveillance systems are not robust to such shifts in health-care utilization, either because they do not adjust baselines and alert-thresholds to new utilization levels, or because the utilization shifts themselves may trigger an alarm. As a result, public-health crises and major public events threaten to undermine health-surveillance systems at the very times they are needed most. METHODS AND FINDINGS: To address this challenge, we introduce a class of epidemiological network models that monitor the relationships among different health-care data streams instead of monitoring the data streams themselves. By extracting the extra information present in the relationships between the data streams, these models have the potential to improve the detection capabilities of a system. Furthermore, the models' relational nature has the potential to increase a system's robustness to unpredictable baseline shifts. 
We implemented these models and evaluated their effectiveness using historical emergency department data from five hospitals in a single metropolitan area, recorded over a period of 4.5 y by the Automated Epidemiological Geotemporal Integrated Surveillance real-time public health-surveillance system, developed by the Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology on behalf of the Massachusetts Department of Public Health. We performed experiments with semi-synthetic outbreaks of different magnitudes and simulated baseline shifts of different types and magnitudes. The results show that the network models provide better detection of localized outbreaks, and greater robustness to unpredictable shifts than a reference time-series modeling approach. CONCLUSIONS: The integrated network models of epidemiological data streams and their interrelationships have the potential to improve current surveillance efforts, providing better localized outbreak detection under normal circumstances, as well as more robust performance in the face of shifts in health-care utilization during epidemics and major public events.
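The intuition that monitoring relationships between data streams is more robust to baseline shifts than monitoring the streams themselves can be shown with a small sketch. It assumes the relationship is a simple ratio of two daily counts and that alarms are raised by a z-score threshold; both choices, along with all names and numbers, are illustrative assumptions rather than the models evaluated in the study.

```python
from statistics import mean, stdev


def zscore(value, history):
    """Standard score of value relative to a historical sample."""
    return (value - mean(history)) / stdev(history)


def absolute_alarm(history, today, threshold=3.0):
    """Alarm when a single stream deviates from its own baseline."""
    return abs(zscore(today, history)) > threshold


def ratio_alarm(hist_a, hist_b, today_a, today_b, threshold=3.0):
    """Alarm when the ratio of stream A to stream B deviates from its baseline.

    Assumes stream B is never zero.
    """
    ratios = [a / b for a, b in zip(hist_a, hist_b)]
    return abs(zscore(today_a / today_b, ratios)) > threshold


# Toy daily visit counts for two streams (e.g., two hospitals).
HIST_A = [100, 104, 98, 102, 96, 101, 99]
HIST_B = [50, 51, 49, 52, 48, 50, 50]

if __name__ == "__main__":
    # Uniform 50% surge in utilization: both streams rise together.
    print(absolute_alarm(HIST_A, 150), ratio_alarm(HIST_A, HIST_B, 150, 75))
    # Localized increase in stream A only.
    print(ratio_alarm(HIST_A, HIST_B, 115, 50))
```

A uniform surge leaves the ratio unchanged, so the relational monitor stays quiet while the single-stream monitor fires; an increase confined to one stream changes the ratio and is still detected.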
OBJECTIVES: Neurologic injury after cardiac surgery, often manifested as neurocognitive decline, is a common postoperative complication without clear cause. We studied acute variations in gene-expression profiles of patients with neurocognitive decline (NCD group) compared with those without neurocognitive decline (NORM group) after cardiopulmonary bypass. METHODS: Forty-two patients undergoing coronary artery bypass grafting, valve procedures, or both by using cardiopulmonary bypass were administered a validated neurocognitive battery preoperatively and postoperatively at day 4. Neurocognitive decline was defined as 1 standard deviation from baseline on 25% or more of tasks. Whole-blood mRNA was isolated preoperatively and at 6 hours after surgical intervention for fold-change calculation. Relative gene expression in the NCD versus the NORM group was assessed by using Affymetrix GeneChip U133 Plus 2.0 (>40,000 genes) from mRNA samples collected. Differential expression, clustering, gene ontology, and canonical pathway analysis were performed. Validation of microarray gene expression was performed with SYBR Green real-time polymerase chain reaction. RESULTS: Patients with neurocognitive decline (17/42 [40.5%] patients) were associated with a significantly different gene-expression response compared with that of patients without neurocognitive decline. Compared with preoperative samples, 6-hour samples had 531 upregulated and 670 downregulated genes uniquely in the NCD group compared with 2214 upregulated and 558 downregulated genes uniquely in the NORM group (P < .001; lower confidence bound, ≥1.2). Compared with patients in the NORM group, patients with neurocognitive decline had significantly different gene-expression pathways involving inflammation (including FAS, IL2RB, and CD59), antigen presentation (including HLA-DQ1, TAP1, and TAP2), and cellular adhesion (including ICAM2, ICAM3, and CAD7) among others. 
CONCLUSIONS: Patients with neurocognitive decline have inherently different genetic responses to cardiopulmonary bypass compared with those of patients without neurocognitive decline. Genetic variations in inflammatory, cell adhesion, and apoptotic pathways might be important contributors to the pathophysiology of neurologic injury after cardiopulmonary bypass and could become a target for prevention and risk stratification.
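The working definition of neurocognitive decline used in the methods (a drop of 1 standard deviation from baseline on at least 25% of tasks) can be written as a short classification rule. The sketch below assumes that higher scores are better, that each task has a known baseline standard deviation, and that preoperative and postoperative scores cover the same tasks; the task names and values are hypothetical.

```python
def has_neurocognitive_decline(pre, post, baseline_sd, min_fraction=0.25):
    """Return True if scores fell by >= 1 SD on at least min_fraction of tasks.

    pre, post and baseline_sd are dicts keyed by task name.
    """
    if not pre:
        raise ValueError("at least one task is required")
    declined = sum(
        1 for task in pre if pre[task] - post[task] >= baseline_sd[task]
    )
    return declined / len(pre) >= min_fraction


# Hypothetical four-task battery.
PRE = {"task1": 50, "task2": 30, "task3": 12, "task4": 20}
SD = {"task1": 10, "task2": 5, "task3": 3, "task4": 4}
POST_DECLINE = {"task1": 39, "task2": 29, "task3": 12, "task4": 19}
POST_STABLE = {"task1": 45, "task2": 29, "task3": 12, "task4": 19}

if __name__ == "__main__":
    print(has_neurocognitive_decline(PRE, POST_DECLINE, SD))
    print(has_neurocognitive_decline(PRE, POST_STABLE, SD))
```

With four tasks, a 1 SD drop on a single task already meets the 25% threshold, which is why the first example is classified as decline and the second is not.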
OBJECTIVE: Define a scalable architecture to support the National Health Information Network (NHIN). This architecture must concurrently support a wide range of public health, research, and clinical care activities. DESIGN: The architecture fulfils five desiderata: (1) adopt a distributed approach to data storage in order to protect privacy; (2) enable strong institutional autonomy to engender participation; (3) provide oversight and transparency to ensure patient trust; (4) allow variable levels of access according to investigator needs and institutional policies; (5) define a self-scaling architecture that encourages voluntary regional collaborations that coalesce to form a nationwide network. RESULTS: Our model has been validated by a large-scale, multi-institution study involving seven medical centers for cancer research. It is the basis of one of four open architectures developed under funding from the Office of the National Coordinator of Health Information Technology, fulfilling the biosurveillance use case defined by the American Health Information Community. The model supports broad applicability for regional and national clinical information exchanges. CONCLUSION: This model shows the feasibility of an architecture wherein the requirements of care providers, investigators, and public health authorities are served by a distributed model that grants autonomy, protects privacy, and promotes participation.