Publications

2010
Adida B, Sanyal A, Zabak S, Kohane IS, Mandl KD. Indivo X: developing a fully substitutable personally controlled health record platform. AMIA Annu Symp Proc. 2010;2010:6-10.
To support a rich ecosystem of third-party applications around a personally controlled health record (PCHR), we have redesigned Indivo, the original PCHR, as a web-based platform with feature-level substitutability. Core to this new release is the Indivo X Application Programming Interface (API), the contract between the PCHR platform and the end-user apps. Using rapid iterative development to build a minimal feature set from real-world requirements, the resulting Indivo X API, now in public stable beta, is enabling developers, including third-party contributors, to quickly create and integrate novel features into patients' online records, ultimately building a fully customizable experience for diverse patient needs.
Brownstein JS, Murphy SN, Goldfine AB, Grant RW, Sordo M, Gainer V, Colecchi JA, Dubey A, Nathan DM, Glaser JP, et al. Rapid identification of myocardial infarction risk associated with diabetes medications using electronic medical records. Diabetes Care. 2010;33:526-31.
OBJECTIVE: To assess the ability to identify potential association(s) of diabetes medications with myocardial infarction using usual care clinical data obtained from the electronic medical record. RESEARCH DESIGN AND METHODS: We defined a retrospective cohort of patients (n = 34,253) treated with a sulfonylurea, metformin, rosiglitazone, or pioglitazone in a single academic health care network. All patients were aged >18 years with at least one prescription for one of the medications between 1 January 2000 and 31 December 2006. The study outcome was acute myocardial infarction requiring hospitalization. We used a cumulative temporal approach to ascertain the calendar date for earliest identifiable risk associated with rosiglitazone compared with that for other therapies. RESULTS: Sulfonylurea, metformin, rosiglitazone, or pioglitazone therapy was prescribed for 11,200, 12,490, 1,879, and 806 patients, respectively. A total of 1,343 myocardial infarctions were identified. After adjustment for potential myocardial infarction risk factors, the relative risk for myocardial infarction with rosiglitazone was 1.3 (95% CI 1.1-1.6) compared with sulfonylurea, 2.2 (1.6-3.1) compared with metformin, and 2.2 (1.5-3.4) compared with pioglitazone. Prospective surveillance using these data would have identified increased risk for myocardial infarction with rosiglitazone compared with metformin within 18 months of its introduction with a risk ratio of 2.1 (95% CI 1.2-3.8). CONCLUSIONS: Our results are consistent with a relative adverse cardiovascular risk profile for rosiglitazone. Our use of usual care electronic data sources from a large hospital network represents an innovative approach to rapid safety signal detection that may enable more effective postmarketing drug surveillance.
Kho AT, Bhattacharya S, Tantisira KG, Carey VJ, Gaedigk R, Leeder JS, Kohane IS, Weiss ST, Mariani TJ. Transcriptomic analysis of human lung development. Am J Respir Crit Care Med. 2010;181:54-63.
RATIONALE: Current understanding of the molecular regulation of lung development is limited and derives mostly from animal studies. OBJECTIVES: To define global patterns of gene expression during human lung development. METHODS: Genome-wide expression profiling was used to measure the developing lung transcriptome in RNA samples derived from 38 normal human lung tissues at 53 to 154 days post conception. Principal component analysis was used to characterize global expression variation and to identify genes and bioontologic attributes contributing to these variations. Individual gene expression patterns were verified by quantitative reverse transcriptase-polymerase chain reaction analysis. MEASUREMENTS AND MAIN RESULTS: Gene expression analysis identified attributes not previously associated with lung development, such as chemokine-immunologic processes. Lung characteristic attributes (e.g., surfactant function) were observed at an earlier-than-anticipated age. We defined a 3,223 gene developing lung characteristic subtranscriptome capable of describing a majority of the process. In gene expression space, the samples formed a time-contiguous trajectory with transition points correlating with histological stages and suggesting the existence of novel molecular substages. Induction of surfactant gene expression characterized a pseudoglandular "molecular phase" transition. Individual gene expression patterns were independently validated. We predicted the age of independent human lung transcriptome profiles with a median absolute error of 5 days, supporting the validity of the data and modeling approach. CONCLUSIONS: This study extends our knowledge of key gene expression patterns and bioontologic attributes underlying early human lung developmental processes. The data also suggest the existence of molecular phases of lung development.
2009
Wall DP, Esteban FJ, Deluca TF, Huyck M, Monaghan T, Velez de Mendizabal N, Goñí J, Kohane I. Comparative analysis of neurological disorders focuses genome-wide search for autism genes. Genomics. 2009;93:120-129.
The behaviors of autism overlap with a diverse array of other neurological disorders, suggesting common molecular mechanisms. We conducted a large comparative analysis of the network of genes linked to autism with those of 432 other neurological diseases to circumscribe a multi-disorder subcomponent of autism. We leveraged the biological process and interaction properties of these multi-disorder autism genes to overcome the across-the-board multiple hypothesis corrections that a purely data-driven approach requires. Using prior knowledge of biological process, we identified 154 genes not previously linked to autism of which 42% were significantly differentially expressed in autistic individuals. Then, using prior knowledge from interaction networks of disorders related to autism, we uncovered 334 new genes that interact with published autism genes, of which 87% were significantly differentially regulated in autistic individuals. Our analysis provided a novel picture of autism from the perspective of related neurological disorders and suggested a model by which prior knowledge of interaction networks can inform and focus genome-scale studies of complex neurological disorders.
Uno H, Tian L, Cai T, Kohane IS, Wei LJ. Comparing Risk Scoring Systems Beyond the ROC Paradigm in Survival Analysis. Harvard University Biostatistics Working Paper Series. 2009:107.
Tian Z, An N, Zhou B, Xiao P, Kohane IS, Wu E. Cytotoxic diarylheptanoid induces cell cycle arrest and apoptosis via increasing ATF3 and stabilizing p53 in SH-SY5Y cells. Cancer Chemother Pharmacol. 2009;63:1131-9.
PURPOSE: The aim of the study is to dissect the cytotoxic mechanisms of 1-(4-hydroxy-3-methoxyphenyl)-7-(3,4-dihydroxyphenyl)-4E-en-3-heptanone (compound 1) in SH-SY5Y cells and thereby provide new insight into neuroblastoma chemotherapy. METHODS: Nine diarylheptanoids were isolated from Alpinia officinarum by chromatography and their cytotoxicity was evaluated by an MTS assay. Flow cytometry, BrdU incorporation assay and fluorescence staining were employed to investigate cytostatic and apoptotic effects induced by compound 1. In addition, Western blot, qPCR and siRNA techniques were used to elucidate the molecular mechanisms of the cytotoxicity. RESULTS: Mechanistic studies of compound 1, the most potent diarylheptanoid, showed that cell cycle-related proteins (cyclins, CDKs and CDKIs), as well as two main apoptosis-related families (caspases and Bcl-2), were involved in S phase arrest and apoptosis in the neuroblastoma cell line SH-SY5Y. Furthermore, following the drug treatment, the protein expression of p53, phospho-p53 (Ser20), and the p53 transcriptionally activated genes ATF3, puma and Apaf-1 increased dramatically; MDM2 and Aurora A, two negative regulators of p53, decreased; p53 protein stability was enhanced, whereas the p53 mRNA expression level decreased slightly and ATF3 mRNA expression apparently increased. In addition, knockdown of the ATF3 gene by siRNA partially suppressed p53, caspase 3, S phase arrest and apoptosis triggered by compound 1. CONCLUSION: These results suggest that compound 1 induces S phase arrest and apoptosis via upregulation of ATF3 and stabilization of p53 in the SH-SY5Y cell line. Therefore, compound 1 might be a promising lead structure for neuroblastoma therapy.
Roach KL, King KR, Uygun BE, Kohane IS, Yarmush ML, Toner M. High throughput single cell bioinformatics. Biotechnol Prog. 2009.
Advances in systems biology and bioinformatics have highlighted that no cell population is truly uniform and that stochastic behavior is an inherent property of many biological systems. As a result, bulk measurements can be misleading even when particular care has been taken to isolate a single cell type, and measurements averaged over multiple cell populations in a tissue can be as misleading as the average height at an elementary school. There is a growing need for experimental techniques that can provide a combination of single cell resolution, large cell populations, and the ability to track cells over multiple time points. In this article, a microwell array cytometry platform was developed to meet this need and investigate the heterogeneity and stochasticity of cell behavior on a single cell basis. The platform consisted of a microfabricated device with high-density arrays of cell-sized microwells and custom software for automated image processing and data analysis. As a model experimental system, we used primary hepatocytes labeled with fluorescent probes sensitive to mitochondrial membrane potential and free radical generation. The cells were exposed to oxidative stress and the responses were dynamically monitored for each cell. The resulting data was then analyzed using bioinformatics techniques such as hierarchical and k-means clustering to visualize the data and identify interesting features. The results showed that clustering of the dynamic data not only enhanced comparisons between the treatment groups but also revealed a number of distinct response patterns within each treatment group. Heatmaps with hierarchical clustering also provided a data-rich complement to survival curves in a dose response experiment. The microwell array cytometry platform was shown to be powerful, easy to use, and able to provide a detailed picture of the heterogeneity present in cell responses to oxidative stress. 
We believe that our microwell array cytometry platform will have general utility for a wide range of questions related to cell population heterogeneity, biological stochasticity, and cell behavior under stress conditions. (c) 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010.
Roach KL, King KR, Uygun K, Hand SC, Kohane IS, Yarmush ML, Toner M. High-throughput single cell arrays as a novel tool in biopreservation. Cryobiology. 2009;58:315-21.
Microwell array cytometry is a novel high-throughput experimental technique that makes it possible to correlate pre-stress cell phenotypes and post-stress outcomes with single cell resolution. Because the cells are seeded in a high density grid of cell-sized microwells, thousands of individual cells can be tracked and imaged through manipulations as extreme as freezing or drying. Unlike flow cytometry, measurements can be made at multiple time points for the same set of cells. Unlike conventional image cytometry, image analysis is greatly simplified by arranging the cells in a spatially defined pattern and physically separating them from one another. To demonstrate the utility of microwell array cytometry in the field of biopreservation, we have used it to investigate the role of mitochondrial membrane potential in the cryopreservation of primary hepatocytes. Even with optimized cryopreservation protocols, the stress of freezing almost always leads to dysfunction or death in part of the cell population. To a large extent, cell fate is dominated by the stochastic nature of ice crystal nucleation, membrane rupture, and other biophysical processes, but natural variation in the initial cell population almost certainly plays an important and under-studied role. Understanding why some cells in a population are more likely to survive preservation will be invaluable for the development of new approaches to improve preservation yields. For this paper, primary hepatocytes were seeded in microwell array devices, imaged using the mitochondrial dyes Rh123 or JC-1, cryopreserved for up to a week, rapidly thawed, and checked for viability after a short recovery period. Cells with a high mitochondrial membrane potential before freezing were significantly less likely to survive the freezing process, though the difference in short term viability was fairly small. 
The results demonstrate that intrinsic cell factors do play an important role in cryopreservation survival, even in the short term where extrinsic biophysical factors would be expected to dominate. We believe that microwell array cytometry will be an important tool for a wide range of studies in biopreservation and stress biology.
Tian Z, Palmer N, Schmid P, Yao H, Galdzicki M, Berger B, Wu E, Kohane IS. A practical platform for blood biomarker study by using global gene expression profiling of peripheral whole blood. PLoS ONE. 2009;4:e5157.
BACKGROUND: Although microarray technology has become the most common method for studying global gene expression, a plethora of technical factors across the experiment contribute to the variability of genome-wide gene expression profiling using peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data that meet clinical requirements for biomarker study. METHODS AND FINDINGS: We applied globin reduction to peripheral whole blood samples and performed genome-wide transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of array data and elucidate the mode in which hemoglobin interferes in gene expression profiling. We demonstrated that, when applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant increase in the quality of beadarray data. When compared to their pre-globin reduction counterparts, post-globin reduction samples show improved detection statistics, lowered variance and increased sensitivity. More importantly, gender gene separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests that the poor data obtained from pre-globin reduction samples are the result of the high concentration of hemoglobin derived from red blood cells either interfering with target mRNA binding or producing a pseudo-binding background signal. CONCLUSION: We therefore recommend the combination of performing globin mRNA reduction in peripheral whole blood samples and hybridizing on Illumina BeadChips as the practical approach for biomarker study.
Weber GM, Murphy SN, McMurry AJ, Macfadden D, Nigrin DJ, Churchill S, Kohane IS. The Shared Health Research Information Network (SHRINE): A prototype federated query tool for clinical data repositories. J Am Med Inform Assoc. 2009;16:624-30.
We developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a roadmap for other institutions. We are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.
Oriol NE, Cote PJ, Vavasis AP, Bennet J, Delorenzo D, Blanc P, Kohane I. Calculating the return on investment of mobile healthcare. BMC Med. 2009;7:27.
BACKGROUND: Mobile health clinics provide an alternative portal into the healthcare system for the medically disenfranchised, that is, people who are underinsured, uninsured or who are otherwise outside of mainstream healthcare due to issues of trust, language, immigration status or simply location. Mobile health clinics as providers of last resort are an essential component of the healthcare safety net providing prevention, screening, and appropriate triage into mainstream services. Despite the face value of providing services to underserved populations, a focused analysis of the relative value of the mobile health clinic model has not been elucidated. The question that the return on investment algorithm has been designed to answer is: can the value of the services provided by mobile health programs be quantified in terms of quality adjusted life years saved and estimated emergency department expenditures avoided? METHODS: Using a sample mobile health clinic and published research that quantifies health outcomes, we developed and tested an algorithm to calculate the return on investment of a typical broad-service mobile health clinic: the relative value of mobile health clinic services = annual projected emergency department costs avoided + value of potential life years saved from the services provided. Return on investment ratio = the relative value of the mobile health clinic services/annual cost to run the mobile health clinic. RESULTS: Based on service data provided by The Family Van for 2008 we calculated the annual cost savings from preventing emergency room visits, $3,125,668 plus the relative value of providing 7 of the top 25 priority prevention services during the same period, US$17,780,000 for a total annual value of $20,339,968. Given that the annual cost to run the program was $567,700, the calculated return on investment of The Family Van was 36:1. 
CONCLUSION: By using published data that quantify the value of prevention practices and the value of preventing unnecessary use of emergency departments, an empirical method was developed to determine the value of a typical mobile health clinic. The Family Van, a mobile health clinic that has been serving the medically disenfranchised of Boston for 16 years, was evaluated accordingly and found to have return on investment of $36 for every $1 invested in the program.
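The return on investment ratio defined in the abstract above (relative value of services divided by annual running cost) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name `roi_ratio` is hypothetical, and the inputs are the total annual value and annual program cost exactly as reported in the abstract.

```python
def roi_ratio(total_annual_value: float, annual_cost: float) -> float:
    """Return on investment ratio = relative value of services / annual cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return total_annual_value / annual_cost


# Figures as reported in the abstract for The Family Van (2008).
TOTAL_ANNUAL_VALUE = 20_339_968  # total annual value, US$
ANNUAL_COST = 567_700            # annual cost to run the program, US$

ratio = roi_ratio(TOTAL_ANNUAL_VALUE, ANNUAL_COST)
print(f"Return on investment: about {round(ratio)}:1")
```

Rounded to the nearest whole number, the ratio matches the 36:1 figure quoted in the abstract.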
Eran A, Graham KR, Vatalaro K, McCarthy J, Collins C, Peters H, Brewster SJ, Hanson E, Hundley R, Rappaport L, et al. Comment on "Autistic-like phenotypes in Cadps2-knockout mice and aberrant CADPS2 splicing in autistic patients". J Clin Invest. 2009;119:679-80; author reply 680-1.
Murphy S, Churchill S, Bry L, Chueh H, Weiss S, Lazarus R, Zeng Q, Dubey A, Gainer V, Mendis M, et al. Instrumenting the health care enterprise for discovery research in the genomic era. Genome Res. 2009;19:1675-81.
Tens of thousands of subjects may be required to obtain reliable evidence relating disease characteristics to the weak effects typically reported from common genetic variants. The costs of assembling, phenotyping, and studying these large populations are substantial, recently estimated at three billion dollars for 500,000 individuals. They are also decade-long efforts. We hypothesized that automation and analytic tools can repurpose the informational byproducts of routine clinical care, bringing sample acquisition and phenotyping to the same high-throughput pace and commodity price-point as is currently true of genome-wide genotyping. Described here is a demonstration of the capability to acquire samples and data from densely phenotyped and genotyped individuals in the tens of thousands for common diseases (e.g., in a 1-yr period: N = 15,798 for rheumatoid arthritis; N = 42,238 for asthma; N = 34,535 for major depressive disorder) in one academic health center at an order of magnitude lower cost. Even for rare diseases caused by rare, highly penetrant mutations such as Huntington disease (N = 102) and autism (N = 756), these capabilities are also of interest.
Park PJ, Kong SW, Tebaldi T, Lai WR, Kasif S, Kohane IS. Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance. Bioinformatics. 2009;25:3121-3127.
MOTIVATION: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the Diabetes Genome Anatomy Project (DGAP) was launched to profile gene expression in a variety of related animal models and human subjects. We asked whether these heterogeneous models can be integrated to provide consistent and robust insights into the biology of insulin resistance. RESULTS: We perform integrative analysis of the 16 DGAP data sets that span multiple tissues, conditions, array types, laboratories, species, genetic backgrounds and study designs. For each data set, we identify differentially expressed genes compared with control. Then, for the combined data, we rank genes according to the frequency with which they were found to be statistically significant across data sets. This analysis reveals RetSat as a widely shared component of mechanisms involved in insulin resistance and sensitivity and adds to the growing importance of the retinol pathway in diabetes, adipogenesis and insulin resistance. Top candidates obtained from our analysis have been confirmed in recent laboratory studies. CONTACT: Isaac_kohane@harvard.edu.
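The ranking step described in the abstract above, ordering genes by how often they are significant across data sets, can be sketched as a simple frequency count. This is a minimal illustration and not the authors' pipeline: the function name, the data set labels, and the gene names are all hypothetical placeholders, and the per-data-set significance calls are assumed to have been made already.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_by_significance_frequency(
    significant_by_dataset: Dict[str, Iterable[str]],
) -> List[Tuple[str, int]]:
    """Rank genes by the number of data sets in which they were called significant."""
    counts: Counter = Counter()
    for genes in significant_by_dataset.values():
        # Use a set so a gene counts at most once per data set.
        counts.update(set(genes))
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: data set label -> genes called differentially expressed.
toy_calls = {
    "dataset_1": ["geneA", "geneB", "geneC"],
    "dataset_2": ["geneA", "geneC"],
    "dataset_3": ["geneA", "geneD"],
}

ranked = rank_by_significance_frequency(toy_calls)
for gene, n_datasets in ranked:
    print(gene, n_datasets)
```

Counting each gene once per data set keeps a single large study from dominating the ranking, which is one plausible reading of "frequency across data sets".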
Reis B, Kohane I, Mandl K. Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. BMJ. 2009;339:b3677.
Mandl KD, Kohane IS. No small change for the health information economy. N Engl J Med. 2009;360:1278-81.
Himes BE, Dai Y, Kohane IS, Weiss ST, Ramoni MF. Prediction of Chronic Obstructive Pulmonary Disease (COPD) in Asthma Patients Using Electronic Medical Records. J Am Med Inform Assoc. 2009;16:371-9.
OBJECTIVE: Identify clinical factors that modulate the risk of progression to COPD among asthma patients using data extracted from electronic medical records. DESIGN: Demographic information and comorbidities from adult asthma patients who were observed for at least 5 years, with initial observation dates between 1988 and 1998, were extracted from electronic medical records of the Partners Healthcare System using tools of the National Center for Biomedical Computing "Informatics for Integrating Biology and the Bedside" (i2b2). MEASUREMENTS: A predictive model of COPD was constructed from a set of 9,349 patients (843 cases, 8,506 controls) using Bayesian networks. The model's predictive accuracy was tested by using it to predict COPD in a future independent set of asthma patients (992 patients; 46 cases, 946 controls) who had initial observation dates between 1999 and 2002. RESULTS: A Bayesian network model composed of age, sex, race, smoking history, and 8 comorbidity variables is able to predict COPD in the independent set of patients with an accuracy of 83.3%, computed as the area under the Receiver Operating Characteristic curve (AUROC). CONCLUSIONS: Our results demonstrate that data extracted from electronic medical records can be used to create predictive models. With improvements in data extraction and inclusion of more variables, such models may prove to be clinically useful.
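The accuracy measure named in the abstract above, the area under the ROC curve, equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it directly from that definition. It is a generic illustration with made-up scores, not the study's model or data, and the function name `auroc` is a hypothetical helper.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise case/control comparison.

    labels: 1 for a case, 0 for a control. Tied scores count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities for three cases and three controls.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

print(f"AUROC = {auroc(toy_scores, toy_labels):.3f}")
```

The pairwise loop is quadratic in the number of patients; for cohorts of the size reported in the abstract, a rank-based implementation would be the more practical choice.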
Kohane I. Ten thousand views of bioinformatics: a bibliome perspective. Yearb Med Inform. 2009:113-6.
OBJECTIVE: Summarize the current state of bioinformatics research from the published literature in 2008. METHODS: The entire corpus of publications indexed by the National Library of Medicine in the PubMed repository was reviewed for articles tagged as belonging to the discipline of bioinformatics by Medical Subject Heading or by term in the title or abstract of the article. Selected summary statistics of this corpus were then used to motivate additional exploration. RESULTS: Over ten thousand articles published in 2008 populated the bioinformatics corpus. Significantly, there were at least as many publications in genomics and genetics that used computational techniques but that were not identified as bioinformatics research. Genomics and proteomics continued to be the leading application domains of bioinformatics research but, despite the proliferation of human studies, the genes most studied in the corpus were from yeast rather than the human organism. The growth in the genomic studies of human disease was accompanied by a growing critical literature regarding the methods, results and impact of these studies. Concurrently, the availability of full genome sequences at commodity prices has increased the computational challenges of human studies by several orders of magnitude. Further concerns were raised about the consequences of public disclosure of comprehensive or even aggregate genomic data. CONCLUSION: The impressive size of the bioinformatics bibliome is easily dwarfed by the challenges generated by the continued growth of high-throughput biological data sets. The demand for bioinformatics expertise and tools is therefore likely to continue to increase, at least in the near term.
Liu H, Kohane IS. Tissue and process specific microRNA-mRNA co-expression in mammalian development and malignancy. PLoS ONE. 2009;4:e5436.
An association between enrichment and depletion of microRNA (miRNA) binding sites, 3' UTR length, and mRNA expression has been demonstrated in various developing tissues and tissues from different mature organs; but functional, context-dependent miRNA regulations have yet to be elucidated. Towards that goal, we examined miRNA-mRNA interactions by measuring miRNA and mRNA in the same tissue during development and also in malignant conditions. We identified significant miRNA-mediated biological process categories in developing mouse cerebellum and lung using non-targeted mRNA expression as the negative control. Although miRNAs in general suppress target mRNA messages, many predicted miRNA targets demonstrate a significantly higher level of co-expression than non-target genes in developing cerebellum. This phenomenon is tissue specific since it is not observed in developing lungs. Comparison of mouse cerebellar development and medulloblastoma demonstrates a shared miRNA-mRNA co-expression program for brain-specific neurologic processes such as synaptic transmission and exocytosis, in which miRNA target expression increases with the accumulation of multiple miRNAs in developing cerebellum and decreases with the loss of these miRNAs in brain tumors. These findings demonstrate the context-dependence of miRNA-mRNA co-expression.
Kohane IS. The twin questions of personalized medicine: who are you and whom do you most resemble? Genome Med. 2009;1:4.
Personalized medicine is typically described as the use of molecular or genetic characteristics to customize therapy. This perspective at best provides an incomplete model of the patient and at worst can lead to grossly inappropriate practices. Personalization of medicine requires two characterizations: a well-grounded understanding of who the patient is and an equally robust understanding of the subpopulation that most resembles that patient in the context of the decisions at hand. These characterizations are readily represented probabilistically and can be used to drive decision-making in a rational manner that maximizes the positive outcomes for the patient.
