Publications

2001
Butte AJ, Bao L, Reis BY, Watkins TW, Kohane IS. Comparing the similarity of time-series gene expression using signal processing metrics. J Biomed Inform. 2001;34:396-405.
Many algorithms have been used to cluster genes measured by microarray across a time series. Instead of clustering, our goal was to compare all pairs of genes to determine whether there was evidence of a phase shift between them. We describe a technique where gene expression is treated as a discrete time-invariant signal, allowing the use of digital signal-processing tools, including power spectral density, coherence, and transfer gain and phase shift. We used these on a public RNA expression set of 2467 genes measured every 7 min for 119 min and found 18 putative associations. Two of these were known in the biomedical literature and may have been missed using correlation coefficients. Digital signal-processing tools can be embedded in, and can enhance, existing clustering algorithms.
Gonzalez-Heydrich J, Steingard RJ, Putnam FW, De Bellis MD, Beardslee W, Kohane IS. Corticotropin releasing hormone increases apparent potency of adrenocorticotropic hormone stimulation of cortisol secretion. Med Hypotheses. 2001;57:544-8.
HYPOTHESIS: Corticotropin releasing hormone (CRH) has a regulatory effect on cortisol secretion in addition to its classic effect of stimulating adrenocorticotropic hormone (ACTH) secretion. REVIEW: There is growing evidence of "long-loop" and paracrine adrenal stimulation by CRH. Data from a study of the ovine corticotropin releasing hormone (oCRH) stimulation test in 13 sexually abused girls and 13 normal controls were used in Monte Carlo simulations of the hypothalamic-pituitary-adrenal axis to estimate adrenal sensitivity to ACTH and cortisol elimination kinetics before and after oCRH administration. In both controls and sexually abused girls, ACTH had an apparently greater effect on cortisol secretion after oCRH administration than during the baseline period. This lends support to the hypothesis and suggests that it should be tested experimentally.
Butte AJ, Ye J, Haring HU, Stumvoll M, White MF, Kohane IS. Determining significant fold differences in gene expression analysis. Pac Symp Biocomput. 2001:6-17.
A typical use for RNA expression microarrays is comparing gene expression between two groups. No previous study has reproduced an entire experiment and modeled the distribution of reproducibility of fold differences. Our goal was to create a model of significance for fold differences, then maximize the number of ESTs above that threshold. Multiple strategies were tested to filter out ESTs contributing to noise, thus lowering the fold difference needed for significance. We found that even though RNA expression levels appear consistent in duplicate measurements, the calculated fold differences are not as consistent when entire experiments are duplicated. Thus, it is critically important to repeat as many data points as possible to ensure that genes and ESTs labeled as significant truly are. We successfully used duplicated expression measurements to model the duplicated fold differences and to calculate the levels of fold difference needed to reach significance. This approach can be applied to many other experiments to ascertain significance without a priori assumptions.
Kohane IS. The ever imminent electronic medical record. J Med Pract Manage. 2001;16:264-5.
The electronic medical record (EMR) remains an elusive holy grail. The reasons include limited electronic and voice recognition capabilities, as well as established medical practice patterns. There is also some question as to the time and cost efficacy of inputs into an EMR.
Kim JH, Ohno-Machado L, Kohane IS. Unsupervised learning from complex data: the matrix incision tree algorithm. Pac Symp Biocomput. 2001:30-41.
Analysis of large-scale gene expression data requires novel methods for knowledge discovery and predictive model building as well as clustering. Organizing data into meaningful structures is one of the most fundamental modes of learning. A DNA microarray data set can be viewed as a set of mutually associated genes in a high-dimensional space. This paper describes a novel method to organize a complex high-dimensional space into successive lower-dimensional spaces based on the geometric properties of the data structure in the absence of a priori knowledge. The matrix incision tree algorithm reveals the hierarchical structural organization of observed data by determining the successive hyperplanes that 'optimally' separate the data hyperspace. The algorithm was tested against published data sets, yielding promising results.
2000
Nigrin DJ, Kohane IS. Glucoweb: a case study of secure, remote biomonitoring and communication. Proc AMIA Symp. 2000:610-4.
As the Internet begins to play a greater role in many healthcare processes, it is inevitable that remote monitoring of patients' physiological parameters over the Internet will become increasingly commonplace. Internet-based communication between patients and their healthcare providers has already become prevalent, and has gained significant attention in terms of confidentiality issues. However, transmission of data directly from patients' physiological biomonitoring devices over the Web has garnered significantly less focus, especially in the area of authentication and security. In this paper, we describe a prototype system called Glucoweb, which allows patients with diabetes mellitus to transmit their self-monitored blood glucose data directly from their personal glucometer device to their diabetes care provider over the Internet. No customized software is necessary on the patient's computer, only a Web browser and active Internet connection. We use this example to highlight key authentication and security measures that should be considered for devices that transmit healthcare data to remote locations.
Tsien CL, Kohane IS, McIntosh N. Multiple signal integration by decision tree induction to detect artifacts in the neonatal intensive care unit. Artif Intell Med. 2000;19:189-202.
The high incidence of false alarms in the intensive care unit (ICU) necessitates the development of improved alarming techniques. This study aimed to detect artifact patterns across multiple physiologic data signals from a neonatal ICU using decision tree induction. Approximately 200 h of bedside data were analyzed. Artifacts in the data streams were visually located and annotated retrospectively by an experienced clinician. Derived values were calculated for successively overlapping time intervals of raw values, and then used as feature attributes for the induction of models trying to classify 'artifact' versus 'not artifact' cases. The results are very promising, indicating that integration of multiple signals by applying a classification system to sets of values derived from physiologic data streams may be a viable approach to detecting artifacts in neonatal ICU data.
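The feature-derivation step described in this abstract (derived values computed over successively overlapping intervals of several signals) can be sketched as follows. This is a minimal illustration and not the authors' code. The window width and step, and the use of mean, standard deviation, and range as the derived values, are assumptions. The signal names (`hr`, `spo2`) and their values are hypothetical. The decision tree induction step is not shown.

```python
from statistics import mean, stdev

def window_features(signals, width, step):
    """Derive per-window values from several aligned physiologic signals.

    Windows of `width` samples start every `step` samples, so they overlap
    whenever step < width. Each row holds the derived values for all signals
    in one window (assumed features: mean, SD, range).
    """
    n = min(len(s) for s in signals.values())
    rows = []
    for start in range(0, n - width + 1, step):
        row = {"start": start}
        for name, s in signals.items():
            w = s[start:start + width]
            row[f"{name}_mean"] = mean(w)
            row[f"{name}_sd"] = stdev(w)  # requires width >= 2
            row[f"{name}_range"] = max(w) - min(w)
        rows.append(row)
    return rows

# Hypothetical heart-rate and oxygen-saturation samples with one artifact-like dip.
signals = {"hr": [150, 152, 151, 153, 220, 150], "spo2": [97, 97, 98, 97, 60, 97]}
for row in window_features(signals, width=3, step=1):
    print(row)
```

Each row could then be labeled 'artifact' or 'not artifact' from a clinician's annotations and passed to a decision tree learner as one case.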
Porter SC, Silvia MT, Fleisher GR, Kohane IS, Homer CJ, Mandl KD. Parents as direct contributors to the medical record: validation of their electronic input. Ann Emerg Med. 2000;35:346-52.
STUDY OBJECTIVES: We assessed the validity and completeness of data in the past medical history (PMH) obtained electronically from parents and examined effects of the human-computer interface and sociodemographic variables on electronic parental report. METHODS: We compared parents' electronic report of PMH data with a criterion standard, structured face-to-face interview by a pediatrician blinded to the electronic data. The electronic medical record interface enabled parents to provide 5 elements of the PMH: birth status, allergies, current medications, immunization status, and previous hospitalizations. The setting was the emergency department waiting room in an academic, urban children's hospital; parents of infants up to 12 months old participated. Outcome measures were validity of the PMH data obtained using the electronic medical record interface and odds of having an invalid or incomplete response using the electronic medical record interface. RESULTS: One hundred parents were enrolled (69.4% of eligible subjects). Study subjects did not differ from nonenrollees on demographic variables and visit characteristics. The validity of the electronic medical record interface data was high across the PMH elements (94% to 99%). Two demographic features predicted invalid response: parental primary language other than English or Spanish (odds ratio [OR] 11.4, 95% confidence interval [CI] 1.7 to 76.3), and Asian ethnicity (OR 14.6, 95% CI 1.2 to 182.4). Incomplete responses were predicted by limited previous experience with computers; computer-naive subjects had an eightfold increased odds of skipping a question (OR 7.9, 95% CI 1.8 to 34.6). CONCLUSION: Parents are accurate independent reporters of their infants' general PMH using the electronic medical record interface. Their participation in care may be enhanced by allowing them to contribute medical information directly to the electronic medical record.
Nigrin DJ, Kohane IS. Temporal expressiveness in querying a time-stamp-based clinical database. J Am Med Inform Assoc. 2000;7:152-63.
Most health care databases include time-stamped instant data as the only temporal representation of patient information. Many previous efforts have attempted to provide frameworks in which medical databases could be queried in relation to time. These, however, have required either a sophisticated database representation of time, including time intervals, or a time-stamp-based database coupled with a nonstandard temporal query language. In this work, the authors demonstrate how their previously described data retrieval application, DXtractor, can be used as a database querying application with expressive power close to that of temporal databases and temporal query languages, using only standard SQL and existing time-stamp-based repositories. DXtractor provides the ability to compose temporal queries through an interface that is understood by nonprogramming medical personnel. Not all temporal constructs are easily implemented using this framework; nonetheless, DXtractor's temporal capabilities provide a significant improvement in the temporal expressivity accessible to clinicians using standard time-stamped clinical databases.
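The abstract's central point is that a temporal relation such as "event B occurs within N days after event A" can be written as an ordinary self-join over time-stamped rows, with no interval data type. The sketch below shows one such query. It is not DXtractor's generated SQL, which the abstract does not describe. The `observation` table, its columns, and the glucose/HbA1c example are hypothetical. `julianday` is SQLite's date function; other database systems use their own date arithmetic.

```python
import sqlite3

def patients_with_followup(conn, min_glucose, max_days):
    """Patients with a glucose above `min_glucose` followed by an HbA1c within `max_days` days.

    The temporal relation is expressed as a self-join on the time-stamp column.
    """
    query = """
        SELECT DISTINCT a.patient_id
        FROM observation AS a
        JOIN observation AS b ON a.patient_id = b.patient_id
        WHERE a.test = 'glucose' AND a.value > ?
          AND b.test = 'hba1c'
          AND julianday(b.ts) > julianday(a.ts)
          AND julianday(b.ts) - julianday(a.ts) <= ?
        ORDER BY a.patient_id
    """
    return [row[0] for row in conn.execute(query, (min_glucose, max_days))]

# Hypothetical time-stamped observations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observation (patient_id INTEGER, test TEXT, value REAL, ts TEXT);
    INSERT INTO observation VALUES
      (1, 'glucose', 250, '2000-01-01 08:00'),
      (1, 'hba1c',   9.1, '2000-01-05 09:00'),
      (2, 'glucose', 260, '2000-01-01 08:00'),
      (2, 'hba1c',   8.7, '2000-02-01 09:00'),
      (3, 'glucose', 110, '2000-01-01 08:00'),
      (3, 'hba1c',   5.2, '2000-01-02 09:00');
""")
print(patients_with_followup(conn, 200, 7))  # [1]
```

Patient 2's follow-up falls outside the 7-day window, and patient 3's glucose is below the cutoff. As a result, only patient 1 is returned.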
Kohane IS. Bioinformatics and clinical informatics: the imperative to collaborate [comment] [editorial]. J Am Med Inform Assoc. 2000;7:512-6.
Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci U S A. 2000;97:12182-6.
In an effort to find gene regulatory networks and clusters of genes that affect cancer susceptibility to anticancer agents, we joined a database with baseline expression levels of 7,245 genes measured by using microarrays in 60 cancer cell lines, to a database with the amounts of 5,084 anticancer agents needed to inhibit growth of those same cell lines. Comprehensive pair-wise correlations were calculated between gene expression and measures of agent susceptibility. Associations weaker than a threshold strength were removed, leaving networks of highly correlated genes and agents called relevance networks. Hypotheses for potential single-gene determinants of anticancer agent susceptibility were constructed. The effect of random chance in the large number of calculations performed was empirically determined by repeated random permutation testing; only associations stronger than those seen in multiply permuted data were used in clustering. We discuss the advantages of this methodology over alternative approaches, such as phylogenetic-type tree clustering and self-organizing maps.
Butte AJ, Weinstein DA, Kohane IS. Enrolling patients into clinical trials faster using RealTime Recruiting. Proc AMIA Symp. 2000:111-5.
Previous work has addressed both optimizing the clinical trials process and sending critical laboratory results and decision support through paging systems. We report the first integration of these two solutions, focusing on improving the clinical trial recruitment process. We describe a clinical trial needing a real-time method of recruiting patients in an unbiased manner, quickly enough that study tests can be obtained before patients leave or samples are discarded. The report describes how the ten patients recruited so far were found and how diagnoses of potentially life-threatening disorders are being made.
Mandl KD, Feit S, Pena BMG, Kohane IS. Growth and determinants of access in patient e-mail and Internet use. Arch Pediatr Adolesc Med. 2000;154:508-11.
Gonzalez-Heydrich J, DeMaso DR, Irwin C, Steingard RJ, Kohane IS, Beardslee WR. Implementation of an electronic medical record system in a pediatric psychopharmacology program. Int J Med Inform. 2000;57:109-16.
The design, implementation, and utilization of an electronic medical record system (EMRS) in a pediatric psychopharmacology clinic is described. The EMRS is a relational database with information entered directly by the clinician during a patient visit. It has been used during more than 2590 patient visits with 805 patients. Complete clinical documentation and simultaneous data entry, as well as computer-generated prescriptions for the patient, were accomplished 75% of the time within a 20-min medication management session. One hundred consecutive parents of patients were asked to fill out a five-question survey to begin to assess the impact of the application. Of the 87 parents who responded, all (100%) noted that the doctor paid attention to their concerns. Between 88 and 90% of the parents reported that the use of the computer was a 'good' thing, that it made it easier to work with the doctor, and that they understood why the computer was being used. The findings support the feasibility of developing and implementing an EMRS with direct clinician information entry in a pediatric psychopharmacology clinic.
Butte AJ, Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput. 2000:418-29.
Increasing numbers of methodologies are available to find functional genomic clusters in RNA expression data. We describe a technique that computes comprehensive pair-wise mutual information for all genes in such a data set. An association with a high mutual information means that one gene is non-randomly associated with another; we hypothesize this means the two are related biologically. By picking a threshold mutual information and using only associations at or above the threshold, we show how this technique was used on a public data set of 79 RNA expression measurements of 2,467 genes to construct 22 clusters, or Relevance Networks. The biological significance of each Relevance Network is explained.
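The pairwise mutual information and thresholding steps can be sketched in a few functions. The sketch treats each connected group of above-threshold links as one relevance network. Equal-width binning into three bins is an assumption made here, not a detail given in the abstract. The gene names and values are hypothetical.

```python
import math
from collections import Counter, defaultdict

def discretize(values, bins=3):
    """Equal-width binning of one expression profile (bin count is an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile: everything falls in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Mutual information in bits between two discretized profiles of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def relevance_networks(profiles, threshold, bins=3):
    """Link genes whose pairwise MI is at or above threshold; return connected groups."""
    d = {g: discretize(v, bins) for g, v in profiles.items()}
    names = sorted(d)
    adj = defaultdict(set)
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            if mutual_information(d[g], d[h]) >= threshold:
                adj[g].add(h)
                adj[h].add(g)
    seen, clusters = set(), []
    for g in names:
        if g in seen or g not in adj:  # skip genes already grouped or with no links
            continue
        stack, comp = [g], set()
        while stack:  # depth-first search over above-threshold links
            cur = stack.pop()
            if cur not in comp:
                comp.add(cur)
                stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters

# Hypothetical profiles over six measurements.
profiles = {"g1": [1, 2, 3, 4, 5, 6], "g2": [2, 4, 6, 8, 10, 12], "g3": [6, 5, 4, 3, 2, 1],
            "g4": [1, 1, 1, 1, 1, 1], "g5": [3, 1, 3, 1, 3, 1]}
print(relevance_networks(profiles, threshold=1.0))  # [['g1', 'g2', 'g3']]
```

Mutual information also captures the inverse relationship between g1 and g3, which is one reason it can complement correlation-based clustering.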
Kohane IS, Altman RB. The new peer review. Proc AMIA Symp. 2000:433-7.
It is widely recognized that the Internet has fundamentally changed the dynamics of publication; in particular, there is no effective way to control the release of any web-based publication. The scientific and lay literature is now accessible to the public with unprecedented ease. Recent proposals to start a life sciences online repository of preprints highlight the emerging trend toward "publish first, review later." Does this mean that the peer review process is dead? It certainly suggests that the process needs to change. We discuss currently available technologies that could enable a new, distributed peer review process benefiting multiple user communities.
Greenes DS, Fleisher GR, Kohane I. Potential impact of a computerized system to report late-arriving laboratory results in the emergency department. Pediatr Emerg Care. 2000;16:313-5.
BACKGROUND: Results of some laboratory tests for Emergency Department (ED) patients return hours to days after the patient is discharged. Inadequate follow-up for these late-arriving results poses medical and legal risks. We have developed, but not yet implemented, a computerized system called the Automated Late-Arriving Results Monitoring System (ALARMS). ALARMS scans the hospital's laboratory and ED registration databases to generate an electronic daily log of all late-arriving abnormal laboratory results for ED patients. OBJECTIVE: To determine the potential impact of ALARMS by assessing our ED's current quality of documented follow-up of late-arriving laboratory results. METHODS: We applied ALARMS retrospectively, to find all abnormal late-arriving laboratory results returned between 5/1/96 and 4/30/98 for ED patients for the following three tests: serum lead levels, Chlamydia cultures, or urine pregnancy tests. Medical records were reviewed for documentation of follow-up, which was considered appropriate if a clinician noted the abnormal result and documented a follow-up plan within 1 week after the result became available. Medical records were also reviewed for any evidence of complications attributable to delayed or inadequate follow-up. RESULTS: Over the 2-year study period, no appropriate follow-up was documented in 6/18 (33%) cases of elevated lead levels, 3/4 (75%) cases of late-arriving positive pregnancy tests, and 23/39 (59%) cases of positive Chlamydia cultures. One case of a positive Chlamydia culture, for which there was no documented follow-up, was associated with subsequent development of pelvic inflammatory disease. CONCLUSION: Our current system of documented follow-up for late-arriving laboratory results has deficiencies. ALARMS, a computerized system of alerts for emergency physicians, has the potential to substantially improve documented follow-up of late-arriving laboratory results in the ED.
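The abstract says only that ALARMS scans the laboratory and ED registration databases to produce a daily log of late-arriving abnormal results. The sketch below shows one way such a scan could work. The record layouts, the 30-day lookback, and the matching rule (an abnormal result returned on the given day, after a discharge of the same patient) are all assumptions. The patients and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta

@dataclass
class EDVisit:
    patient_id: str
    discharged: datetime

@dataclass
class LabResult:
    patient_id: str
    test: str
    abnormal: bool
    resulted: datetime

def daily_late_result_log(visits, results, day, lookback_days=30):
    """Abnormal results returned on `day` after the patient's most recent ED discharge."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    log = []
    for r in results:
        if not r.abnormal or not (start <= r.resulted < end):
            continue
        # ED discharges for this patient that precede the result within the lookback window.
        prior = [v.discharged for v in visits
                 if v.patient_id == r.patient_id
                 and r.resulted - timedelta(days=lookback_days) <= v.discharged < r.resulted]
        if prior:
            log.append((r.patient_id, r.test, max(prior)))
    return sorted(log)

visits = [EDVisit("p1", datetime(1997, 3, 1, 14, 0)), EDVisit("p2", datetime(1997, 3, 2, 10, 0))]
results = [
    LabResult("p1", "Chlamydia culture", True, datetime(1997, 3, 3, 9, 0)),
    LabResult("p2", "serum lead", False, datetime(1997, 3, 3, 11, 0)),
    LabResult("p3", "urine pregnancy test", True, datetime(1997, 3, 3, 12, 0)),
]
for entry in daily_late_result_log(visits, results, date(1997, 3, 3)):
    print(entry)
```

Only the first result is logged. The lead level is normal, and the third patient has no recorded ED visit.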
1999
Mandl KD, Kohane IS. Healthconnect: clinical grade patient-physician communication. Proc AMIA Symp. 1999:849-53.
A critical mass of Internet users is leading to a wide diffusion of electronic communications within medical practice. Unless implemented with substantial forethought, these new technological linkages could disturb delicate balances in the doctor-patient relationship, threaten the privacy of medical information, widen social disparity in health outcomes, and even function as barriers to access. The American Medical Informatics Association (AMIA) recently published recommendations to guide computer-based communications between clinicians and patients. This paper describes the motivations for and the design of HealthConnect, a web-based patient-doctor communications tool currently in use at Children's Hospital, Boston. Structural and process-oriented features of HealthConnect, as they relate to promotion of adherence with the Guidelines, are discussed.
Nigrin DJ, Kohane IS. Scaling a data retrieval and mining application to the enterprise-wide level. Proc AMIA Symp. 1999:901-5.
Most medical institutions have had difficulty in adopting practices that use stored clinical and administrative data effectively. This stems in part from the lack of available tools to easily and accurately retrieve datasets of interest. In this work, we describe the development of a data retrieval and mining application, Goldminer, which allows authorized personnel at our institution to query clinical and demographic data stores through a graphical, non-programmer interface. It builds upon DXtractor, our previously described tool that retrieves data from a smaller, more specialized dataset. We discuss the difficulties encountered in scaling this application to the enterprise-wide level, and our solutions.
