Publications

2015
McIver DJ, Hawkins JB, Chunara R, Chatterjee AK, Bhandari A, Fitzgerald TP, Jain SH, Brownstein JS. Characterizing Sleep Issues Using Twitter. J Med Internet Res. 2015;17 (6) :e140.
BACKGROUND: Sleep issues such as insomnia affect over 50 million Americans and can lead to serious health problems, including depression and obesity, and can increase risk of injury. Social media platforms such as Twitter offer exciting potential for studying and identifying both diseases and social phenomena. OBJECTIVE: Our aim was to determine whether social media can be used as a method to conduct research focusing on sleep issues. METHODS: Twitter posts were collected and curated to determine whether a user exhibited signs of sleep issues based on the presence of several keywords in tweets such as insomnia, "can't sleep", Ambien, and others. Users whose tweets contained any of the keywords were designated as having self-identified sleep issues (sleep group). Users who did not have self-identified sleep issues (non-sleep group) were selected from tweets that did not contain pre-defined words or phrases used as a proxy for sleep issues. RESULTS: User data such as number of tweets, friends, followers, and location were collected, as well as the time and date of tweets. Additionally, the sentiment of each tweet and average sentiment of each user were determined to investigate differences between non-sleep and sleep groups. It was found that sleep group users were significantly less active on Twitter (P=.04), had fewer friends (P<.001), and had fewer followers (P<.001) compared to others, after adjusting for the length of time each user's account had been active. Sleep group users were more active during typical sleeping hours than others, which may suggest they were having difficulty sleeping. Sleep group users also had significantly lower sentiment in their tweets (P<.001), indicating a possible relationship between sleep and psychosocial issues. CONCLUSIONS: We have demonstrated a novel method for studying sleep issues that allows fast, cost-effective, and customizable data collection.
Jain SH, Powers BW, Hawkins JB, Brownstein JS. The digital phenotype. Nat Biotechnol. 2015;33 (5) :462-3.
Hawkins JB, Brownstein JS, Tuli G, Runels T, Broecker K, Nsoesie EO, McIver DJ, Rozenblum R, Wright A, Bourgeois FT, et al. Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Qual Saf. 2015.
BACKGROUND: Patients routinely use Twitter to share feedback about their experience receiving healthcare. Identifying and analysing the content of posts sent to hospitals may provide a novel real-time measure of quality, supplementing traditional, survey-based approaches. OBJECTIVE: To assess the use of Twitter as a supplemental data stream for measuring patient-perceived quality of care in US hospitals and compare patient sentiments about hospitals with established quality measures. DESIGN: 404 065 tweets directed to 2349 US hospitals over a 1-year period were classified as having to do with patient experience using a machine learning approach. Sentiment was calculated for these tweets using natural language processing. 11 602 tweets were manually categorised into patient experience topics. Finally, hospitals with ≥50 patient experience tweets were surveyed to understand how they use Twitter to interact with patients. KEY RESULTS: Roughly half of the hospitals in the US have a presence on Twitter. Of the tweets directed toward these hospitals, 34 725 (9.4%) were related to patient experience and covered diverse topics. Analyses limited to hospitals with ≥50 patient experience tweets revealed that they were more active on Twitter, more likely to be below the national median of Medicare patients (p<0.001) and above the national median for nurse/patient ratio (p=0.006), and more likely to be a non-profit hospital (p<0.001). After adjusting for hospital characteristics, we found that Twitter sentiment was not associated with Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) ratings (but having a Twitter account was), although there was a weak association with 30-day hospital readmission rates (p=0.003). CONCLUSIONS: Tweets describing patient experiences in hospitals cover a wide range of patient care aspects and can be identified using automated approaches. These tweets represent a potentially untapped indicator of quality and may be valuable to patients, researchers, policy makers and hospital administrators.
Souilmi Y, Lancaster AK, Jung J-Y, Rizzo E, Hawkins JB, Powles R, Amzazi S, Ghazal H, Tonellato PJ, Wall DP. Scalable and cost-effective NGS genotyping in the cloud. BMC Med Genomics. 2015;8 (1) :64.
BACKGROUND: While next-generation sequencing (NGS) costs have plummeted in recent years, the cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered into medically actionable reports within a time window of hours and at a cost in the tens of dollars. RESULTS: We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows that make optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. CONCLUSIONS: Our systematic benchmarking reveals important new insights and considerations for achieving clinical turnaround of whole genome analysis through optimization and workflow management, including strategic batching of individual genomes and efficient cluster resource configuration.
2014
Gafni E, Luquette LJ, Lancaster AK, Hawkins JB, Jung J-Y, Souilmi Y, Wall DP, Tonellato PJ. COSMOS: Python library for massively parallel workflows. Bioinformatics. 2014;30 (20) :2956-8.
SUMMARY: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. AVAILABILITY AND IMPLEMENTATION: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. CONTACT: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
2013
Hawkins JB, Delgado-Eckert E, Thorley-Lawson DA, Shapiro M. The cycle of EBV infection explains persistence, the sizes of the infected cell populations and which come under CTL regulation. PLoS Pathog. 2013;9 (10) :e1003685.
Previous analysis of Epstein-Barr virus (EBV) persistent infection has involved biological and immunological studies to identify and quantify infected cell populations and the immune response to them. This led to a biological model whereby EBV infects and activates naive B-cells, which then transit through the germinal center to become resting memory B-cells where the virus resides quiescently. Occasionally the virus reactivates from these memory cells to produce infectious virions. Some of this virus infects new naive B-cells, completing a cycle of infection. What has been lacking is an understanding of the dynamic interactions between these components and how their regulation by the immune response produces the observed pattern of viral persistence. We have recently provided a mathematical analysis of a pathogen which, like EBV, has a cycle of infected stages. In this paper we have developed biologically credible values for all of the parameters governing this model and show that with these values, it successfully recapitulates persistent EBV infection with remarkable accuracy. This includes correctly predicting the observed patterns of cytotoxic T-cell regulation (which and by how much each infected population is regulated by the immune response) and the size of the infected germinal center and memory populations. Furthermore, we find that viral quiescence in the memory compartment dictates the pattern of regulation but is not required for persistence; it is the cycle of infection that explains persistence and provides the stability that allows EBV to persist at extremely low levels. This shifts the focus away from a single infected stage, the memory B-cell, to the whole cycle of infection. We conclude that the mathematical description of the biological model of EBV persistence provides a sound basis for quantitative analysis of viral persistence and provides testable predictions about the nature of EBV-associated diseases and how to curb or prevent them.
Thorley-Lawson DA, Hawkins JB, Tracy SI, Shapiro M. The pathogenesis of Epstein-Barr virus persistent infection. Curr Opin Virol. 2013;3 (3) :227-32.
Epstein-Barr virus (EBV) maintains a lifelong infection. According to the germinal center model (GCM), latently infected B cells transit the germinal center (GC) to become resting memory cells. Here, the virus resides quiescently, occasionally reactivating to infect new B cells, completing the cycle of infection. The GCM remains the only model that explains EBV biology and the pathogenesis of lymphoma. Recent work suggests modifications to the model, notably that the virus contributes only modestly to the GC process. Predictions from mathematical models further suggest that quiescence within memory B cells shapes the overall structure of viral infection but is not essential for persistence. Rather, it is the cycle of infection that allows viral persistence at the very low levels observed.
2011
Hawkins JB, Jones MT, Plassmann PE, Thorley-Lawson DA. Chemotaxis in densely populated tissue determines germinal center anatomy and cell motility: a new paradigm for the development of complex tissues. PLoS One. 2011;6 (12) :e27650.
Germinal centers (GCs) are complex dynamic structures that form within lymph nodes as an essential process in the humoral immune response. They represent a paradigm for studying the regulation of cell movement in the development of complex anatomical structures. We have developed a simulation of a modified cyclic re-entry model of GC dynamics which successfully employs chemotaxis to recapitulate the anatomy of the primary follicle and the development of a mature GC, including correctly structured mantle, dark and light zones. We then show that correct single cell movement dynamics (including persistent random walk and inter-zonal crossing) arise from this simulation as purely emergent properties. The major insight of our study is that chemotaxis can only achieve this when constrained by the known biological properties that cells are incompressible, exist in a densely packed environment, and must therefore compete for space. It is this interplay of chemotaxis and competition for limited space that generates all the complex and biologically accurate behaviors described here. Thus, from a single simple mechanism that is well documented in the biological literature, we can explain both higher level structure and single cell movement behaviors. To our knowledge this is the first GC model that is able to recapitulate both correctly detailed anatomy and single cell movement. This mechanism may have wide application for modeling other biological systems where cells undergo complex patterns of movement to produce defined anatomical structures with sharp tissue boundaries.
2009
Cosmopoulos K, Pegtel M, Hawkins J, Moffett H, Novina C, Middeldorp J, Thorley-Lawson DA. Comprehensive profiling of Epstein-Barr virus microRNAs in nasopharyngeal carcinoma. J Virol. 2009;83 (5) :2357-67.
Epstein-Barr virus (EBV) establishes a long-term latent infection and is associated with a number of human malignancies that are thought to arise from deregulation of different stages of the viral life cycle. Recently, a large number of microRNAs (miRNAs) have been described for EBV, and it has been suggested that their expression may vary between the different latency states found in normal and malignant tissue. To date, however, no technique has been used to comprehensively and quantitatively test this idea by profiling expression of the EBV miRNAs in primary infected tissues. We describe here a multiplex reverse transcription-PCR assay that allows the profiling of 39 of the 40 known mature EBV miRNAs from as little as 250 ng of RNA. With this approach, we present a comprehensive profile of EBV miRNAs in primary nasopharyngeal carcinoma (NPC) tumors, including estimates of miRNA copy number per tumor cell. This is the first comprehensive profiling of EBV miRNAs in any EBV-associated tumor. In contrast to previous suggestions, we show that the BART-derived miRNAs are present in a wide range of copy numbers from ≤10³ per cell in both primary tumors and the widely used NPC-derived C666-1 cell line. However, we confirm the hypothesis that the BHRF1 miRNAs are not expressed in NPC. Lastly, we demonstrate that EBV miRNA expression in the widely used NPC line C666-1 is, with some caveats, broadly representative of primary NPC tumors.
2008
Shapiro M, Duca KA, Lee K, Delgado-Eckert E, Hawkins J, Jarrah AS, Laubenbacher R, Polys NF, Hadinoto V, Thorley-Lawson DA. A virtual look at Epstein-Barr virus infection: simulation mechanism. J Theor Biol. 2008;252 (4) :633-48.
Epstein-Barr virus (EBV) is an important human pathogen that establishes a life-long persistent infection and for which no precise animal model exists. In this paper, we describe in detail an agent-based model and computer simulation of EBV infection. Agents representing EBV and sets of B and T lymphocytes move and interact on a three-dimensional grid approximating Waldeyer's ring, together with abstract compartments for lymph and blood. The simulation allows us to explore the development and resolution of virtual infections in a manner not possible in actual human experiments. Specifically, we identify parameters capable of inducing clearance, persistent infection, or death.