Dengue virus is spread through mosquitoes in many tropical and subtropical parts of the world, including Brazil. Each year, dengue virus causes seasonal outbreaks that vary in magnitude and timing across the country. This variation makes tailoring preparation efforts for fine spatio-temporal resolutions challenging. In this study, we described four properties of historical dengue time series at the mesoregion level, the Brazilian subdivision below state, and examined how they varied across the country. We found that the duration and timing of seasonal outbreaks are largely driven by climate factors, while relational properties, i.e., the similarity in outbreak timing and magnitude between two mesoregions, are explained by a mix of mobility patterns and climate similarities. Surprisingly, we found that remote sensing derived products and movement inferred through Twitter were adequate proxies for climate and mobility patterns respectively. Knowledge of how dengue outbreaks differ across the country and the factors that may influence specific outbreak properties may be important for improving efforts to build forecasting and prediction models.
Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real-time by 1 to 3 weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on the population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the 12 continental regions of France by leveraging multiple disparate data sources that include Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.
Effectively designing and evaluating public health responses to the ongoing COVID-19 pandemic requires accurate estimation of the prevalence of COVID-19 across the United States (US). Equipment shortages and varying testing capabilities have however hindered the usefulness of the official reported positive COVID-19 case counts. We introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia, using a combination of excess influenza-like illness reports, COVID-19 test statistics, COVID-19 mortality reports, and a spatially structured epidemic model. Instead of relying on the estimate from a single data source or method that may be biased, we provide multiple estimates, each relying on different assumptions and data sources. Across our four approaches emerges the consistent conclusion that on April 4, 2020, the estimated case count was 5 to 50 times higher than the official positive test counts across the different states. Nationally, our estimates of COVID-19 symptomatic cases as of April 4 have a likely range of 2.3 to 4.8 million, with possibly as many as 7.6 million cases, up to 25 times greater than the cumulative confirmed cases of about 311,000. Extending our methods to May 16, 2020, we estimate that cumulative symptomatic incidence ranges from 4.9 to 10.1 million, as opposed to 1.5 million positive test counts. The proposed combination of approaches may prove useful in assessing the burden of COVID-19 during resurgences in the US and other countries with comparable surveillance systems.
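The national figures quoted above can be cross-checked with simple arithmetic. The sketch below only divides the estimates stated in the abstract by the stated confirmed counts; the variable and function names are illustrative and not taken from the study.

```python
# Figures quoted in the abstract, expressed as case counts.
CONFIRMED_APR4 = 311_000
ESTIMATES_APR4 = {"low": 2.3e6, "high": 4.8e6, "upper": 7.6e6}
CONFIRMED_MAY16 = 1.5e6
ESTIMATES_MAY16 = {"low": 4.9e6, "high": 10.1e6}


def fold_over_confirmed(estimates, confirmed):
    """Return each estimate divided by the confirmed case count."""
    return {name: value / confirmed for name, value in estimates.items()}


apr4 = fold_over_confirmed(ESTIMATES_APR4, CONFIRMED_APR4)
may16 = fold_over_confirmed(ESTIMATES_MAY16, CONFIRMED_MAY16)

for label, ratios in (("April 4", apr4), ("May 16", may16)):
    for name, ratio in ratios.items():
        print(f"{label} {name}: {ratio:.1f}x confirmed cases")
```

The April 4 upper bound works out to a little over 24 times the confirmed count, which is consistent with the "up to 25 times" statement.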
Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional healthcare-based surveillance systems are limited by inherent reporting delays. Time-series machine learning methods have the potential to fill this temporal “data gap,” but work to date in this area has focused on relatively simple methods and coarse geographic granularities (state-level and above). We evaluate the performance of a recurrent neural network (gated recurrent unit, or GRU) in comparison to baseline machine learning methods for estimating influenza activity in the US at the state and city levels, and experiment with the inclusion of real-time search data from Google Trends. We find that the GRU improves upon baseline models for long time horizons of prediction but is not improved by real-time Internet search data. We conduct a thorough analysis of feature importance in all considered models for interpretability purposes.
Transmission of dengue fever depends on a complex interplay of human, climate, and mosquito dynamics, which often change in time and space. It is well known that disease dynamics are highly influenced by a population’s susceptibility to infection and microclimates, small-area climatic conditions which create environments favorable for the breeding and survival of the mosquito vector. Here, we present a novel machine learning dengue forecasting approach, which, dynamically in time and adaptively in space, identifies local patterns in weather and population susceptibility to make epidemic predictions at the city-level in Brazil, months ahead of the occurrence of disease outbreaks. Weather-based predictions are improved when information on population susceptibility is incorporated, indicating that immunity is an important predictor neglected by most dengue forecast models. Given the generalizability of our methodology, it may prove valuable for public-health decision making aimed at mitigating the effects of seasonal dengue outbreaks in locations globally.
Background: Acute gastroenteritis (AG) is a major public health issue. To reduce its impact and to organize adapted sanitary responses, traditional surveillance produces estimates, but with a 1- to 3-week delay. The main challenge is to produce near real-time and longer-term estimates.
Objective: For the flu, alternative modeling strategies have been proposed to avoid this delay. We assess one of these methods to predict AG up to 3 weeks ahead at different levels.
Methods: We used Web data, hospital data, and historical data in combination with an Elastic Net model and a smoother.
Results: For forecasts up to three weeks ahead, we still obtain a PCC between 0.73 and 0.87 and an MSE between 0.533 and 0.257, depending on the level of prediction.
Conclusions: We found that external data sources in combination with Elastic Net give accurate estimates. This approach could complement traditional surveillance.
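As an illustration of the two accuracy metrics reported above, the following sketch computes them in plain Python. It assumes that PCC denotes the Pearson correlation coefficient and MSE the mean squared error, which the abstract does not spell out, and the data values are hypothetical.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_squared_error(observed, predicted):
    """Average of the squared differences between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical weekly incidence values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]

pcc = pearson_correlation(observed, predicted)
mse = mean_squared_error(observed, predicted)
print(f"PCC = {pcc:.3f}, MSE = {mse:.3f}")
```

A higher PCC and a lower MSE both indicate closer agreement, which is why the two ranges in the Results run in opposite directions.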
Residents of Long-Term Care Facilities (LTCFs) represent a major share of COVID-19 deaths worldwide. Information on vaccine effectiveness in these settings is essential to improve mitigation strategies, but evidence remains limited. To evaluate the early effect of the administration of BNT162b2 mRNA vaccines in LTCFs, we monitored subsequent SARS-CoV-2 documented infections and deaths in Catalonia, a region of Spain, and compared them to counterfactual model predictions from February 6th to March 28th, 2021, the subsequent time period after which 70% of residents were fully vaccinated. We calculated the reduction in SARS-CoV-2 documented infections and deaths as well as the detected county-level transmission. We estimated that once more than 70% of the LTCF population were fully vaccinated, 74% (58%-81%, 90% CI) of COVID-19 deaths and 75% (36%-86%) of all documented infections were prevented. Further, detectable transmission was reduced up to 90% (76%-93%, 90% CI). Our findings provide evidence that high-coverage vaccination is the most effective intervention to prevent SARS-CoV-2 transmission and death. Widespread vaccination could be a feasible avenue to control the COVID-19 pandemic.
After acute infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a subset of individuals experience persistent symptoms involving mood, sleep, anxiety, and fatigue, which may contribute to markedly elevated rates of major depressive disorder observed in recent epidemiologic studies. In this study, we investigated whether acute coronavirus disease 2019 (COVID-19) symptoms are associated with the probability of subsequent depressive symptoms.
Designing public health responses to outbreaks requires close monitoring of population-level health indicators in real-time. Thus, an accurate estimation of the epidemic curve is critical. We propose an approach to reconstruct epidemic curves in near real time. We apply this approach to characterize the early SARS-CoV-2 outbreak in two Spanish regions between the months of March and April 2020. We address two data collection problems that affected the reliability of the available real-time epidemiological data, namely, the frequent missing information documenting when a patient first experienced symptoms, and the frequent retrospective revision of historical information (including right censoring). This is done by using a novel back-calculating procedure based on imputing patients' dates of symptom onset from reported cases, according to a dynamically-estimated backward reporting delay conditional distribution, and adjusting for right censoring using an existing package, NobBS, to estimate in real time (nowcast) cases by date of symptom onset. This process allows us to obtain an approximation of the time-varying reproduction number (Rt) in real-time. At each step, we evaluate how different assumptions affect the recovered epidemiological events and compare the proposed approach to the alternative procedure of merely using curves of case counts, by report day, to characterize the time-evolution of the outbreak. Finally, we assess how these real-time estimates compare with subsequently documented epidemiological information, considered more reliable and complete, that became available weeks to months later. Our approach may help improve accuracy, quantify uncertainty, and evaluate frequently unstated assumptions when recovering the epidemic curves from limited data obtained from public health surveillance systems in other locations.
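The idea of moving reported cases back to their likely dates of symptom onset can be sketched as follows. This is a simplified, deterministic illustration under stated assumptions, not the procedure used in the study: it uses a single fixed delay distribution instead of a dynamically estimated conditional one, it does not adjust for right censoring (the step the abstract assigns to NobBS), and the function name and all numbers are hypothetical.

```python
def expected_onsets(reported_by_day, delay_pmf):
    """Spread each day's reported cases back over earlier onset days.

    reported_by_day[t] is the number of cases reported on day t.
    delay_pmf[d] is the assumed probability that a case is reported
    d days after symptom onset.
    Delays that would place onset before day 0 are excluded and the
    remaining probabilities are renormalised, so cases are conserved
    whenever at least one delay is feasible.
    """
    onsets = [0.0] * len(reported_by_day)
    for report_day, cases in enumerate(reported_by_day):
        feasible = delay_pmf[: report_day + 1]
        total = sum(feasible)
        if total == 0:
            continue  # no feasible delay for this report day
        for delay, prob in enumerate(feasible):
            onsets[report_day - delay] += cases * prob / total
    return onsets


# Hypothetical inputs, for illustration only.
reported = [0, 10, 20, 40, 30]
delay_pmf = [0.2, 0.5, 0.3]  # reported 0, 1, or 2 days after onset

onset_curve = expected_onsets(reported, delay_pmf)
print([round(x, 1) for x in onset_curve])
```

Because recent onset days have not yet received all of their eventual reports, the tail of such a curve is biased downward, which is the right-censoring problem the abstract addresses separately.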
The current COVID-19 pandemic has impacted cities particularly hard. Here, we provide an in-depth characterization of disease incidence and mortality, and their dependence on demographic and socioeconomic strata in Santiago, a highly segregated city and the capital of Chile. Our analyses show a strong association between socioeconomic status and both COVID-19 outcomes and public health capacity. People living in municipalities with low socioeconomic status did not reduce their mobility during lockdowns as much as those in more affluent municipalities. Testing volumes may have been insufficient early in the pandemic in those places, and both test positivity rates and testing delays were much higher. We find a strong association between socioeconomic status and mortality, measured either by COVID-19 attributed deaths or excess deaths. Finally, we show that infection fatality rates in young people are higher in low-income municipalities. Together, these results highlight the critical consequences of socioeconomic inequalities on health outcomes.
Over 390 million people worldwide are infected with dengue fever each year. In the absence of an effective vaccine for general use, national control programs must rely on hospital readiness and targeted vector control to prepare for epidemics, so accurate forecasting remains an important goal. Many dengue forecasting approaches have used environmental data linked to mosquito ecology to predict when epidemics will occur, but these have had mixed results. Conversely, human mobility, an important driver in the spatial spread of infection, is often ignored. Here we compare time-series forecasts of dengue fever in Thailand, integrating epidemiological data with mobility models generated from mobile phone data. We show that long-distance connectivity is correlated with dengue incidence at forecasting horizons of up to three months, and that incorporating mobility data improves traditional time-series forecasting approaches. Notably, no single model or class of model always outperformed others. We propose an adaptive, mosaic forecasting approach for early warning systems.
Given still-high levels of coronavirus disease 2019 (COVID-19) susceptibility and inconsistent transmission-containing strategies, outbreaks have continued to emerge across the United States. Until effective vaccines are widely deployed, curbing COVID-19 will require carefully timed nonpharmaceutical interventions (NPIs). A COVID-19 early warning system is vital for this. Here, we evaluate digital data streams as early indicators of state-level COVID-19 activity from 1 March to 30 September 2020. We observe that increases in digital data stream activity anticipate increases in confirmed cases and deaths by 2 to 3 weeks. Confirmed cases and deaths also decrease 2 to 4 weeks after NPI implementation, as measured by anonymized, phone-derived human mobility data. We propose a means of harmonizing these data streams to identify future COVID-19 outbreaks. Our results suggest that combining disparate health and behavioral data may help identify disease activity changes weeks before observation using traditional epidemiological monitoring.
The dengue virus affects millions of people every year worldwide, causing large epidemic outbreaks that disrupt people’s lives and severely strain healthcare systems. In the absence of a reliable vaccine against it or an effective treatment to manage the illness in humans, most efforts to combat dengue infections have focused on preventing its vectors, mainly the Aedes aegypti mosquito, from flourishing across the world. These mosquito-control strategies need reliable disease activity surveillance systems to be deployed. Despite significant efforts to estimate dengue incidence using a variety of data sources and methods, little work has been done to understand the relative contribution of the different data sources to improved prediction. Additionally, most work has focused on prediction systems at the national level, rather than at finer spatial resolutions. We develop a methodological framework to assess and compare dengue incidence estimates at the city level and evaluate the performance of a collection of models on 20 different cities in Brazil. The data sources we use towards this end are weekly incidence counts from prior years (seasonal autoregressive terms), weekly-aggregated weather variables, and real-time internet search data. We find that a random forest-based model effectively leverages these multiple data sources and provides robust predictions, while retaining interpretability. For real-time predictions that assume long delays (6-8 weeks) in the availability of epidemiological data, we find that real-time internet search data are the strongest predictors of dengue incidence, whereas for predictions that assume very short delays (1-2 weeks), short-term and seasonal autocorrelation are dominant as predictors. Despite the difficulties inherent to city-level prediction, our framework achieves meaningful and actionable estimates across cities with different characteristics.
The United States (US) has been among those nations most severely affected by the first—and subsequent—phases of the pandemic of COVID-19 disease caused by SARS-CoV-2. With only 4% of the worldwide population, the US has seen about 22% of COVID-19 deaths. Despite formidable advantages in resources and expertise, presently the per capita mortality rate is over 585/million, respectively 2.4 and 5 times higher compared to Canada and Germany. As we enter Fall 2020, the US is enduring ongoing outbreaks across large regions of the country. Moreover, within the US, an early and persistent feature of the pandemic has been the disproportionate impact on populations already made vulnerable by racism and dangerous jobs, inadequate wages, and unaffordable housing, and this is true for both the headline public health threat and the additional disastrous economic impacts. In this article we assess the impact of missteps by the Federal Government in three specific areas: the introduction of the virus to the US and the establishment of community transmission; the lack of national COVID-19 workplace standards and lack of personal protective equipment (PPE) for workplaces as represented by complaints to the Occupational Safety and Health Administration (OSHA) which we find are correlated with deaths 17 days later (=0.845); and the total excess deaths in 2020 to date, which already total more than 230,000 and exhibit severe inequities in race/ethnicity including among younger age groups.
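A lagged association such as "complaints correlated with deaths 17 days later" can be illustrated with a short sketch. The statistic's symbol is missing from the text above, so the use of the Pearson coefficient here is an assumption; the series are hypothetical and constructed so that the second lags the first by exactly two days.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(leading, lagging, lag):
    """Correlate leading[t] with lagging[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = min(len(leading), len(lagging) - lag)
    if n < 2:
        raise ValueError("not enough overlapping points for this lag")
    return pearson(leading[:n], lagging[lag:lag + n])


# Hypothetical daily series: 'deaths' is 'complaints' scaled and delayed by 2 days.
complaints = [1, 3, 2, 5, 4, 6, 8, 7]
deaths = [0, 0] + [2 * c for c in complaints]

correlations = {lag: lagged_correlation(complaints, deaths, lag) for lag in range(5)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

Scanning several lags and keeping the one with the highest correlation is one simple way to choose a lead time, although a high lagged correlation alone does not establish causation.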
Objective: To create a machine learning model identifying potentially avoidable blood draws for serum potassium among pediatric patients following cardiac surgery.
Design: Retrospective cohort study. Setting: Tertiary-care center. Patients: All patients admitted to the CICU at Boston Children’s Hospital between January 2010 and December 2018 with a length of stay ≥4 days and ≥2 recorded serum potassium measurements. Interventions: None. Measurements and Main Results: We collected variables related to potassium homeostasis, including serum chemistry, hourly potassium intake, diuretics, and urine output. Using established machine learning techniques, including Random Forest classifiers and hyperparameters, we created models predicting whether a patient’s potassium would be normal or abnormal based on the most recent potassium level, medications administered, urine output and markers of renal function. We developed multiple models based on different age-categories and temporal proximity of the most recent potassium measurement. We assessed the predictive performance of the models using an independent test set. Across the 7,269 admissions (6,196 patients) included, there were 95,674 serum potassium measurements, taken on average 1 (IQR 0-1) time per day. 96% of patients received at least one dose of IV diuretic and 83% received a form of potassium supplementation. Our models predicted a normal potassium value with a median positive predictive value of 0.900. A median of 2.1% of measurements (mean 2.5%, IQR 1.3%-3.7%) were incorrectly predicted as normal when they were abnormal. A median of 0.0% (IQR 0.0%-0.4%) of measurements were critically low or high values incorrectly predicted as normal. A median of 27.2% (IQR 7.8%-32.4%) of samples were correctly predicted to be normal and could have been potentially avoided. Conclusions: Machine-learning methods can be used to accurately predict avoidable blood tests for serum potassium in critically ill pediatric patients. A median of 27.2% of samples could have been saved, with decreased costs and risk of infection or anemia.
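The relationship between the reported quantities can be made concrete with a small sketch. It assumes that the "avoidable" and "incorrectly predicted as normal" percentages are both expressed as fractions of all measurements, which the abstract does not state explicitly; the function names and counts are hypothetical.

```python
def positive_predictive_value(true_normal, false_normal):
    """Share of 'normal' predictions that were actually normal."""
    predicted_normal = true_normal + false_normal
    if predicted_normal == 0:
        raise ValueError("no samples were predicted normal")
    return true_normal / predicted_normal


def summarize(true_normal, false_normal, total_samples):
    """Summary rates for a model that flags potassium draws as likely normal."""
    return {
        "ppv": positive_predictive_value(true_normal, false_normal),
        # Draws correctly predicted normal, as a share of all draws.
        "avoidable_fraction": true_normal / total_samples,
        # Abnormal draws wrongly predicted normal, as a share of all draws.
        "missed_abnormal_fraction": false_normal / total_samples,
    }


# Hypothetical counts, for illustration only.
summary = summarize(true_normal=270, false_normal=30, total_samples=1000)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, the avoidable fraction and the positive predictive value together determine how many abnormal values would be missed if the predicted-normal draws were skipped.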
Despite immunisation being one of the greatest medical success stories of the 20th century and its benefits being widely recognized, there is a growing lack of confidence in some vaccines. Improving communication about the direct benefits of vaccination as well as its benefits beyond preventing infectious diseases may help regain this lost trust. A conference was organised at the Fondation Merieux in France to discuss what benefits could be communicated and how their communication could use innovative digital initiatives. During this meeting, a wide range of poorly known indirect benefits of vaccination were discussed, including benefits for chronic non-communicable diseases (NCDs). For example, persons with underlying chronic NCDs, such as diabetes and cardiovascular diseases, are particularly vulnerable to complications, hospitalisations, and even death from influenza, although the link between NCDs and influenza is frequently underestimated. Influenza vaccination can reduce hospitalizations and deaths in older persons with diabetes by 45% and 38% respectively. The frequency of antimicrobial resistance (AMR) is increasing worldwide. Vaccination can reduce AMR by reducing the incidence of infectious disease (through direct and indirect or herd protection), by reducing the number of circulating AMR strains, and by reducing the need for antimicrobial use. In addition, as the global population ages, disease morbidity and treatment costs in the elderly population are likely to rise substantially. The promotion of healthy ageing and adopting a life-course approach to health can reduce the burden of vaccine-preventable diseases such as seasonal influenza, pneumococcal diseases, meningitis, pertussis, shingles, measles, diphtheria and tetanus, which place a significant burden on individuals and the ageing society, and improve their quality of life.
Novel disease surveillance systems based on information from Internet search engines, mobile phone apps, social media, news reports, cloud-based electronic health records, and crowd-sourced systems contribute to improved awareness of the burden of disease. Examples of the role of new techniques and tools to process data generated by multiple sources, such as artificial intelligence, advanced data analytics and biostatistics, to support vaccination programmes, such as those for influenza and dengue, were discussed. The conference participants agreed that continual efforts are needed from all stakeholders to ensure effective, transparent communication of the full benefits and risks of vaccines and vaccination, and that this will require continued dialogue and collaboration.
As the coronavirus disease 2019 (COVID-19) epidemic worsens, understanding the effectiveness of public messaging and large-scale social distancing interventions is critical. The research and public health response communities can and should use population mobility data collected by private companies, with appropriate legal, organizational, and computational safeguards in place. When aggregated, these data can help refine interventions by providing near real-time information about changes in patterns of human movement.
Background: The novel COVID-19 outbreak, caused by the SARS-CoV-2 virus and originally detected in December 2019 in Wuhan, China, has affected more than 140 countries and territories as of March 2020. Given that patients with cancer are generally more vulnerable to infections, systematic analyses of diverse cohorts of patients with cancer affected by COVID-19 are needed.
Methods: Clinical information from 105 hospitalized patients with cancer and 233 hospitalized patients without cancer, all infected by the SARS-CoV-2 virus, was collected from 14 hospitals in Hubei province, China, from January 1, 2020, to February 24, 2020. Standard statistical methodologies were used to compare four different outcomes (death, admission into an intensive care unit (ICU), development of severe/critical symptoms, and utilization of invasive mechanical ventilation) between patients with cancer (of different types, stages, and treatments of cancer) and patients without cancer.
Findings: Compared with COVID-19 patients without cancer, COVID-19 patients with cancer had higher risks in all four severe outcomes. Patients with blood cancers, lung cancers, or with metastatic cancer (stage IV) had the highest frequency of severe events. Non-metastatic cancer (stage I-III) patients experienced similar frequencies of severe conditions to those observed in patients without cancer. Patients who received immunotherapy and surgery had higher risks of having severe events, while patients with only radiotherapy and targeted therapy did not demonstrate significant differences in severe events when compared to patients without cancer.
Interpretations: Patients with blood cancer, lung cancer, and metastatic cancer demonstrated a higher incidence of severe events compared to patients without cancer. In addition, patients who underwent immunotherapy or cancer surgery had higher death rates and higher chances of having critical symptoms.
Background: The COVID-19 outbreak containment strategies in China based on non-pharmaceutical interventions (NPIs) appear to be effective. Quantitative research is still needed, however, to assess the efficacy of different candidate NPIs and their timings to guide ongoing and future responses to epidemics of this emerging disease across the world. Methods: We built a travel network-based susceptible-exposed-infectious-removed (SEIR) model to simulate the outbreak across cities in mainland China. We used epidemiological parameters estimated for the early stage of outbreak in Wuhan to parameterise the transmission before NPIs were implemented. To quantify the relative effect of various NPIs, daily changes of delay from illness onset to the first reported case in each county were used as a proxy for the improvement of case identification and isolation across the outbreak. Historical and near-real time human movement data, obtained from Baidu location-based service, were used to derive the intensity of travel restrictions and contact reductions across China. The model and outputs were validated using daily reported case numbers, with a series of sensitivity analyses conducted. Results: We estimated that there were a total of 114,325 COVID-19 cases (interquartile range [IQR] 76,776 - 164,576) in mainland China as of February 29, 2020, and these were highly correlated (p<0.001, R²=0.86) with reported incidence. Without NPIs, the number of COVID-19 cases would likely have shown a 67-fold increase (IQR: 44 - 94), with the effectiveness of different interventions varying. The early detection and isolation of cases was estimated to prevent more infections than travel restrictions and contact reductions, but integrated NPIs would achieve the strongest and most rapid effect.
If NPIs could have been conducted one week, two weeks, or three weeks earlier in China, cases could have been reduced by 66%, 86%, and 95%, respectively, together with significantly reducing the number of affected areas. However, if NPIs were conducted one week, two weeks, or three weeks later, the number of cases could have shown a 3-fold, 7-fold, and 18-fold increase across China, respectively. Results also suggest that the social distancing intervention should be continued for the next few months in China to prevent case numbers increasing again after travel restrictions were lifted on February 17, 2020. Conclusion: The NPIs deployed in China appear to be effectively containing the COVID-19 outbreak, but the efficacy of the different interventions varied, with the early case detection and contact reduction being the most effective. Moreover, deploying the NPIs early is also important to prevent further spread. Early and integrated NPI strategies should be prepared, adopted and adjusted to minimize health, social and economic impacts in affected regions around the world.
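The effect of intervention timing in an SEIR model can be illustrated with a minimal sketch. This is a single-population model integrated with forward Euler steps, not the travel network-based model described above, and an NPI is represented crudely as a drop in the transmission rate. All parameter values and function names are hypothetical and were chosen only so that the example runs.

```python
def beta_schedule(days, baseline, reduced, start_day):
    """Daily transmission rates: baseline until start_day, reduced afterwards."""
    return [baseline if day < start_day else reduced for day in range(days)]


def simulate_seir(beta_by_day, sigma, gamma, population, initial_infectious, dt=0.1):
    """Integrate a single-population SEIR model and return the final (S, E, I, R)."""
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    for beta in beta_by_day:
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
    return s, e, i, r


# Hypothetical parameters, for illustration only.
POPULATION = 1_000_000
DAYS = 120
SIGMA = 1 / 5  # 1 / assumed latent period in days
GAMMA = 1 / 5  # 1 / assumed infectious period in days


def cumulative_infections(start_day):
    """Everyone who has left the susceptible compartment by the end of the run."""
    betas = beta_schedule(DAYS, baseline=0.6, reduced=0.15, start_day=start_day)
    s, _, _, _ = simulate_seir(betas, SIGMA, GAMMA, POPULATION, initial_infectious=10)
    return POPULATION - s


late = cumulative_infections(start_day=30)
early = cumulative_infections(start_day=23)    # intervention one week earlier
never = cumulative_infections(start_day=DAYS)  # no intervention

print(f"one week earlier: {1 - early / late:.0%} fewer infections than the later start")
```

Because infections grow roughly exponentially before the intervention, shifting the start date by a week changes the cumulative total by a large factor, which is the qualitative pattern the Results describe. The percentages this toy model prints are not comparable to the study's estimates.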
The novel SARS-CoV-2 coronavirus, first identified in Wuhan (Hubei), China, in December 2019, has spread to more than 180 countries and caused over 1,700,000 cases of COVID-19 worldwide to date. In an effort to limit human-to-human contact and slow the transmission of COVID-19, the disease caused by this novel coronavirus, the United States has implemented a collection of shelter-in-place public health interventions. Monitoring whether these interventions are working and determining when people may go back to (perhaps a new) business as usual require reliable monitoring systems that provide an accurate real-time picture of the trajectory of the epidemic outbreak. Here, we present evidence that our current healthcare-based monitoring systems, aimed at detecting the new daily number of COVID-19-positive individuals across the US, may be better at tracking the local testing (detection) capabilities than at monitoring the time evolution of the outbreak. This suggests that other data sources are necessary to inform (real-time) critical decisions about when to stop (and perhaps when to restart) shelter-in-place mitigation strategies.