Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution, and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.
To assess risks and costs of hospital admission associated with short term exposure to fine particulate matter with diameter less than 2.5 µm (PM2.5) for 214 mutually exclusive disease groups.
Time stratified, case crossover analyses with conditional logistic regressions adjusted for non-linear confounding effects of meteorological variables.
Medicare inpatient hospital claims in the United States, 2000-12 (n=95 277 169).
All Medicare fee-for-service beneficiaries aged 65 or older admitted to hospital.
MAIN OUTCOME MEASURES:
Risk of hospital admission, number of admissions, days in hospital, inpatient and post-acute care costs, and value of statistical life (that is, the economic value used to measure the cost of avoiding a death) due to the lives lost at discharge for 214 disease groups.
Positive associations between short term exposure to PM2.5 and risk of hospital admission were found for several prevalent but rarely studied diseases, such as septicemia, fluid and electrolyte disorders, and acute and unspecified renal failure. Positive associations were also found between risk of hospital admission and cardiovascular and respiratory diseases, Parkinson's disease, diabetes, phlebitis, thrombophlebitis, and thromboembolism, confirming previously published results. These associations remained consistent when restricted to days with a daily PM2.5 concentration below the WHO air quality guideline for the 24 hour average exposure to PM2.5. For the rarely studied diseases, each 1 µg/m3 increase in short term PM2.5 was associated with an annual increase of 2050 hospital admissions (95% confidence interval 1914 to 2187 admissions), 12 216 days in hospital (11 358 to 13 075), US$31m (£24m, €28m; $29m to $34m) in inpatient and post-acute care costs, and $2.5bn ($2.0bn to $2.9bn) in value of statistical life. For diseases with a previously known association, each 1 µg/m3 increase in short term exposure to PM2.5 was associated with an annual increase of 3642 hospital admissions (3434 to 3851), 20 098 days in hospital (18 950 to 21 247), $69m ($65m to $73m) in inpatient and post-acute care costs, and $4.1bn ($3.5bn to $4.7bn) in value of statistical life.
New causes and previously identified causes of hospital admission associated with short term exposure to PM2.5 were found. These associations remained even at a daily PM2.5 concentration below the WHO 24 hour guideline. Substantial economic costs were linked to a small increase in short term PM2.5.
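As a minimal sketch of the referent selection that a time-stratified case-crossover design typically relies on, the Python below picks control days as the other days in the same calendar month that share the case day's weekday. The abstract does not spell out its referent scheme, so this convention and the function name control_days are assumptions for illustration only.

```python
from datetime import date, timedelta


def control_days(case_day):
    """Return the other days in the case day's month that share its weekday.

    Assumed convention: strata are calendar months and referents share
    the weekday of the case day.
    """
    controls = []
    # Step backwards, then forwards, one week at a time while staying in the month.
    for direction in (-1, 1):
        day = case_day + timedelta(weeks=direction)
        while day.month == case_day.month and day.year == case_day.year:
            controls.append(day)
            day += timedelta(weeks=direction)
    return sorted(controls)
```

Each admission is then compared only with its own control days, so characteristics of the person that do not change within the month cancel out of the conditional likelihood.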
The association between PM2.5 and mortality is well established; however, confounding by unmeasured factors is always an issue. In addition, prior studies do not tell us what the effect of a sudden change in exposure on mortality is. We consider the sub-population of Medicare enrollees who moved residence from one ZIP Code to another from 2000 to 2012. Because the choice of new ZIP Code is unlikely to be related to any confounders, restricting to the population of movers allows us to have a study design that incorporates randomization of exposure. Over 10 million Medicare participants moved. We calculated change in exposure by subtracting the annual exposure at original ZIP Code from exposure at the new ZIP Code using a validated model. We used Cox proportional hazards models stratified on original ZIP Code with inverse probability weights (IPW) to control for individual and ecological confounders at the new ZIP Code. The distribution of covariates appeared to be randomized by change in exposure at the new locations as standardized differences were mostly near zero. Randomization of measured covariates suggests unmeasured covariates may be randomized also. Using IPW, per 10 µg/m3 increase in PM2.5, the hazard ratio was 1.21 (95% confidence interval [CI] = 1.20, 1.22) among whites and 1.12 (95% CI = 1.08, 1.15) among blacks. Hazard ratios increased for whites and decreased for blacks when restricting to exposure levels below the current standard of 12 µg/m3. This study provides evidence of likely causal effects at concentrations below current limits of PM2.5.
Propensity score matching is a common tool for adjusting for observed confounding in observational studies, but is known to have limitations in the presence of unmeasured confounding. In many settings, researchers are confronted with spatially-indexed data where the relative locations of the observational units may serve as a useful proxy for unmeasured confounding that varies according to a spatial pattern. We develop a new method, termed distance adjusted propensity score matching (DAPSm) that incorporates information on units’ spatial proximity into a propensity score matching procedure. We show that DAPSm can adjust for both observed and some forms of unobserved confounding and evaluate its performance relative to several other reasonable alternatives for incorporating spatial information into propensity score adjustment. The method is motivated by and applied to a comparative effectiveness investigation of power plant emission reduction technologies designed to reduce population exposure to ambient ozone pollution. Ultimately, DAPSm provides a framework for augmenting a “standard” propensity score analysis with information on spatial proximity and provides a transparent and principled way to assess the relative trade-offs of prioritizing observed confounding adjustment versus spatial proximity adjustment.
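The idea of trading off propensity score similarity against spatial proximity can be sketched as follows. The abstract does not state how the two are combined, so the convex combination with a weight between 0 and 1, the min-max scaling of distances, the greedy 1:1 matching, and all names (daps_matrix, greedy_match, the 'ps' and 'xy' keys) are assumptions for illustration, not necessarily the authors' procedure.

```python
import math


def daps_matrix(treated, controls, weight):
    """Distance adjusted score for every treated-control pair.

    Each unit is a dict with 'ps' (propensity score) and 'xy' (coordinates).
    Assumed form: weight * |PS difference| + (1 - weight) * scaled distance.
    """
    raw = [[math.dist(t["xy"], c["xy"]) for c in controls] for t in treated]
    flat = [d for row in raw for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all distances are equal
    return [
        [weight * abs(t["ps"] - c["ps"]) + (1 - weight) * (raw[i][j] - lo) / span
         for j, c in enumerate(controls)]
        for i, t in enumerate(treated)
    ]


def greedy_match(scores, caliper=None):
    """Greedy 1:1 matching: repeatedly take the lowest-score unused pair."""
    pairs = sorted((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row))
    used_treated, used_controls, matches = set(), set(), {}
    for score, i, j in pairs:
        if caliper is not None and score > caliper:
            break  # all remaining pairs are worse than the caliper
        if i not in used_treated and j not in used_controls:
            matches[i] = j
            used_treated.add(i)
            used_controls.add(j)
    return matches
```

With weight equal to 1 the sketch reduces to ordinary propensity score matching, and with weight equal to 0 it matches on proximity alone; intermediate values expose the trade-off between observed confounding adjustment and spatial proximity.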
Data science is an exploding trans-disciplinary field that aims to harness the power of data to gain information or insights on researcher-defined topics of interest. In this paper, we review how data science can help advance environmental health research.
We discuss the concepts of computationally scalable handling of big data and the design of efficient research data platforms and how data science can provide solutions for methodological challenges in environmental health research, such as high-dimensional outcomes and exposures and prediction models. Finally, we discuss tools for reproducible research.
In this paper, we present opportunities to improve environmental research capabilities by embracing data science and the pitfalls that environmental health researchers should avoid when employing data scientific approaches. Throughout the paper, we emphasize the need for environmental health researchers to collaborate more closely with biostatisticians and data scientists to ensure robust and interpretable results.
Various approaches have been proposed to model PM2.5 in the recent decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as major predictor variables. Our study used an ensemble model that integrated multiple machine learning algorithms and predictor variables to estimate daily PM2.5 at a resolution of 1 km × 1 km across the contiguous United States. We used a generalized additive model that accounted for geographic difference to combine PM2.5 estimates from neural network, random forest, and gradient boosting. The three machine learning algorithms were based on multiple predictor variables, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and others. The model training results from 2000 to 2015 indicated good model performance with a 10-fold cross-validated R2 of 0.86 for daily PM2.5 predictions. For annual PM2.5 estimates, the cross-validated R2 was 0.89. Our model demonstrated good performance up to 60 μg/m3. Using the trained PM2.5 model and predictor variables, we predicted daily PM2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous United States. We also used localized land-use variables within 1 km × 1 km grids to downscale PM2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty, we used meteorological variables, land-use variables, and elevation to model the monthly standard deviation of the difference between daily monitored and predicted PM2.5 for every 1 km × 1 km grid cell. This PM2.5 prediction dataset, including the downscaled and uncertainty predictions, allows epidemiologists to accurately estimate the adverse health effect of PM2.5. Compared with model performance of individual base learners, an ensemble model would achieve a better overall estimation.
It is worth exploring other ensemble model formats to synthesize estimations from different models or from different groups to improve overall performance.
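To illustrate why blending base learners can do no worse than the best single learner on the data used to choose the blend, the sketch below searches a grid of convex weights for three sets of predictions. It deliberately swaps in a constant-weight blend for the geographically varying generalized additive model described above, and the function names and any data passed to them are hypothetical.

```python
def mean_squared_error(predicted, observed):
    """Average squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def blend(predictions, weights):
    """Weighted average of base-learner predictions, one value per observation."""
    return [sum(w * p[k] for w, p in zip(weights, predictions))
            for k in range(len(predictions[0]))]


def best_convex_weights(predictions, observed, steps=20):
    """Grid-search non-negative weights summing to one for three base learners."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            error = mean_squared_error(blend(predictions, weights), observed)
            if best is None or error < best[0]:
                best = (error, weights)
    return best[1]
```

Because the grid contains the three corner points that put all weight on one learner, the selected blend cannot have a larger error than any individual learner on the same data; performance on held-out data still has to be checked by cross-validation.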
Background: National, state, and local policies contributed to a 65% reduction in sulfur dioxide emissions from coal-fired power plants between 2005 and 2012 in the United States, providing an opportunity to directly quantify public health benefits attributable to these reductions under an air pollution accountability framework.
Methods: We estimate ZIP code-level changes in two different—but related—exposure metrics: total PM2.5 concentrations and exposure to coal-fired power plant emissions. We associate changes in 10 health outcome rates among approximately 30 million US Medicare beneficiaries with exposure changes between 2005 and 2012 using two difference-in-difference regression approaches designed to mitigate observed and unobserved confounding.
Results: Rates per 10,000 person–years of six cardiac and respiratory health outcomes (all cardiovascular disease, chronic obstructive pulmonary disorder, cardiovascular stroke, heart failure, ischemic heart disease, and respiratory tract infections) decreased by between 7.89 and 1.95 per µg/m3 decrease in PM2.5, with comparable decreases in coal exposure leading to slightly larger rate decreases. Results for acute myocardial infarction, heart rhythm disorders, and peripheral vascular disease were near zero and/or mixed between the various exposure metrics and analyses. A secondary analysis found that nonlinearities in relationships between changing health outcome rates and coal exposure may explain differences in their associations.
Conclusions: The direct analyses of emissions reductions estimate substantial health benefits via coal power plant emission and PM2.5 concentration reductions. Differing responses associated with changes in the two exposure metrics underscore the importance of isolating source-specific impacts from those due to total PM2.5 exposure.
This article introduces the special issue on “Corporate Reputation: Being Good and Looking Good.” Three of the five included articles help to reinforce a conclusion that “being good” and “looking good” are not dichotomous, mutually exclusive conditions. Rather, the two dimensions are linked in some kind of causal relationship for which continuing conceptual and empirical research is desirable. A fourth article concerns the reputational effects of the stock-option backdating scandal. The fifth article offers a critique of conventional approaches to defining and measuring reputation.
Hospitals that serve poorer populations have higher readmission rates, perhaps due to social factors beyond their control. It is unknown whether such hospitals – particularly those with high baseline penalties – effectively lowered readmission rates for congestive heart failure in response to the Hospital Readmissions Reduction Program (HRRP).
Using data on patients admitted with congestive heart failure (CHF) in the national Medicare Provider Analysis and Review (MedPAR) files from January 1, 2003 to November 30, 2014, we fitted a piecewise linear model with estimated hospital-level quarterly risk-standardized readmission rates (RSRRs) as the dependent variable and a change point at HRRP passage (2010) to test 2 main hypotheses. First, for hospitals of low, average, and high economic burden as assessed by proportion of dual-eligibles served, we tested if RSRRs declined in the post-law period after controlling for the pre-law secular trend. Second, we tested if these pre-post differences were different among economic burden groups. To explore specific effects within the most highly penalized (ie, low performing) hospitals, we repeated the main analysis only among the 742 hospitals that received the highest penalties in the first year of the HRRP.
For all economic burden groups, CHF readmission rates declined after the law relative to pre-law trends (p < 0.001 for all pre-post comparisons). RSRRs declined more at high economic burden hospitals than low economic burden hospitals (-79 vs. -75 risk-standardized readmissions per 10000 discharges per year, p = 0.0006). Among the 742 highest penalized hospitals and all conditions, the pre-post decline in rate of change of RSRRs was less for high economic burden hospitals than low economic burden hospitals (-88 vs. -97 for CHF, p < 0.01).
After the HRRP, RSRRs for congestive heart failure declined more at high economic burden hospitals than low economic burden hospitals. However, among high-penalty (low-performance) hospitals, RSRRs declined more for low economic burden hospitals compared with high economic burden hospitals. These results suggest that a specific group of high penalty, high economic burden hospitals may be less able to improve performance on readmission metrics.
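The piecewise linear model with a change point at the law's passage can be sketched as an ordinary least-squares fit of an intercept, a pre-law slope, and a change in slope after the change point. This is a simplified illustration: it ignores the hospital-level structure and risk standardization of the actual analysis, and the function names and any quarterly series passed in are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_piecewise_linear(times, values, change_point):
    """Least-squares fit of value = b0 + b1 * t + b2 * max(0, t - change_point).

    b1 is the pre-change trend and b2 the change in slope after the change point.
    """
    rows = [[1.0, t, max(0.0, t - change_point)] for t in times]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    return solve_linear_system(xtx, xty)
```

A negative b2 corresponds to readmission rates falling faster after the change point than the pre-law trend alone would predict, which is the quantity compared across economic burden groups.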
In anticipation of the expanding appreciation for air quality models in health outcomes studies, we develop and evaluate a reduced-complexity model for pollution transport that intentionally sacrifices some of the sophistication of full-scale chemical transport models in order to support applicability to a wider range of health studies. Specifically, we introduce the HYSPLIT average dispersion model, HyADS, which combines the HYSPLIT trajectory dispersion model with modern advances in parallel computing to estimate ZIP code level exposure to emissions from individual coal-powered electricity generating units in the United States. Importantly, the method is not designed to reproduce ambient concentrations of any particular air pollutant; rather, the primary goal is to characterize each ZIP code's exposure to these coal power plants specifically. We show adequate performance towards this goal against observed annual average air pollutant concentrations (nationwide Pearson correlations of 0.88 and 0.73 with SO42− and PM2.5, respectively) and coal-combustion impacts simulated with a full-scale chemical transport model and adjusted to observations using a hybrid direct sensitivities approach (correlation of 0.90). We proceed to provide multiple examples of HyADS's single-source applicability, including to show that 22% of the population-weighted coal exposure comes from 30 coal-powered electricity generating units.
We propose a new approach for estimating causal effects when the exposure is measured with error and confounding adjustment is performed via a generalized propensity score (GPS). Using validation data, we propose a regression calibration (RC)-based adjustment for a continuous error-prone exposure combined with GPS to adjust for confounding (RC-GPS). The outcome analysis is conducted after transforming the corrected continuous exposure into a categorical exposure. We consider confounding adjustment in the context of GPS subclassification, inverse probability treatment weighting (IPTW) and matching. In simulations with varying degrees of exposure error and confounding bias, RC-GPS eliminates bias from exposure error and confounding compared to standard approaches that rely on the error-prone exposure. We applied RC-GPS to a rich data platform to estimate the causal effect of long-term exposure to fine particles (PM2.5) on mortality in New England for the period from 2000 to 2012. The main study consists of 2202 zip codes covered by 217,660 1 km × 1 km grid cells with yearly mortality rates, yearly PM2.5 averages estimated from a spatio-temporal model (error-prone exposure) and several potential confounders. The internal validation study includes a subset of 83 1 km × 1 km grid cells within 75 zip codes from the main study with error-free yearly PM2.5 exposures obtained from monitor stations. Under assumptions of noninterference and weak unconfoundedness, using matching we found that exposure to moderate levels of PM2.5 (8 < PM2.5 ≤ 10 μg/m3) causes a 2.8% (95% CI: 0.6%, 3.6%) increase in all-cause mortality compared to low exposure (PM2.5 ≤ 8 μg/m3).
Background: Despite dramatic air quality improvement in the United States over the past decades, recent years have brought renewed scrutiny and uncertainty surrounding the effectiveness of specific regulatory programs for continuing to improve air quality and public health outcomes. Methods: We employ causal inference methods and a spatial hierarchical regression model to characterize the extent to which a designation of “nonattainment” with the 1997 National Ambient Air Quality Standard for ambient fine particulate matter (PM2.5) in 2005 causally affected ambient PM2.5 and health outcomes among over 10 million Medicare beneficiaries in the Eastern United States in 2009–2012. Results: We found that, on average across all retained study locations, reductions in ambient PM2.5 and Medicare health outcomes could not be conclusively attributed to the nonattainment designations against the backdrop of other regional strategies that impacted the entire Eastern United States. A more targeted principal stratification analysis indicates substantial health impacts of the nonattainment designations among the subset of areas where the designations are estimated to have actually reduced ambient PM2.5 beyond levels achieved by regional measures, with noteworthy reductions in all-cause mortality, chronic obstructive pulmonary disorder, heart failure, ischemic heart disease, and respiratory tract infections. Discussion: These findings provide targeted evidence of the effectiveness of local control measures after nonattainment designations for the 1997 PM2.5 air quality standard.
Importance The US Environmental Protection Agency is required to reexamine its National Ambient Air Quality Standards (NAAQS) every 5 years, but evidence of mortality risk is lacking at air pollution levels below the current daily NAAQS in unmonitored areas and for sensitive subgroups.
Objective To estimate the association between short-term exposures to ambient fine particulate matter (PM2.5) and ozone, and at levels below the current daily NAAQS, and mortality in the continental United States.
Design, Setting, and Participants Case-crossover design and conditional logistic regression to estimate the association between short-term exposures to PM2.5 and ozone (mean of daily exposure on the same day of death and 1 day prior) and mortality in 2-pollutant models. The study included the entire Medicare population from January 1, 2000, to December 31, 2012, residing in 39 182 zip codes.
Exposures Daily PM2.5 and ozone levels in a 1-km × 1-km grid were estimated using published and validated air pollution prediction models based on land use, chemical transport modeling, and satellite remote sensing data. From these gridded exposures, daily exposures were calculated for every zip code in the United States. Warm-season ozone was defined as ozone levels for the months April to September of each year.
Main Outcomes and Measures All-cause mortality in the entire Medicare population from 2000 to 2012.
Results During the study period, there were 22 433 862 case days and 76 143 209 control days. Of all case and control days, 93.6% had PM2.5 levels below 25 μg/m3, during which 95.2% of deaths occurred (21 353 817 of 22 433 862), and 91.1% of days had ozone levels below 60 parts per billion, during which 93.4% of deaths occurred (20 955 387 of 22 433 862). The baseline daily mortality rates were 137.33 and 129.44 (per 1 million persons at risk per day) for the entire year and for the warm season, respectively. Each short-term increase of 10 μg/m3 in PM2.5 (adjusted by ozone) and 10 parts per billion (10−9) in warm-season ozone (adjusted by PM2.5) were statistically significantly associated with a relative increase of 1.05% (95% CI, 0.95%-1.15%) and 0.51% (95% CI, 0.41%-0.61%) in daily mortality rate, respectively. Absolute risk differences in daily mortality rate were 1.42 (95% CI, 1.29-1.56) and 0.66 (95% CI, 0.53-0.78) per 1 million persons at risk per day. There was no evidence of a threshold in the exposure-response relationship.
Conclusions and Relevance In the US Medicare population from 2000 to 2012, short-term exposures to PM2.5 and warm-season ozone were significantly associated with increased risk of mortality. This risk occurred at levels below current national air quality standards, suggesting that these standards may need to be reevaluated.
Studies have shown that long-term exposure to air pollution increases mortality. However, evidence is limited for air-pollution levels below the most recent National Ambient Air Quality Standards. Previous studies involved predominantly urban populations and did not have the statistical power to estimate the health effects in underrepresented groups.
We constructed an open cohort of all Medicare beneficiaries (60,925,443 persons) in the continental United States from the years 2000 through 2012, with 460,310,521 person-years of follow-up. Annual averages of fine particulate matter (particles with a mass median aerodynamic diameter of less than 2.5 μm [PM2.5]) and ozone were estimated according to the ZIP Code of residence for each enrollee with the use of previously validated prediction models. We estimated the risk of death associated with exposure to increases of 10 μg per cubic meter for PM2.5 and 10 parts per billion (ppb) for ozone using a two-pollutant Cox proportional-hazards model that controlled for demographic characteristics, Medicaid eligibility, and area-level covariates.
Increases of 10 μg per cubic meter in PM2.5 and of 10 ppb in ozone were associated with increases in all-cause mortality of 7.3% (95% confidence interval [CI], 7.1 to 7.5) and 1.1% (95% CI, 1.0 to 1.2), respectively. When the analysis was restricted to person-years with exposure to PM2.5 of less than 12 μg per cubic meter and ozone of less than 50 ppb, the same increases in PM2.5 and ozone were associated with increases in the risk of death of 13.6% (95% CI, 13.1 to 14.1) and 1.0% (95% CI, 0.9 to 1.1), respectively. For PM2.5, the risk of death among men, blacks, and people with Medicaid eligibility was higher than that in the rest of the population.
In the entire Medicare population, there was significant evidence of adverse effects related to exposure to PM2.5 and ozone at concentrations below current national standards. This effect was most pronounced among self-identified racial minorities and people with low income. (Supported by the Health Effects Institute and others.)
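Readers comparing effect sizes reported per 10 μg per cubic meter with studies that report per-unit effects may find the following conversion helpful. It is a sketch that assumes the log hazard is linear in exposure, as in a Cox model with a linear exposure term; the function name is hypothetical.

```python
def rescale_percent_increase(percent_increase, from_increment, to_increment):
    """Convert a percent increase in risk per `from_increment` of exposure
    to the equivalent percent increase per `to_increment`.

    Assumes the log of the risk ratio scales linearly with exposure.
    """
    ratio = 1.0 + percent_increase / 100.0
    return (ratio ** (to_increment / from_increment) - 1.0) * 100.0
```

Under that assumption, the reported 7.3% increase per 10 μg per cubic meter corresponds to roughly 0.7% per 1 μg per cubic meter, slightly less than one tenth of 7.3% because the scaling is multiplicative rather than additive.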
Whether hospitals with the highest risk-standardized readmission rates (RSRRs) subsequently experienced the greatest improvement after passage of the Medicare Hospital Readmissions Reduction Program (HRRP) is unknown.
To evaluate whether passage of the HRRP was followed by acceleration in improvement in 30-day RSRRs after hospitalizations for acute myocardial infarction (AMI), congestive heart failure (CHF), or pneumonia and whether the lowest-performing hospitals had faster acceleration in improvement after passage of the law than hospitals that were already performing well.
Pre-post analysis stratified by hospital performance groups.
U.S. acute care hospitals.
15 170 008 Medicare patients discharged alive from 2000 to 2013.
Passage of the HRRP.
30-day readmission rates after hospitalization for AMI, CHF, or pneumonia for hospitals in the highest-performance (0% penalty), average-performance (>0% and <0.50% penalty), low-performance (≥0.50% and <0.99% penalty), and lowest-performance (≥0.99% penalty) groups.
Of 2868 hospitals serving 1 109 530 Medicare discharges annually, 30.1% were highest performers, 44.0% were average performers, 16.8% were low performers, and 9.0% were lowest performers. After controlling for prelaw trends, an additional 67.6 (95% CI, 66.6 to 68.4), 74.8 (CI, 74.0 to 75.4), 85.4 (CI, 84.0 to 86.8), and 95.1 (CI, 92.6 to 97.5) readmissions per 10 000 discharges were found to have been averted per year in the highest-, average-, low-, and lowest-performance groups, respectively, after passage of the law.
Inability to distinguish between improvement caused by the magnitude of the penalty or by different levels of health improvement in different patient populations.
After passage of the HRRP, 30-day RSRRs for myocardial infarction, heart failure, and pneumonia decreased more rapidly than before the law's passage. Improvement was most marked for hospitals with the lowest prelaw performance.
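The four performance groups used above are defined purely by penalty thresholds, which can be written down directly. The function name is hypothetical, and treating the penalty as a percentage is the only assumption beyond the thresholds stated in the text.

```python
def performance_group(penalty_percent):
    """Classify a hospital by its penalty, expressed in percent."""
    if penalty_percent < 0:
        raise ValueError("penalty cannot be negative")
    if penalty_percent == 0:
        return "highest"   # 0% penalty
    if penalty_percent < 0.50:
        return "average"   # >0% and <0.50%
    if penalty_percent < 0.99:
        return "low"       # >=0.50% and <0.99%
    return "lowest"        # >=0.99%
```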
In this paper, we compare the error in several approximation methods for the cumulative aggregate claim distribution customarily used in the collective model of insurance theory. In this model, it is usually supposed that a portfolio is at risk for a time period of length t. The occurrences of the claims are governed by a Poisson process of intensity μ so that the number of claims in [0,t] is a Poisson random variable with parameter λ = μ t. Each single claim is an independent replication of the random variable X, representing the claim severity. The aggregate claim or total claim amount process in [0,t] is represented by the random sum of N independent replications of X, whose cumulative distribution function (cdf) is the object of study. Due to its computational complexity, several approximation methods for this cdf have been proposed. In this paper, we consider 15 approximations put forward in the literature that only use information on the lower order moments of the involved distributions. For each approximation, we consider the difference between the true distribution and the approximating one and we propose to use expansions of this difference related to Edgeworth series to measure their accuracy as λ = μ t diverges to infinity. Using these expansions, several statements concerning the quality of approximations for the distribution of the aggregate claim process can find theoretical support. Other statements can be disproved on the same grounds. Finally, we investigate numerically the accuracy of the proposed formulas.
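The simplest moment-based approximation of this kind is the normal approximation, which uses the facts that the aggregate claim has mean λE[X] and variance λE[X²]. The sketch below simulates the collective model and evaluates that approximation so the two can be compared. It is not claimed to be one of the 15 approximations exactly as the paper formulates them, and the function names and any severity distribution supplied by the caller are hypothetical.

```python
import math
import random


def simulate_aggregate_claim(mu, t, draw_severity, rng):
    """Total claim amount in [0, t] for Poisson arrivals of intensity mu."""
    total = 0.0
    clock = rng.expovariate(mu)  # time of the first claim
    while clock <= t:
        total += draw_severity(rng)
        clock += rng.expovariate(mu)  # waiting time to the next claim
    return total


def normal_approx_cdf(x, lam, m1, m2):
    """Normal approximation to the aggregate claim cdf.

    lam is the Poisson parameter mu * t; m1 and m2 are E[X] and E[X^2].
    """
    mean = lam * m1
    sd = math.sqrt(lam * m2)
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_cdf(samples, x):
    """Fraction of simulated totals that do not exceed x."""
    return sum(1 for s in samples if s <= x) / len(samples)
```

For example, with a seeded random.Random instance and a severity function such as lambda r: r.expovariate(0.5), the gap between empirical_cdf and normal_approx_cdf at a given point gives a rough numerical view of the approximation error that the expansions are meant to describe as λ grows.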
We examine the concept of essential intersection of a random set in the framework of robust optimization programs and ergodic theory. Using a recent extension of Birkhoff’s Ergodic Theorem developed by the present authors, it is shown that essential intersection can be represented as the countable intersection of random sets involving an asymptotically mean stationary transformation. This is applied to the approximation of a robust optimization program by a sequence of simpler programs with only a finite number of constraints. We also discuss some formulations of robust optimization programs that have appeared in the literature and we make them more precise, especially from the probabilistic point of view. We show that the essential intersection appears naturally in the correct formulation.
The objective is to develop a reliable method to build confidence sets for the Aumann mean of a random closed set as estimated through the Minkowski empirical mean. First, a general definition of the confidence set for the mean of a random set is provided. Then, a method using a characterization of the confidence set through the support function is proposed and a bootstrap algorithm is described, whose performance is investigated in Monte Carlo simulations.
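The support-function characterization can be made concrete for convex polygons in the plane: the support function of the Minkowski empirical mean is the average of the individual support functions, and the Hausdorff distance between two convex compact sets is the largest absolute difference of their support functions over unit directions. The sketch below uses these two facts in a simple resampling scheme. The abstract does not detail the bootstrap algorithm, so this scheme, the finite grid of directions, and all names are assumptions for illustration rather than the authors' method.

```python
import math
import random


def support(vertices, direction):
    """Support function of the convex hull of `vertices` in `direction`."""
    return max(x * direction[0] + y * direction[1] for x, y in vertices)


def minkowski_mean_support(sets, direction):
    """Support function of the Minkowski empirical mean: the average support."""
    return sum(support(s, direction) for s in sets) / len(sets)


def unit_directions(count):
    """Equally spaced unit vectors on the circle."""
    return [(math.cos(2 * math.pi * k / count), math.sin(2 * math.pi * k / count))
            for k in range(count)]


def bootstrap_radius(sets, level, directions, draws, rng=None):
    """Quantile of the sup-distance between resampled and observed mean supports."""
    rng = rng or random.Random()
    observed = [minkowski_mean_support(sets, u) for u in directions]
    distances = []
    for _ in range(draws):
        resample = [rng.choice(sets) for _ in sets]
        distances.append(max(abs(minkowski_mean_support(resample, u) - h)
                             for u, h in zip(directions, observed)))
    distances.sort()
    index = min(len(distances) - 1, max(0, math.ceil(level * len(distances)) - 1))
    return distances[index]
```

One way to read the result is as the radius by which the empirical mean would be enlarged in every direction to form a candidate confidence set; whether such a set has the intended coverage is exactly what the Monte Carlo simulations are for.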
We study various methods of aggregating individual judgments and individual priorities in group decision making with the AHP. The focus is on the empirical properties of the various methods, mainly on the extent to which the various aggregation methods represent an accurate approximation of the priority vector of interest. We identify five main classes of aggregation procedures which provide identical or very similar empirical expressions for the vectors of interest. We also propose a method to decompose in the AHP response matrix distortions due to random errors and perturbations caused by cognitive biases predicted by the mathematical psychology literature. We test the decomposition with experimental data and find that perturbations in group decision making caused by cognitive distortions are more important than those caused by random errors. We propose methods to correct the systematic distortions.
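Two standard aggregation routes can be sketched to show how different procedures may yield the same vector. The first aggregates individual judgments by an element-wise geometric mean and then derives priorities; the second derives each individual's priorities and then aggregates them by a geometric mean. The row geometric mean is used here as the prioritization rule. These choices and the function names are assumptions for illustration and are not claimed to be the five classes identified in the paper.

```python
import math


def aggregate_judgments(matrices):
    """Element-wise geometric mean of individual pairwise comparison matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[math.prod(m[i][j] for m in matrices) ** (1.0 / k) for j in range(n)]
            for i in range(n)]


def priorities(matrix):
    """Row geometric mean priorities, normalized to sum to one."""
    means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(means)
    return [g / total for g in means]


def aggregate_priorities(matrices):
    """Geometric mean of individual priority vectors, renormalized."""
    vectors = [priorities(m) for m in matrices]
    k = len(vectors)
    combined = [math.prod(v[i] for v in vectors) ** (1.0 / k)
                for i in range(len(vectors[0]))]
    total = sum(combined)
    return [c / total for c in combined]
```

With these particular choices the two routes give the same group priority vector up to rounding, because the geometric mean commutes with the row geometric mean and the normalizing constants cancel.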