In clinical trials multiple outcomes are often used to assess treatment interventions. This paper presents an evaluation of likelihood-based methods for jointly testing treatment effects in clinical trials with multiple continuous outcomes. Specifically, we compare the power of joint tests of treatment effects obtained from joint models for the multiple outcomes with univariate tests based on modeling the outcomes separately. We also consider the power and bias of tests when data are missing, a common feature of many trials, especially in psychiatry. Our results suggest that joint tests capitalize on the correlation of multiple outcomes and are more powerful than standard univariate methods, especially when outcomes are missing completely at random. When outcomes are missing at random, test procedures based on correctly specified joint models are unbiased, while standard univariate procedures are not. Results of a simulation study are reported, and the methods are illustrated in an example from the Clinical Antipsychotic Trials of Intervention Effectiveness for schizophrenia.
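To make the idea of a joint test concrete, here is a minimal Python sketch of a Wald test of no treatment effect on two continuous outcomes at once. It assumes the two estimated effects and their 2 x 2 covariance matrix are already available from a fitted joint model. The function name and inputs are hypothetical. The covariance between the two estimates, which a joint model supplies, enters the statistic directly. With two outcomes the statistic is referred to a chi-square distribution on 2 degrees of freedom, whose survival function is exp(-x/2).

```python
import math
from typing import Tuple


def joint_wald_test_2(b1: float, b2: float, var1: float, var2: float,
                      cov12: float) -> Tuple[float, float]:
    """Wald test of H0: b1 = b2 = 0 for two treatment effects.

    b1, b2: estimated treatment effects on outcomes 1 and 2 (hypothetical
        inputs assumed to come from a joint model);
    var1, var2, cov12: their estimated variances and covariance.
    Returns (chi-square statistic, p-value on 2 degrees of freedom).
    """
    det = var1 * var2 - cov12 ** 2
    if var1 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^{-1} b, using the closed-form inverse of a 2 x 2 matrix
    stat = (var2 * b1 ** 2 - 2 * cov12 * b1 * b2 + var1 * b2 ** 2) / det
    # Survival function of a chi-square with 2 degrees of freedom is exp(-x / 2)
    p_value = math.exp(-stat / 2)
    return stat, p_value
```

Separate univariate tests would use only var1 and var2. Here the off-diagonal term cov12 changes the statistic.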
In 2001, the U.S. Office of Personnel Management required all health plans participating in the Federal Employees Health Benefits Program to offer mental health and substance abuse benefits on par with general medical benefits. The initial evaluation found that, on average, parity did not result in either large spending increases or increased service use over the four-year observational period. However, some groups of enrollees may have benefited from parity more than others. To examine this possibility, we propose a Bayesian two-part latent class model to characterize the effect of parity on mental health use and expenditures. Within each class, we fit a two-part random effects model to separately model the probability of mental health or substance abuse use and mean spending trajectories among those having used services. The regression coefficients and random effect covariances vary across classes, thus permitting class-varying correlation structures between the two components of the model. Our analysis identified three classes of subjects: a group of low spenders that tended to be male, had relatively rare use of services, and decreased their spending pattern over time; a group of moderate spenders, primarily female, that had an increase in both use and mean spending after the introduction of parity; and a group of high spenders that tended to have chronic service use and constant spending patterns. By examining the joint 95% highest probability density regions of expected changes in use and spending for each class, we confirmed that parity had an impact only on the moderate spender class.
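A two-part model combines its components multiplicatively: expected spending equals the probability of any use times mean spending among users. The Python sketch below shows that identity and a class-weighted average over latent classes. It uses plain point values with hypothetical names and numbers. The model described above is Bayesian, includes random effects, and lets coefficients vary by class, and none of that is reproduced here.

```python
from typing import Sequence


def expected_spending(p_use: float, mean_spend_given_use: float) -> float:
    # Two-part model identity: E[Y] = Pr(Y > 0) * E[Y | Y > 0]
    return p_use * mean_spend_given_use


def class_weighted_spending(class_probs: Sequence[float], p_use: Sequence[float],
                            mean_spend: Sequence[float]) -> float:
    # Average the class-specific two-part means, weighting each latent class
    # by its (hypothetical) membership probability
    if abs(sum(class_probs) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    return sum(w * expected_spending(p, m)
               for w, p, m in zip(class_probs, p_use, mean_spend))
```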
OBJECTIVE: To determine comparative safety and effectiveness of combinations of bearing surfaces of hip implants.
DESIGN: Systematic review of clinical trials, observational studies, and registries.
DATA SOURCES: Medline, Embase, Cochrane Controlled Trials Register, reference lists of articles, annual reports of major registries, summaries of safety and effectiveness for pre-market application and mandated post-market studies at the United States Food and Drug Administration.
STUDY SELECTION: Criteria for inclusion were comparative studies in adults reporting information for various combinations of bearings (such as metal on metal and ceramic on ceramic). Data search, abstraction, and analyses were independently performed and confirmed by at least two authors. Qualitative data syntheses were performed.
RESULTS: There were 3139 patients and 3404 hips enrolled in 18 comparative studies and over 830 000 operations in national registries. Mean age in the trials ranged from 42 to 71 years, and 26-88% of patients were women. Disease-specific functional outcomes and general quality of life scores either did not differ or favoured patients receiving metal on polyethylene rather than metal on metal in the trials. While one clinical study reported fewer dislocations associated with metal on metal implants, in the three largest national registries there was evidence of higher rates of implant revision associated with metal on metal implants compared with metal on polyethylene. One trial reported fewer revisions with ceramic on ceramic compared with metal on polyethylene implants, but data from national registries did not support this finding.
CONCLUSIONS: There is limited evidence regarding comparative effectiveness of various hip implant bearings. Results do not indicate any advantage for metal on metal or ceramic on ceramic implants compared with traditional metal on polyethylene or ceramic on polyethylene bearings.
OBJECTIVES: In disparities models, researchers adjust for differences in "clinical need," including indicators of comorbidities. We reconsider this practice, assessing (1) if and how having a comorbidity changes the likelihood of recognition and treatment of mental illness; and (2) differences in mental health care disparities estimates with and without adjustment for comorbidities.
DATA: Longitudinal data from the 2000 to 2007 Medical Expenditure Panel Survey (n=11,083), split into pre- and postperiods, for white, Latino, and black adults with probable need for mental health care.
STUDY DESIGN: First, we tested a crowd-out effect (comorbidities decrease initiation of mental health care after a primary care provider [PCP] visit) using logistic regression models and an exposure effect (comorbidities cause more PCP visits, increasing initiation of mental health care) using instrumental variable methods. Second, we assessed the impact of adjustment for comorbidities on disparity estimates.
PRINCIPAL FINDINGS: We found no evidence of a crowd-out effect but strong evidence for an exposure effect. Number of postperiod visits positively predicted initiation of mental health care. Adjusting for racial/ethnic differences in comorbidities increased black-white disparities and decreased Latino-white disparities.
CONCLUSIONS: Positive exposure findings suggest that intensive follow-up programs shown to reduce disparities in chronic-care management may have additional indirect effects on reducing mental health care disparities.
BACKGROUND: Readmission following hospital discharge has become an important target of quality improvement.
OBJECTIVE: To describe the development, validation, and results of a risk-standardized measure of hospital readmission rates among elderly patients with pneumonia employed in federal quality measurement and efficiency initiatives.
DESIGN: A retrospective cohort study using hospital and outpatient Medicare claims from 2005 and 2006.
SETTING: A total of 4675 hospitals in the United States.
PATIENTS: Medicare beneficiaries aged >65 years with a principal discharge diagnosis of pneumonia.
MEASUREMENTS: Hospital-specific, risk-standardized 30-day readmission rates calculated as the ratio of predicted-to-expected readmissions, multiplied by the national unadjusted rate. Comparison of the areas under the receiver operating characteristic (ROC) curve and of correlation coefficients in development and validation samples.
RESULTS: The development sample consisted of 226,545 hospitalizations at 4675 hospitals, with an overall unadjusted 30-day readmission rate of 17.4%. The median risk-standardized hospital readmission rate was 17.3%, and the odds of readmission for a hospital one standard deviation above average were 1.4 times those of a hospital one standard deviation below average. Performance of the medical record and administrative models was similar (areas under the ROC curve 0.59 and 0.63, respectively) and the correlation coefficient of estimated state-specific standardized readmission rates from the administrative and medical record models was 0.96.
CONCLUSIONS: Rehospitalization within 30 days of treatment for pneumonia is common, and rates vary across hospitals. A risk-standardized measure of hospital readmission rates derived from administrative claims has similar performance characteristics to one based on medical record review.
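The risk-standardized rate described under MEASUREMENTS is a simple ratio rescaled by the national rate. A minimal Python sketch follows; the function name and example counts are hypothetical. In measures of this kind, the "predicted" count typically reflects the hospital's own estimated effect, and the "expected" count reflects what an average hospital would produce for the same patients. Treat that interpretation as an assumption here.

```python
def risk_standardized_rate(predicted: float, expected: float,
                           national_rate: float) -> float:
    """(predicted / expected) * national unadjusted rate.

    predicted: hospital-specific predicted number of readmissions
        (assumed to include the hospital's own effect)
    expected: expected readmissions for the same patients at an average hospital
    national_rate: national unadjusted 30-day readmission rate, as a proportion
    """
    if expected <= 0:
        raise ValueError("expected readmissions must be positive")
    return predicted / expected * national_rate
```

For example, a hypothetical hospital with 60 predicted and 50 expected readmissions, against a national rate of 0.174, gets a ratio of 1.2 and a risk-standardized rate of about 0.209.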
OBJECTIVE: This study examined the impact of insurance parity on the use, cost, and quality of substance abuse treatment.
METHODS: The authors compared substance abuse treatment spending and utilization from 1999 to 2002 for continuously enrolled beneficiaries covered by Federal Employees Health Benefit (FEHB) plans, which require parity coverage of mental health and substance use disorders, with spending and utilization among beneficiaries in a matched set of health plans without parity coverage. Logistic regression models estimated the probability of any substance abuse service use. Conditional on use, linear models estimated total and out-of-pocket spending. Logistic regression models for three quality indicators for substance abuse treatment were also estimated: identification of adult enrollees with a new substance abuse diagnosis, treatment initiation, and treatment engagement. Difference-in-difference estimates were computed as (postparity - preparity) differences in outcomes in plans without parity subtracted from those in FEHB plans.
RESULTS: There were no significant differences between FEHB and non-FEHB plans in rates of change in average utilization of substance abuse services. Conditional on service utilization, the rate of substance abuse treatment out-of-pocket spending declined significantly in the FEHB plans compared with the non-FEHB plans (mean difference=-$101.09, 95% confidence interval [CI]=-$198.06 to -$4.12), whereas changes in total plan spending per user did not differ significantly. With parity, more patients had new diagnoses of a substance use disorder (difference-in-difference risk=.10%, CI=.02% to .19%). No statistically significant differences were found for rates of initiation and engagement in substance abuse treatment.
CONCLUSIONS: Findings suggest that for continuously enrolled populations, providing parity of substance abuse treatment coverage improved insurance protection but had little impact on utilization, costs for plans, or quality of care.
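The difference-in-difference estimate in METHODS subtracts the pre-to-post change in the comparison plans from the same change in the FEHB plans. The Python sketch below applies that definition to raw group means. The class and function names and the example values are hypothetical. The study itself derived outcomes from regression models, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    pre: float   # mean outcome before parity
    post: float  # mean outcome after parity


def diff_in_diff(parity: GroupMeans, comparison: GroupMeans) -> float:
    # (post - pre) change in the parity (FEHB) plans minus the same change
    # in the comparison plans without parity
    return (parity.post - parity.pre) - (comparison.post - comparison.pre)
```

For instance, if out-of-pocket spending fell from 300 to 250 in parity plans and rose from 310 to 320 in comparison plans, the estimate is -60.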
Estimation of the effect of one treatment compared to another in the absence of randomization is a common problem in biostatistics. An increasingly popular approach involves instrumental variables, that is, variables that are predictive of who received a treatment yet not directly predictive of the outcome. When treatment is binary, many estimators have been proposed: method-of-moments estimators using a two-stage least-squares procedure, generalized-method-of-moments estimators using two-stage predictor substitution or two-stage residual inclusion procedures, and likelihood-based latent variable approaches. The assumptions critical to the consistency of the two-stage procedures and of the likelihood-based procedures differ. Because neither set of assumptions can be completely tested from the observed data alone, comparing the results from the different approaches is an important sensitivity analysis. We provide a general statistical framework for estimation of the causal effect of a binary treatment on a continuous outcome, using simultaneous equations to specify models. A comparison of health care costs for adults with schizophrenia treated with newer atypical antipsychotics and those treated with conventional antipsychotic medications illustrates our methods. Surprisingly large differences in the results among the methods are investigated using a simulation study. Several new findings concerning the performance of each approach in terms of precision and robustness in different situations are obtained. We illustrate that in general supplemental information is needed to determine which analysis, if any, is trustworthy and reaffirm that comparing results from different approaches is a valuable sensitivity analysis.
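Of the estimators listed above, two-stage least squares is the simplest to write down. The Python sketch below uses hypothetical function names. It regresses the binary treatment on a single instrument, then regresses the outcome on the fitted treatment. It returns only the point estimate, because standard errors from a naive second-stage regression are not valid. The two-stage residual inclusion and likelihood-based approaches are not shown.

```python
from statistics import fmean
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    # Simple linear regression of y on x with an intercept
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def two_stage_least_squares(z: Sequence[float], d: Sequence[float],
                            y: Sequence[float]) -> float:
    # Stage 1: regress treatment d on instrument z and form fitted values
    a1, b1 = ols_slope_intercept(z, d)
    d_hat = [b1 + a1 * zi for zi in z]
    # Stage 2: regress outcome y on the fitted treatment; the slope is the
    # 2SLS estimate of the treatment effect (point estimate only)
    effect, _ = ols_slope_intercept(d_hat, y)
    return effect
```

With one binary instrument, this reproduces the Wald ratio. That ratio is the difference in mean outcome between instrument groups divided by the difference in treatment rates between them.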
BACKGROUND: Automated adverse outcome surveillance tools and methods have potential utility in quality improvement and medical product surveillance activities. Their use for assessing hospital performance on the basis of patient outcomes has received little attention. We compared risk-adjusted sequential probability ratio testing (RA-SPRT) implemented in an automated tool to Massachusetts public reports of 30-day mortality after isolated coronary artery bypass graft surgery.
METHODS: A total of 23,020 isolated adult coronary artery bypass surgery admissions performed in Massachusetts hospitals between January 1, 2002 and September 30, 2007 were retrospectively re-evaluated. The RA-SPRT method was implemented within an automated surveillance tool to identify hospital outliers in yearly increments. We used an overall type I error rate of 0.05, an overall type II error rate of 0.10, and a threshold that signaled when the odds of death within 30 days after surgery were at least twice the expected odds. Annual hospital outlier status, based on the state-reported classification, was considered the gold standard. An event was defined as at least one occurrence of a higher-than-expected hospital mortality rate during a given year.
RESULTS: We examined a total of 83 hospital-year observations. The RA-SPRT method flagged 6 events among three hospitals for 30-day mortality compared with 5 events among two hospitals using the state public reports, yielding a sensitivity of 100% (5/5) and specificity of 98.8% (79/80).
CONCLUSIONS: The automated RA-SPRT method performed well, detecting all of the true institutional outliers with a small false positive alerting rate. Such a system could provide confidential automated notification to local institutions in advance of public reporting, providing opportunities for earlier quality improvement interventions.
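A risk-adjusted SPRT accumulates a log-likelihood ratio patient by patient. The ratio compares each patient's expected odds of death with those odds multiplied by 2. The test signals when the running sum crosses a boundary set by the error rates. The Python sketch below assumes the standard Wald boundaries, log((1 - beta)/alpha) and log(beta/(1 - alpha)). For a patient with predicted risk p and outcome y, it uses the risk-adjusted weight y log R - log(1 - p + R p). The function name is hypothetical. The sketch does not reproduce the surveillance tool's yearly increments or any restart rule.

```python
import math
from typing import Sequence


def ra_sprt(outcomes: Sequence[int], risks: Sequence[float],
            odds_ratio_alt: float = 2.0, alpha: float = 0.05,
            beta: float = 0.10) -> str:
    """Run one risk-adjusted SPRT over patients in chronological order.

    outcomes: 1 if the patient died within 30 days, else 0
    risks: each patient's model-predicted probability of death
    Returns "signal", "no signal", or "continue" (no boundary reached yet).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: odds elevated
    lower = math.log(beta / (1 - alpha))   # cross below: consistent with expected
    llr = 0.0
    for y, p in zip(outcomes, risks):
        # Log-likelihood ratio of odds multiplied by odds_ratio_alt
        # versus the expected odds (odds ratio 1) for this patient
        llr += y * math.log(odds_ratio_alt) - math.log(1 - p + odds_ratio_alt * p)
        if llr >= upper:
            return "signal"
        if llr <= lower:
            return "no signal"
    return "continue"
```

With alpha = 0.05 and beta = 0.10, the upper boundary is log 18 (about 2.89). With a predicted risk of 2%, five consecutive deaths are enough to signal.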
PURPOSE: The objectives of this study were (1) to examine whether the association between childhood family conflict and the risk of substance use disorders (SUDs) in adolescence differs by gender, and (2) to determine whether anxious/depressive symptoms and conduct problems explain this association among adolescent males and females.
METHODS: Data were obtained from 1,421 children aged 10-16 years at the time of enrollment in the Project on Human Development in Chicago Neighborhoods. We assessed gender differences in the association between childhood family conflict and adolescent SUDs by fitting a logistic regression model that included the interaction of gender and family conflict. We also investigated whether conduct problems and anxious/depressive symptoms explained the association between family conflict and SUDs differently for males and females through gender-specific mediation analyses.
RESULTS: The association between childhood family conflict and SUDs in adolescence differed by gender (p = .04). Family conflict was significantly associated with SUDs among females (OR: 1.61; CI: 1.20-2.15), but not among males (OR: 1.00; CI: .76-1.32). The elevated risk of SUDs among females exposed to family conflict was partly explained by girls' conduct problems, but not by anxious/depressive symptoms.
CONCLUSIONS: Females living in families with elevated levels of conflict were more likely to engage in acting out behaviors, which was associated with the development of SUDs. Future epidemiologic research is needed to help determine when this exposure is most problematic with respect to subsequent mental health outcomes and the most crucial time to intervene.
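In a logistic model with a gender-by-conflict interaction, the gender-specific odds ratios for family conflict come from sums of coefficients. The exponentiated interaction coefficient is the ratio of the female to the male odds ratio, which is what the test of differing association assesses. The Python sketch below assumes the coding female = 1, male = 0 and a model containing conflict, female, and conflict x female terms. The function name is hypothetical, and the coefficients in the usage note are back-calculated from the reported odds ratios purely for illustration.

```python
import math
from typing import Dict


def gender_specific_odds_ratios(b_conflict: float,
                                b_interaction: float) -> Dict[str, float]:
    # Assumed coding: female = 1, male = 0, with terms for conflict, female,
    # and conflict * female. For males only b_conflict applies; for females
    # the interaction coefficient is added.
    return {
        "male": math.exp(b_conflict),
        "female": math.exp(b_conflict + b_interaction),
    }
```

Setting b_conflict = log(1.00) and b_interaction = log(1.61) recovers odds ratios of 1.00 for males and about 1.61 for females.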
BACKGROUND: As part of state-mandated public reporting of outcomes after percutaneous coronary interventions (PCIs) in Massachusetts, procedural and clinical data were prospectively collected. Variables associated with higher mortality were audited to ensure accuracy of coding. We examined the impact of adjudication on identifying hospitals with possible deficiencies in the quality of PCI care.
METHODS AND RESULTS: From October 2005 to September 2006, 15 721 admissions for PCI occurred in 21 hospitals. Of the 864 high-risk variables from 822 patients audited by committee, 201 were changed, with reassignment to lower acuities in 97 (30%) of the 321 shock cases, 24 (43%) of the 56 salvage cases, and 73 (15%) of the 478 emergent cases. Logistic regression models were used to predict patient-specific in-hospital mortality. Of 241 (1.5%) patients who died after PCI, 30 (12.4%) had a lower predicted mortality with adjudicated than with unadjudicated data. Model accuracy was excellent with either adjudicated or unadjudicated data. Hospital-specific risk-standardized mortality rates were estimated using both adjudicated and unadjudicated data through hierarchical logistic regression. Although adjudication reduced between-hospital variation by one third, risk-standardized mortality rates were similar using unadjudicated and adjudicated data. None of the hospitals were identified as statistical outliers. However, cross-validated posterior-predicted P values calculated with adjudicated data increased the number of borderline hospital outliers compared with unadjudicated data.
CONCLUSIONS: Independent adjudication of site-reported high-risk features may increase the ability to identify hospitals with higher risk-adjusted mortality after PCI despite having little impact on the accuracy of risk prediction for the entire population.
OBJECTIVES: This study investigated the impact of adding novel elements to models predicting in-hospital mortality after percutaneous coronary interventions (PCIs).
BACKGROUND: Massachusetts mandated public reporting of hospital-specific PCI mortality in 2003. In 2006, a physician advisory group recommended adding to the prediction models 3 attributes not collected by the National Cardiovascular Data Registry instrument. These "compassionate use" (CU) features included coma on presentation, active hemodynamic support during PCI, and cardiopulmonary resuscitation at PCI initiation.
METHODS: From October 2005 through September 2007, PCI was performed during 29,784 admissions in Massachusetts nonfederal hospitals. Of these, 5,588 involved patients with ST-segment elevation myocardial infarction or cardiogenic shock. Cases with CU criteria identified were adjudicated by trained physician reviewers. Regression models with and without the CU composite variable (presence of any of the 3 features) were compared using areas under the receiver operating characteristic curves.
RESULTS: Unadjusted mortality in this high-risk subset was 5.7%. Among these admissions, 96 (1.7%) had at least 1 CU feature, with 69.8% mortality. The adjusted odds ratio for in-hospital death for CU PCIs (vs. no CU criteria) was 27.3 (95% confidence interval: 14.5 to 47.6). Discrimination of the model improved after including CU, with areas under the receiver-operating characteristic curves increasing from 0.87 to 0.90 (p < 0.01), while goodness of fit was preserved.
CONCLUSIONS: A small proportion of patients at extreme risk of post-PCI mortality can be identified using preprocedural factors that are not routinely collected but that improve predictive accuracy. Such improvements in model performance may result in greater confidence in reporting of risk-adjusted PCI outcomes.
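The area under the ROC curve (the c statistic) used to compare models here has a direct interpretation. It is the probability that a randomly chosen patient who died had a higher predicted risk than a randomly chosen survivor, with ties counting one half. A minimal Python sketch of that pairwise definition follows; the function name is hypothetical. A formal comparison of two models fit to the same patients would also need a test for correlated areas (DeLong's method is one common choice), which is not implemented here.

```python
from itertools import product
from typing import Sequence


def c_statistic(risks: Sequence[float], died: Sequence[int]) -> float:
    """Proportion of (death, survivor) pairs ranked correctly by predicted risk."""
    cases = [r for r, y in zip(risks, died) if y == 1]
    controls = [r for r, y in zip(risks, died) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for rc, rn in product(cases, controls):
        if rc > rn:
            score += 1.0   # concordant pair
        elif rc == rn:
            score += 0.5   # tied predictions count one half
    return score / (len(cases) * len(controls))
```

This pairwise loop is quadratic in sample size, which is fine for illustration but slow for tens of thousands of admissions.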
BACKGROUND: Patients with chronic kidney disease have been under-represented in randomized trials of drug-eluting stents relative to bare-metal stents and are at high risk of mortality.
STUDY DESIGN: Cohort study with propensity score matching.
SETTINGS & PARTICIPANTS: All adults with chronic kidney disease and severely decreased glomerular filtration rate (GFR; serum creatinine >2.0 mg/dL or dialysis dependence) undergoing percutaneous coronary intervention with stent placement between April 1, 2003, and September 30, 2005, at all acute-care nonfederal hospitals in Massachusetts.
PREDICTOR: Patients were classified as drug-eluting stent-treated if all stents were drug eluting and bare-metal stent-treated if all stents were bare metal. Patients treated with both types of stents were excluded from the primary analysis.
OUTCOMES & MEASUREMENTS: 2-year crude mortality risk differences (drug-eluting minus bare-metal stents) were determined from vital statistics records, and risk-adjusted mortality, myocardial infarction (MI), and revascularization differences were estimated using propensity score matching of patients with severely reduced GFR based on clinical and procedural information collected at the index admission.
RESULTS: 1,749 patients with severely reduced GFR (24% dialysis dependent) were treated with drug-eluting (n = 1,256) or bare-metal stents (n = 493) during the study. Overall 2-year mortality was 32.8% (unadjusted drug-eluting stent vs bare-metal stent; 30.1% vs 39.8%; P < 0.001). After propensity score matching 431 patients with a drug-eluting stent to 431 patients with a bare-metal stent, 2-year risk-adjusted mortality, MI, and target-vessel revascularization rates were 39.4% versus 37.4% (risk difference, 2.1%; 95% CI, -4.3 to 8.5; P = 0.5), 16.0% versus 19.0% (risk difference, -3.0%; 95% CI, -8.2 to 2.1; P = 0.3), and 13.0% versus 17.6% (risk difference, -4.6%; 95% CI, -9.5 to 0.3; P = 0.06).
LIMITATIONS: Observational design, ascertainment of serum creatinine level >2.0 mg/dL and dialysis dependence from case report forms.
CONCLUSIONS: In patients with severely decreased GFR, treatment with drug-eluting stents was associated with a modest decrease in target-vessel revascularization not reaching statistical significance and was not associated with a difference in risk-adjusted rates of mortality or MI at 2 years compared with bare-metal stents.
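The abstract does not say how patients were matched. A common choice is greedy 1:1 nearest-neighbor matching on the estimated propensity score within a caliper. The Python sketch below assumes exactly that; the function name, caliper width, and processing order are all hypothetical choices. It takes the propensity scores as given. Estimating them, for example with a logistic regression of stent type on clinical and procedural covariates, is not shown.

```python
from typing import List, Tuple


def greedy_match(treated: List[Tuple[str, float]],
                 controls: List[Tuple[str, float]],
                 caliper: float = 0.05) -> List[Tuple[str, str]]:
    """1:1 greedy nearest-neighbor matching on the propensity score,
    without replacement.

    treated / controls: (patient_id, propensity_score) pairs.
    caliper: maximum allowed score difference, on the probability scale here
        (an assumption; calipers are often defined on the logit scale).
    """
    available = dict(controls)
    pairs = []
    # Process treated patients from highest to lowest score (an arbitrary order)
    for tid, tps in sorted(treated, key=lambda t: t[1], reverse=True):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - tps))
        if abs(available[cid] - tps) <= caliper:
            pairs.append((tid, cid))
            del available[cid]  # each control is used at most once
    return pairs
```

Treated patients with no control inside the caliper stay unmatched. This is one reason a matched sample (431 pairs here) can be much smaller than the original cohort.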
Biomedical research often involves the measurement of multiple outcomes on different scales (continuous, binary and ordinal). A common approach for the analysis of such data is to ignore the potential correlation among the outcomes and model each outcome separately. This can lead not only to loss of efficiency but also to biased estimates in the presence of missing data. We address the problem of missing data in the context of multiple non-commensurate outcomes. The consequences of missing data when using likelihood and quasi-likelihood methods are described, and an extension of these methods to the situation of missing observations in the outcomes is proposed. Two real data examples illustrate the methodology.
CONTEXT: It is not known whether recent declines in ischemic heart disease and its risk factors have been accompanied by declines in heart failure (HF) hospitalization and mortality.
OBJECTIVE: To examine changes in HF hospitalization rate and 1-year mortality rate in the United States, nationally and by state or territory.
DESIGN, SETTING, AND PARTICIPANTS: From acute care hospitals in the United States and Puerto Rico, 55,097,390 fee-for-service Medicare beneficiaries hospitalized between 1998 and 2008 with a principal discharge diagnosis code for HF.
MAIN OUTCOME MEASURES: Changes in patient demographics and comorbidities, HF hospitalization rates, and 1-year mortality rates.
RESULTS: The HF hospitalization rate adjusted for age, sex, and race declined from 2845 per 100,000 person-years in 1998 to 2007 per 100,000 person-years in 2008 (P < .001), a relative decline of 29.5%. Age-adjusted HF hospitalization rates declined over the study period for all race-sex categories. Black men had the lowest rate of decline (4142 to 3201 per 100,000 person-years) among all race-sex categories, which persisted after adjusting for age (incidence rate ratio, 0.81; 95% CI, 0.79-0.84). Heart failure hospitalization rates declined significantly faster than the national mean in 16 states and significantly slower in 3 states. Risk-adjusted 1-year mortality decreased from 31.7% in 1999 to 29.6% in 2008 (P < .001), a relative decline of 6.6%. One-year mortality rates declined significantly in 4 states but increased in 5 states.
CONCLUSIONS: The overall HF hospitalization rate declined substantially from 1998 to 2008 but at a lower rate for black men. The overall 1-year mortality rate declined slightly over the past decade but remains high. Changes in HF hospitalization and 1-year mortality rates were uneven across states.
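The relative declines quoted in RESULTS follow from the reported endpoints as (start - end) / start. A minimal Python check is below; the helper name is hypothetical. Rounded to one decimal place, both values agree with the 29.5% and 6.6% given in the abstract.

```python
def relative_decline(start: float, end: float) -> float:
    """Percentage decline from start to end, relative to the starting value."""
    if start == 0:
        raise ValueError("starting value must be nonzero")
    return 100.0 * (start - end) / start


# HF hospitalization rate per 100,000 person-years, 1998 vs 2008
hf_rate_decline = relative_decline(2845, 2007)
# Risk-adjusted 1-year mortality (%), 1999 vs 2008
mortality_decline = relative_decline(31.7, 29.6)
```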
BACKGROUND: Drug-eluting stents (DES) for percutaneous coronary intervention decrease the risk of restenosis compared with bare metal stents. However, they are costlier, require prolonged dual antiplatelet therapy, and provide the most benefit in patients at highest risk for restenosis. To assist physicians in targeting DES use in patients at the highest risk for target vessel revascularization (TVR), we developed and validated a model to predict TVR.
METHODS AND RESULTS: Preprocedural clinical and angiographic data from 27 107 percutaneous coronary intervention hospitalizations between October 1, 2004, and September 30, 2007, in Massachusetts were used to develop prediction models for TVR at 1 year. Models were developed from a two-thirds random sample and validated in the remaining third. The overall rate of TVR was 7.6% (6.7% with DES, 11% with bare metal stents). Significant predictors of TVR included prior percutaneous coronary intervention, emergency or salvage percutaneous coronary intervention, prior coronary bypass surgery, peripheral vascular disease, diabetes mellitus, and angiographic characteristics. The model was superior to a 3-variable model of diabetes mellitus, stent diameter, and stent length (c statistic, 0.66 versus 0.60; P<0.001) and was well calibrated. The predicted number needed to treat with DES to prevent 1 TVR compared with bare metal stents ranged from 6 (95% confidence interval, 5.4-7.6) to 80 (95% confidence interval, 62.7-116.3), depending on patients' clinical and angiographic factors.
CONCLUSIONS: A predictive model using commonly collected variables can identify patients who may derive the greatest benefit in TVR reduction from DES. Whether use of the model improves the safety and cost-effectiveness of DES use should be tested prospectively.
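The number needed to treat is the reciprocal of the absolute risk reduction. The reported range of 6 to 80 therefore corresponds to predicted TVR reductions from roughly 16.7 down to 1.25 percentage points. The Python sketch below computes NNT for one patient from two predicted risks. The function name and example risks are hypothetical, and producing those risks would require the prediction model itself, which is not reproduced here.

```python
import math


def number_needed_to_treat(risk_control: float, risk_treated: float) -> float:
    """1 / absolute risk reduction; infinite when treatment shows no benefit."""
    arr = risk_control - risk_treated  # absolute risk reduction
    if arr <= 0:
        return math.inf  # no expected reduction in TVR for this patient
    return 1.0 / arr


# Hypothetical patient: 20% predicted 1-year TVR risk with a bare metal stent,
# 12% with a drug-eluting stent
nnt = number_needed_to_treat(0.20, 0.12)  # about 12.5
```

Returning infinity for a non-positive reduction is a design choice that keeps the function total. A negative reduction would more properly be reported as a number needed to harm.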
Cardiac surgical report cards have historically been mandatory. This paradigm changed when The Society of Thoracic Surgeons recently implemented a voluntary public reporting program based on benchmark analyses from its National Cardiac Database. The primary rationale is to provide transparency and accountability, thus affirming the fundamental ethical right of patient autonomy. Previous studies suggest that public reporting facilitates quality improvement, although other approaches such as confidential feedback of results and regional quality improvement initiatives are also effective. Public reporting has not substantially impacted patient referral patterns or market share. However, this may change with implementation of healthcare reform and with refinement of public reporting formats to enhance consumer interpretability. Finally, the potential unintended adverse consequences of public reporting must be monitored, particularly to assure that hospitals and surgeons remain willing to care for high-risk patients.
Appropriate implementation is essential to create a credible public reporting system. Ideally, data should be obtained from an audited clinical data registry, and structure, process, or outcomes metrics may be reported. Composite measures are increasingly used, as are measures of appropriateness, patient satisfaction, functional status, and health-related quality of life. Classification of provider performance should use statistical criteria appropriate to the policy objectives and to the desired balance of sensitivity and specificity. Public reports should use simplified visual or tabular presentation aids that maximize correct interpretation of numerical data. Because of sample size issues, and to emphasize that cardiac surgery requires team-based care, public reporting should generally be focused at the program rather than individual surgeon level. This may also help to mitigate risk aversion, the avoidance of high-risk patients.
Marcella Nunez-Smith, Elizabeth H Bradley, Jeph Herrin, Calie Santana, Leslie A Curry, Sharon-Lise T Normand, and Harlan M Krumholz. 2011. “Quality of care in the US territories.” Arch Intern Med, 171, 17, Pp. 1528-40.
BACKGROUND: Health care quality in the US territories is poorly characterized. We used process measures to compare the performance of hospitals in the US territories and in the US states.
METHODS: Our sample included nonfederal hospitals located in the United States and its territories discharging Medicare fee-for-service (FFS) patients with a principal discharge diagnosis of acute myocardial infarction (AMI), heart failure (HF), or pneumonia (PNE) (July 2005-June 2008). We compared risk-standardized 30-day mortality and readmission rates between territorial and stateside hospitals, adjusting for performance on core process measures and hospital characteristics.
RESULTS: In 57 territorial hospitals and 4799 stateside hospitals, hospital mean 30-day risk-standardized mortality rates were significantly higher in the US territories (P<.001) for AMI (18.8% vs 16.0%), HF (12.3% vs 10.8%), and PNE (14.9% vs 11.4%). Hospital mean 30-day risk-standardized readmission rates (RSRRs) were also significantly higher in the US territories for AMI (20.6% vs 19.8%; P=.04) and PNE (19.4% vs 18.4%; P=.01), but the difference was not significant for HF (25.5% vs 24.5%; P=.07). The higher risk-standardized mortality rates in the US territories remained statistically significant after adjusting for hospital characteristics and core process measure performance. Hospitals in the US territories had lower performance on all core process measures (P<.05).
CONCLUSIONS: Compared with hospitals in the US states, hospitals in the US territories have significantly higher 30-day mortality rates and lower performance on every core process measure for patients discharged after AMI, HF, and PNE. Eliminating the substantial quality gap in the US territories should be a national priority.