Methods for using clinical laboratory test results as baseline confounders in multi-site observational database studies when missing data are expected

Citation:

Raebel MA, Shetterly S, Lu CY, Flory J, Gagne JJ, Harrell FE, Haynes K, Herrinton LJ, Patorno E, Popovic J, et al. Methods for using clinical laboratory test results as baseline confounders in multi-site observational database studies when missing data are expected. Pharmacoepidemiol Drug Saf. 2016;25(7):798-814.

Date Published:

2016 07

Abstract:

PURPOSE: Our purpose was to quantify missing baseline laboratory results, assess predictors of missingness, and examine performance of missing data methods.

METHODS: Using the Mini-Sentinel Distributed Database from three sites, we selected three exposure-outcome scenarios with laboratory results as baseline confounders. We compared hazard ratios (HRs) or risk differences (RDs) and 95% confidence intervals (CIs) from models that omitted laboratory results, included only available results (complete cases), and included results after applying missing data methods (multiple imputation [MI] regression, MI predictive mean matching [PMM], indicator).

RESULTS: Scenario 1 considered glucose among second-generation antipsychotic users and diabetes. Across sites, glucose was available for 27.7-58.9%. Results differed between complete case and missing data models (e.g., olanzapine: HR 0.92 [CI 0.73, 1.12] vs. 1.02 [0.90, 1.16]). Across-site models employing different MI approaches provided similar HRs and CIs; site-specific models provided differing estimates. Scenario 2 evaluated creatinine among individuals starting high versus low dose lisinopril and hyperkalemia. Creatinine availability: 44.5-79.0%. Results differed between complete case and missing data models (e.g., HR 0.84 [CI 0.77, 0.92] vs. 0.88 [0.83, 0.94]). HRs and CIs were identical across MI methods. Scenario 3 examined international normalized ratio (INR) among warfarin users starting interacting versus noninteracting antimicrobials and bleeding. INR availability: 20.0-92.9%. Results differed between ignoring INR versus including INR using missing data methods (e.g., RD 0.05 [CI -0.03, 0.13] vs. 0.09 [0.00, 0.18]). Indicator and PMM methods gave similar estimates.

CONCLUSION: Multi-site studies must consider site variability in missing data. Different missing data methods performed similarly. Copyright © 2016 John Wiley & Sons, Ltd.
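
The abstract names its approaches without showing how each prepares the data, so the following is a minimal sketch of two of them: complete case selection and the missing indicator method. It is not the authors' implementation. The function names, the fill value of 0.0, and the creatinine values are all hypothetical and chosen only for illustration. MI regression and PMM are omitted because they need a fitted imputation model.

```python
from typing import List, Optional, Tuple

Lab = Optional[float]  # None marks a baseline lab result that is not on record


def percent_available(values: List[Lab]) -> float:
    """Share of patients with a recorded baseline result, in percent."""
    return 100.0 * sum(v is not None for v in values) / len(values)


def complete_case_rows(values: List[Lab]) -> List[int]:
    """Indices of patients kept in a complete case analysis."""
    return [i for i, v in enumerate(values) if v is not None]


def missing_indicator_encode(
    values: List[Lab], fill_value: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Missing indicator method: fill gaps with a constant and add a 0/1 flag.

    Both returned columns would enter the outcome model as covariates.
    The fill value is an assumption of this sketch, not taken from the paper.
    """
    filled = [fill_value if v is None else float(v) for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


# Hypothetical baseline creatinine values (mg/dL) for five patients.
creatinine: List[Lab] = [0.9, None, 1.4, None, 1.1]

availability = percent_available(creatinine)
kept = complete_case_rows(creatinine)
filled, flags = missing_indicator_encode(creatinine)
```

Complete case analysis drops every patient without a result, which shrinks the cohort most at the sites with the lowest availability. The indicator method keeps all patients at the cost of two covariates per laboratory test.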
Last updated on 05/31/2019