Transparent Reporting on Research Using Unstructured Electronic Health Record Data to Generate 'Real World' Evidence of Comparative Effectiveness and Safety

Date Published: 2019 Aug 26


Research that makes secondary use of administrative and clinical healthcare databases is increasingly influential for regulatory, reimbursement, and other healthcare decision-making. Consequently, there are numerous guidance documents on reporting for studies that use 'real-world' data captured in administrative claims and electronic health record (EHR) databases. These guidance documents are intended to improve transparency, reproducibility, and the ability to evaluate the validity and relevance of design and analysis decisions. However, existing guidance does not differentiate between structured and unstructured information contained in EHRs, registries, or other healthcare data sources. While unstructured text is convenient and readily interpretable in clinical practice, it can be difficult to use for investigating causal questions, such as comparative effectiveness and safety, until the data have been cleaned and algorithms have been applied to extract relevant information into structured fields for analysis. The goal of this paper is to increase transparency for healthcare decision makers and causal inference researchers by providing general recommendations for reporting on the steps taken to make unstructured text-based data usable for comparative effectiveness and safety research. These recommendations are designed to be used as an adjunct to existing reporting guidance. They are intended to provide sufficient context and supporting information for causal inference studies that use data fields derived by natural language processing or machine learning, so that researchers, reviewers, and decision makers can be confident in their ability to evaluate the validity and relevance of derived measures for exposures, inclusion/exclusion criteria, covariates, and outcomes for the causal question of interest.