Objectives To optimize an unsupervised generative model to infer the complications associated with type 2 diabetes at any stage of disease progression.
Materials and Methods Our study used data from a regional healthcare delivery network of 17 hospitals in Shanghai, China, which exemplify the incomplete electronic health records (EHRs) common in real-world settings. We used an optimized generative model to run Markov-based virtual patient simulations.
Results Our model was trained and tested on a longitudinal cohort of 9,298 patients, followed over an 11-year timespan, who had developed, or were at risk of developing, type 2 diabetes. It was then used to simulate a specified number of virtual patients with the entire progression path (5,000 virtual patients with a 23.9-year illness trajectory). With the illness trajectories evaluated by endocrinologists, the findings indicate both retrospective and prospective possibilities for understanding diabetes and its associated complications. In particular, given a target stage, it is straightforward to infer the risk of any complication at other stages, not only forward from an earlier state to a later state but also backward from a later state to an earlier one.
Discussion The optimized generative model aims to address the lack of comprehensive tracking of the disease's natural history. Virtual patient trajectories simulated by the generative model can offer strong privacy protection because they carry a lower risk of identifying real patients.
Conclusions A generative model can help overcome problems of incomplete and/or insufficient data. Statistical retrospection or prediction over virtual patient trajectories is a feasible way to support population health management.
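
As an illustration of the Markov-based simulation and the two-directional inference described above, the following is a minimal sketch, not the study's model. The three states, the transition probabilities, the initial distribution, and the function names (`simulate_patient`, `marginal`, `backward_posterior`) are all hypothetical and chosen only for illustration. The sketch samples virtual patient paths from a first-order Markov chain and uses Bayes' rule to infer the distribution over the preceding state given a later target state.

```python
import random

# Hypothetical three-state progression model. The states and every
# probability below are illustrative assumptions, not values from the study.
STATES = ["at_risk", "diabetes", "complication"]
INITIAL = {"at_risk": 0.9, "diabetes": 0.1, "complication": 0.0}
TRANSITION = {
    "at_risk":      {"at_risk": 0.7, "diabetes": 0.3, "complication": 0.0},
    "diabetes":     {"at_risk": 0.0, "diabetes": 0.8, "complication": 0.2},
    "complication": {"at_risk": 0.0, "diabetes": 0.0, "complication": 1.0},
}


def simulate_patient(n_steps, rng):
    """Sample one virtual patient path of n_steps states from the chain."""
    state = rng.choices(STATES, weights=[INITIAL[s] for s in STATES])[0]
    path = [state]
    for _ in range(n_steps - 1):
        row = TRANSITION[state]
        state = rng.choices(STATES, weights=[row[s] for s in STATES])[0]
        path.append(state)
    return path


def marginal(t):
    """Forward inference: distribution over states after t transitions."""
    dist = dict(INITIAL)
    for _ in range(t):
        dist = {
            j: sum(dist[i] * TRANSITION[i][j] for i in STATES)
            for j in STATES
        }
    return dist


def backward_posterior(target_state, t):
    """Backward inference: P(X_{t-1} = i | X_t = target_state) by Bayes' rule."""
    if t < 1:
        raise ValueError("t must be at least 1")
    prev = marginal(t - 1)
    joint = {i: prev[i] * TRANSITION[i][target_state] for i in STATES}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("target state has zero probability at step t")
    return {i: joint[i] / total for i in STATES}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the virtual cohort is reproducible
    cohort = [simulate_patient(10, rng) for _ in range(5)]
    for path in cohort:
        print(" -> ".join(path))
    print(marginal(2))
    print(backward_posterior("complication", 2))
```

In this toy chain, `marginal` answers the prospective question (where a patient is likely to be after a given number of steps), while `backward_posterior` answers the retrospective one (which state most likely preceded an observed later state).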