Publicly reported quality and safety rating systems are promising innovations for rating hospital performance. Still, the raters themselves must be rated against their underlying data to allow meaningful comparisons and to establish integrated oversight. Data is now a kind of capital, on par with social and human capital, that affects hospital services. Viewed through the lens of data capital, the content of these rating systems (e.g., data, metrics) can be combined into a composite that offers another perspective.
Learning representations of clinical notes is challenging because their complex content requires preprocessing before the data are suitable for data mining. An important issue, addressed here, is that of temporal expressions, whose cues indicate when clinical events occur. We present a three-step data reconstruction algorithm that transforms similar clinical entities (e.g., symptoms, complications) into sequential data through unsupervised annotation of temporal expressions. First, the algorithm detects whether an expression has temporal intent. Second, it decomposes and rewrites the expression into a non-temporal sub-expression and a set of temporal constraints. Finally, it clusters similar non-temporal sub-expressions using unsupervised sentence embedding under a modified K-medoids paradigm. We tested the proposed algorithm on clinical notes associated with chronic obstructive pulmonary disease (COPD). Visualizing the reconstruction results for cardiology reports from a longitudinal cohort of patients with COPD demonstrated that the algorithm is feasible.
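The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `TEMPORAL` pattern covers only a handful of hypothetical cue phrases, word-overlap (Jaccard) distance stands in for the unsupervised sentence embedding, and `k_medoids` is a plain alternating K-medoids rather than the modified variant the abstract mentions.

```python
import re

# Assumption: a tiny, illustrative set of temporal cues (far from exhaustive).
TEMPORAL = re.compile(
    r"\s*\b(?:(?:\d+|one|two|three|several)\s+(?:day|week|month|year)s?\s+ago"
    r"|(?:for|over)\s+the\s+(?:past|last)\s+\d+\s+(?:day|week|month|year)s?"
    r"|since\s+\d{4}|yesterday|today)\b",
    re.IGNORECASE,
)

def has_temporal_intent(expr):
    """Step 1: does the expression contain a temporal cue?"""
    return TEMPORAL.search(expr) is not None

def decompose(expr):
    """Step 2: split into (non-temporal sub-expression, temporal constraints)."""
    constraints = [m.group(0).strip() for m in TEMPORAL.finditer(expr)]
    core = re.sub(r"\s+", " ", TEMPORAL.sub("", expr)).strip(" ,.;")
    return core, constraints

def jaccard_distance(a, b):
    """Stand-in for an embedding distance: 1 - word overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not (sa | sb):
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def k_medoids(items, k, distance, iterations=10):
    """Step 3: plain alternating K-medoids; returns {medoid index: member indices}."""
    medoids = list(range(k))  # deterministic start: the first k items
    clusters = {}
    for _ in range(iterations):
        clusters = {m: [] for m in medoids}
        for i, item in enumerate(items):
            nearest = min(medoids, key=lambda m: distance(item, items[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for m, members in clusters.items():
            members = members or [m]  # keep the old medoid if a cluster is empty
            new_medoids.append(
                min(members, key=lambda c: sum(distance(items[c], items[j]) for j in members))
            )
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters

notes = [
    "chest pain 3 days ago",
    "severe chest pain for the past 2 weeks",
    "shortness of breath since 2019",
    "shortness of breath on exertion yesterday",
]
cores = [decompose(n)[0] for n in notes if has_temporal_intent(n)]
print(k_medoids(cores, k=2, distance=jaccard_distance))
```

Each cluster of non-temporal sub-expressions, ordered by its temporal constraints, could then serve as one sequence of similar clinical events.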
Chronic obstructive pulmonary disease (COPD) is a leading cause of mortality in the United States. Representing COPD progression using temporal graphs may offer critical clinical insights. Long short-term memory (LSTM) units in recurrent neural networks can process sequences in which the elapsed time between consecutive elements is constant, but they cannot handle irregular time intervals (i.e., segments of unequal duration). In this study, we propose a four-layer deep learning model that uses a specially configured recurrent neural network to capture irregular time-lapse segments. In experiments on a corpus of COPD patients’ clinical notes, our model improved both interpretability and the accuracy of estimating COPD progression relative to baseline algorithms.
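One common way to make a recurrent update sensitive to irregular gaps is to discount the previous hidden state by a function of the elapsed time before each step. The sketch below illustrates that general idea with a plain tanh recurrent cell in pure Python; it is not the four-layer model described above. The exponential decay, the `rate` parameter, and all function names are assumptions made for illustration.

```python
import math

def elapsed(timestamps):
    """Gap before each element; the first element has no predecessor, so its gap is 0."""
    return [0.0] + [float(b - a) for a, b in zip(timestamps, timestamps[1:])]

def time_decay(delta, rate=0.01):
    """Assumed decay: the longer the gap, the less of the old state is kept."""
    return math.exp(-rate * delta)

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def run_time_aware_rnn(inputs, timestamps, W_x, W_h, rate=0.01):
    """Simple tanh recurrent cell whose memory is discounted by elapsed time."""
    h = [0.0] * len(W_h)
    states = []
    for x, delta in zip(inputs, elapsed(timestamps)):
        g = time_decay(delta, rate)
        h = [g * v for v in h]  # discount old memory by the time gap
        pre = [a + b for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

# Toy example: three notes written on days 0, 7, and 90 (hypothetical values).
W_x = [[0.5, -0.2], [0.1, 0.3]]
W_h = [[0.4, 0.0], [0.0, 0.4]]
notes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for state in run_time_aware_rnn(notes, [0, 7, 90], W_x, W_h):
    print([round(v, 3) for v in state])
```

With a constant gap the decay factor is the same at every step and the cell behaves like an ordinary recurrent unit; unequal gaps are what make the discounting matter.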
Illustration of all three types of clinical notes for a COPD patient (Fig. 4@Tableau).
Elucidating the biological mechanisms underlying complex diseases is an important goal in biomedical research. Recent advances in biological technology have enabled the generation of massive volumes of data in genomics, transcriptomics, proteomics, epigenomics, metagenomics, metabolomics, nutriomics, etc., leading to the emergence of systems biology approaches to investigating complex diseases. However, most of these data remain underutilized after their initial acquisition and analysis. There is a growing gap between the generation of multifaceted data and our ability to integrate and analyze them. Inspired by the observation that many of these data can be represented as networks, we propose a network-based model to encapsulate the rich information in each database and to connect across different databases. We integrate several public databases to construct a heterogeneous network in which nodes are entities such as genes, miRNAs, and diseases, and edges represent known relationships between them. One fundamental challenge is how to perform meaningful analysis on such a network despite its intrinsic heterogeneity. We propose a network embedding method that learns a low-dimensional vector space that best preserves the known relationships between entities. Based on the learned vector representations, entities that are close to each other but have no known direct connection are likely to be associated and are therefore good candidates for future investigation. In the experiments, we construct a heterogeneous network of genes, miRNAs, and diseases using data from six public databases. To evaluate the performance of the proposed method, we predict disease-gene and disease-miRNA associations. Comparison with several state-of-the-art methods demonstrates the advantage of our method, which is the only one that takes full advantage of the rich contextual information provided by the heterogeneous network. These encouraging results suggest that our method can help identify new hypotheses to guide future research.
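The candidate-selection idea (entities that are close in the embedding space but not yet connected) can be sketched as follows. This is only an illustration: the embeddings are hand-written toy vectors rather than learned ones, cosine similarity is an assumed closeness measure, and node names such as `disease_A` and `gene_3` are hypothetical.

```python
import math
from itertools import product

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(embeddings, node_types, known_edges, source_type, target_type):
    """Score every source/target pair with no known edge, most similar first."""
    known = {frozenset(edge) for edge in known_edges}
    sources = [n for n, t in node_types.items() if t == source_type]
    targets = [n for n, t in node_types.items() if t == target_type]
    scored = [
        (cosine(embeddings[s], embeddings[t]), s, t)
        for s, t in product(sources, targets)
        if frozenset((s, t)) not in known
    ]
    return sorted(scored, reverse=True)

# Toy heterogeneous network (all names and vectors are made up).
node_types = {
    "disease_A": "disease",
    "gene_1": "gene", "gene_2": "gene", "gene_3": "gene",
    "mirna_1": "miRNA",
}
embeddings = {
    "disease_A": [1.0, 0.0],
    "gene_1": [0.9, 0.1],
    "gene_2": [0.0, 1.0],
    "gene_3": [0.7, 0.3],
    "mirna_1": [0.8, 0.2],
}
known_edges = [("disease_A", "gene_1")]

for score, disease, gene in rank_candidates(
    embeddings, node_types, known_edges, "disease", "gene"
):
    print(f"{disease} - {gene}: {score:.3f}")
```

The same call with `target_type="miRNA"` would rank disease-miRNA candidates, mirroring the two prediction tasks used in the evaluation.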
Research clues can be expressed as coherent chains of keywords grouped by theme. Capturing research clues from the vast and expanding medical literature is valuable. Yet it remains difficult to automatically create clear visualizations of research clues, despite the many competing summarization tools. In this paper, we propose a linear classifier based on a spiral, which we call a regional classifier. The study emphasizes the development of visualization methods and the process of finding a specific research clue to track patient needs reported in the medical literature. When timelines are combined with a spiral geographical map, they form a geometric shape that helps reveal clues from different spatial viewpoints and under periodic constraints. Our evaluation showed that the regional classifier produces better visual effects than support vector machine classifiers. It covers the important concepts of each theme and represents the relationships among papers in a way that captures continuous developments and changes in key themes.
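The abstract does not specify the spiral layout or the decision rule of the regional classifier, so the following is only a generic sketch of how a timeline can be laid out on an Archimedean spiral so that periodic structure becomes visible. The one-turn-per-period mapping, the angular-sector notion of a "region", and the names `spiral_position` and `region_of` are all assumptions made for illustration.

```python
import math

def spiral_position(t, period=12.0, spacing=1.0):
    """Place time t on an Archimedean spiral: one full turn per period."""
    turns = t / period
    theta = 2 * math.pi * turns
    radius = spacing * turns  # radius grows linearly with elapsed time
    return radius * math.cos(theta), radius * math.sin(theta)

def region_of(t, period=12.0, n_regions=4):
    """Assumed 'region': the angular sector that time t falls into."""
    phase = (t % period) / period
    return int(phase * n_regions)

# Toy timeline: publication months counted from an arbitrary start (hypothetical).
for month in [0, 6, 12, 18, 30]:
    x, y = spiral_position(month)
    print(month, round(x, 3), round(y, 3), region_of(month))
```

Points that share a phase within the period (for example, months 6, 18, and 30) fall in the same sector on successive turns, which is what lets a spiral layout expose recurring patterns along a timeline.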