I am a research scientist in the Department of Surgery at Massachusetts General Hospital (MGH) and Harvard Medical School. My recent research focuses on building better informatics systems that leverage artificial intelligence to derive insights from large volumes of electronic health records (EHRs) and from data gathered by sensors in smartphones, wearables, and other portable devices.

I have been interested in high-throughput omics data analysis, database development, and the design of online data analysis platforms at big-data scale. I participated in the next-generation SEquencing Quality Control (SEQC) project led by the US Food and Drug Administration and integrated mass spectrometric data with RNA-Seq data to study the landscape of the human proteome. I designed the system architecture for the first one-stop proteomic cloud platform (Firmiana), which allows scientists to deposit mass spectrometry (MS) raw files, perform proteome identification and quantification online, carry out bioinformatics analyses, and extract knowledge without the need for programming expertise. I developed the Arabidopsis thaliana Protein Interactome Database (AtPID) and have maintained and updated this widely used online resource over the past ten years.

Since joining MGH, I have been heavily involved in the development and application of statistical and computational methods to address problems in biomedical and genomic research. I mined the data of the Inflammation and Host Response to Injury Large-scale Collaborative Research Program (Glue Grant) and developed an online database, KERIS: kaleidoscope of gene responses to inflammation between species (http://www.igenomed.org/keris/), to help biologists choose models when studying the mechanisms of particular genes and pathways in disease and prioritize the translation of findings from disease models into clinical studies.

I have designed DMedic, a real-time indexing, visualization, and analysis platform intended to improve the workflow of next-generation personalized medical care through deep AI analytics of EHR data. The system includes a distributed, JSON-based search and analytics engine that provides real-time data extraction across patient records, including full-text search of free-text clinical reports; a real-time visualization system that allows interactive exploration of EHR data for data mining; and an automatic model training system that embeds multiple open-source machine learning platforms and GPU technologies to help derive insights from massive unstructured data orders of magnitude faster. I am currently working on multiple projects that apply DMedic to the EHRs of patients with different diseases for disease prevention, early diagnosis, and treatment optimization.