Publications

2019
Lin C, Miller T, Dligach D, Bethard S, Savova G. A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction, in Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis, Minnesota, USA: Association for Computational Linguistics ; 2019 :65–71. Publisher's VersionAbstract
Classic methods for clinical temporal relation extraction focus on relational candidates within a sentence. On the other hand, break-through Bidirectional Encoder Representations from Transformers (BERT) are trained on large quantities of arbitrary spans of contiguous text instead of sentences. In this study, we aim to build a sentence-agnostic framework for the task of CONTAINS temporal relation extraction. We establish a new state-of-the-art result for the task, 0.684F for in-domain (0.055-point improvement) and 0.565F for cross-domain (0.018-point improvement), by fine-tuning BERT and pre-training domain-specific BERT models on sentence-agnostic temporal relation instances with WordPiece-compatible encodings, and augmenting the labeled data with automatically generated “silver” instances.
Wright-Bettner K, Palmer M, Savova G, de Groen P, Miller T. Cross-document coreference: An approach to capturing coreference without context, in Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019). Hong Kong: Association for Computational Linguistics ; 2019 :1–10. Publisher's VersionAbstract
This paper discusses a cross-document coreference annotation schema that was developed to further automatic extraction of timelines in the clinical domain. Lexical senses and coreference choices are determined largely by context, but cross-document work requires reasoning across contexts that are not necessarily coherent. We found that an annotation approach that relies less on context-guided annotator intuitions and more on schematic rules was most effective in creating meaningful and consistent cross-document relations.
Miller T, Geva A, Dligach D. Extracting Adverse Drug Event Information with Minimal Engineering, in Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis, Minnesota, USA: Association for Computational Linguistics ; 2019 :22–27. Publisher's VersionAbstract
In this paper we describe an evaluation of the potential of classical information extraction methods to extract drug-related attributes, including adverse drug events, and compare to more recently developed neural methods. We use the 2018 N2C2 shared task data as our gold standard data set for training. We train support vector machine classifiers to detect drug and drug attribute spans, and pair these detected entities as training instances for an SVM relation classifier, with both systems using standard features. We compare to baseline neural methods that use standard contextualized embedding representations for entity and relation extraction. The SVM-based system and a neural system obtain comparable results, with the SVM system doing better on concepts and the neural system performing better on relation extraction tasks. The neural system obtains surprisingly strong results compared to the system based on years of research in developing features for information extraction.
Miller T. Simplified Neural Unsupervised Domain Adaptation, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics ; 2019 :414–419. Publisher's VersionAbstract
Unsupervised domain adaptation (UDA) is the task of training a statistical model on labeled data from a source domain to achieve better performance on data from a target domain, with access to only unlabeled data in the target domain. Existing state-of-the-art UDA approaches use neural networks to learn representations that are trained to predict the values of subset of important features called “pivot features” on combined data from the source and target domains. In this work, we show that it is possible to improve on existing neural domain adaptation algorithms by 1) jointly training the representation learner with the task learner; and 2) removing the need for heuristically-selected “pivot features.” Our results show competitive performance with a simpler model.
Liu D, Dligach D, Miller T. Two-stage Federated Phenotyping and Patient Representation Learning, in Proceedings of the 18th BioNLP Workshop and Shared Task. Florence, Italy: Association for Computational Linguistics ; 2019 :283–291. Publisher's VersionAbstract
A large percentage of medical information is in unstructured text format in electronic medical record systems. Manual extraction of information from clinical notes is extremely time consuming. Natural language processing has been widely used in recent years for automatic information extraction from medical texts. However, algorithms trained on data from a single healthcare provider are not generalizable and error-prone due to the heterogeneity and uniqueness of medical documents. We develop a two-stage federated natural language processing method that enables utilization of clinical notes from different hospitals or clinics without moving the data, and demonstrate its performance using obesity and comorbities phenotyping as medical task. This approach not only improves the quality of a specific clinical task but also facilitates knowledge progression in the whole healthcare system, which is an essential part of learning health system. To the best of our knowledge, this is the first application of federated machine learning in clinical NLP.
Jin L, Doshi-Velez F, Miller T, Schwartz L, Schuler W. Unsupervised Learning of PCFGs with Normalizing Flow, in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics ; 2019 :2442–2452. Publisher's VersionAbstract
Unsupervised PCFG inducers hypothesize sets of compact context-free rules as explanations for sentences. PCFG induction not only provides tools for low-resource languages, but also plays an important role in modeling language acquisition (Bannard et al., 2009; Abend et al. 2017). However, current PCFG induction models, using word tokens as input, are unable to incorporate semantics and morphology into induction, and may encounter issues of sparse vocabulary when facing morphologically rich languages. This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information. Linguistically motivated sparsity and categorical distance constraints are imposed on the inducer as regularization. Experiments show that the PCFG induction model with normalizing flow produces grammars with state-of-the-art accuracy on a variety of different languages. Ablation further shows a positive effect of normalizing flow, context embeddings and proposed regularizers.
Savova GK, Danciu I, Alamudun F, Miller T, Lin C, Bitterman DS, Tourassi G, Warner JL. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer research. 2019;79 (21) :5463–5470.Abstract
Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data.
2018
Jin L, Doshi-Velez F, Miller T, Schuler W, Schwartz L. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction, in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics ; 2018 :2721–2731. Publisher's Version
Dligach D, Miller T. Learning Patient Representations from Text, in Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics ; 2018 :119–123. Publisher's Version
Lin C, Miller T, Dligach D, Amiri H, Bethard S, Savova G. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction, in Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis. Association for Computational Linguistics ; 2018 :165–176. Publisher's Version
Amiri H, Miller T, Savova G. Spotting Spurious Data with Neural Networks, in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics ; 2018 :2006–2016. Publisher's Version
Jin L, Doshi-Velez F, Miller T, Schuler W, Schwartz L. Unsupervised Grammar Induction with Depth-bounded PCFG. Transactions of the Association for Computational Linguistics. 2018;6 :211–224. Publisher's Version
2017
Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, et al. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Res. 2017;77 (21) :e115-e118.Abstract
Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently mostly performed manually, making it difficult to correlate phenotypic data to genomic data. In addition, genomic data are being produced at an increasingly faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from electronic medical records of cancer patients. The system implements advanced Natural Language Processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of the University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical and missing computational methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment. .
Dligach D, Miller T, Lin C, Bethard S, Savova G. Neural Temporal Relation Extraction, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Valencia, Spain: Association for Computational Linguistics ; 2017 :746–751. Publisher's VersionAbstract
We experiment with neural architectures for temporal relation extraction and establish a new state-of-the-art for several scenarios. We find that neural models with only tokens as input outperform state-of-the-art hand-engineered feature-based models, that convolutional neural networks outperform LSTM models, and that encoding relation arguments with XML tags outperforms a traditional position-based encoding.
Miller T, Dligach D, Bethard S, Lin C, Savova G. Towards generalizable entity-centric clinical coreference resolution. J Biomed Inform. 2017;69 :251-258.Abstract
OBJECTIVE: This work investigates the problem of clinical coreference resolution in a model that explicitly tracks entities, and aims to measure the performance of that model in both traditional in-domain train/test splits and cross-domain experiments that measure the generalizability of learned models. METHODS: The two methods we compare are a baseline mention-pair coreference system that operates over pairs of mentions with best-first conflict resolution and a mention-synchronous system that incrementally builds coreference chains. We develop new features that incorporate distributional semantics, discourse features, and entity attributes. We use two new coreference datasets with similar annotation guidelines - the THYME colon cancer dataset and the DeepPhe breast cancer dataset. RESULTS: The mention-synchronous system performs similarly on in-domain data but performs much better on new data. Part of speech tag features prove superior in feature generalizability experiments over other word representations. Our methods show generalization improvement but there is still a performance gap when testing in new domains. DISCUSSION: Generalizability of clinical NLP systems is important and under-studied, so future work should attempt to perform cross-domain and cross-institution evaluations and explicitly develop features and training regimens that favor generalizability. A performance-optimized version of the mention-synchronous system will be included in the open source Apache cTAKES software.
Viani N, Miller TA, Dligach D, Bethard S, Napolitano C, Priori SG, Bellazzi R, Sacchi L, Savova GK. Recurrent Neural Network Architectures for Event Extraction from Italian Medical Reports. In: ten Teije A, Popow C, Holmes JH, Sacchi L Artificial Intelligence in Medicine: 16th Conference on Artificial Intelligence in Medicine, AIME 2017, Vienna, Austria, June 21-24, 2017, Proceedings. Cham: Springer International Publishing ; 2017. pp. 198–202. Publisher's VersionAbstract
Medical reports include many occurrences of relevant events in the form of free-text. To make data easily accessible and improve medical decisions, clinical information extraction is crucial. Traditional extraction methods usually rely on the availability of external resources, or require complex annotated corpora and elaborate designed features. Especially for languages other than English, progress has been limited by scarce availability of tools and resources. In this work, we explore recurrent neural network (RNN) architectures for clinical event extraction from Italian medical reports. The proposed model includes an embedding layer and an RNN layer. To find the best configuration for event extraction, we explored different RNN architectures, including Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU). We also tried feeding morpho-syntactic information into the network. The best result was obtained by using the GRU network with additional morpho-syntactic inputs.
Lin C, Miller T, Dligach D, Bethard S, Savova G. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks, in BioNLP 2017. Vancouver, Canada, Association for Computational Linguistics ; 2017 :322–327. Publisher's VersionAbstract
Token sequences are often used as the input for Convolutional Neural Networks (CNNs) in natural language processing. However, they might not be an ideal representation for time expressions, which are long, highly varied, and semantically complex. We describe a method for representing time expressions with single pseudo-tokens for CNNs. With this method, we establish a new state-of-the-art result for a clinical temporal relation extraction task.
Miller T, Bethard S, Amiri H, Savova G. Unsupervised Domain Adaptation for Clinical Negation Detection, in BioNLP 2017. Vancouver, Canada, Association for Computational Linguistics ; 2017 :165–170. Publisher's VersionAbstract
Detecting negated concepts in clinical texts is an important part of NLP information extraction systems. However, generalizability of negation systems is lacking, as cross-domain experiments suffer dramatic performance losses. We examine the performance of multiple unsupervised domain adaptation algorithms on clinical negation detection, finding only modest gains that fall well short of in-domain performance.
Amiri H, Miller T, Savova G. Repeat before Forgetting: Spaced Repetition for Efficient and Effective Training of Neural Networks, in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics ; 2017 :2401–2410. Publisher's VersionAbstract
We present a novel approach for training artificial neural networks. Our approach is inspired by broad evidence in psychology that shows human learners can learn efficiently and effectively by increasing intervals of time between subsequent reviews of previously learned materials (spaced repetition). We investigate the analogy between training neural models and findings in psychology about human memory model and develop an efficient and effective algorithm to train neural models. The core part of our algorithm is a cognitively-motivated scheduler according to which training instances and their "reviews" are spaced over time. Our algorithm uses only 34-50% of data per epoch, is 2.9-4.8 times faster than standard training, and outperforms competing state-of-the-art baselines. Our code is available at scholar.harvard.edu/hadi/RbF/.
2016
Lin C, Dligach D, Miller TA, Bethard S, Savova GK. Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc. 2016;23 (2) :387-95.Abstract
OBJECTIVE: To develop an open-source temporal relation discovery system for the clinical domain. The system is capable of automatically inferring temporal relations between events and time expressions using a multilayered modeling strategy. It can operate at different levels of granularity--from rough temporality expressed as event relations to the document creation time (DCT) to temporal containment to fine-grained classic Allen-style relations. MATERIALS AND METHODS: We evaluated our systems on 2 clinical corpora. One is a subset of the Temporal Histories of Your Medical Events (THYME) corpus, which was used in SemEval 2015 Task 6: Clinical TempEval. The other is the 2012 Informatics for Integrating Biology and the Bedside (i2b2) challenge corpus. We designed multiple supervised machine learning models to compute the DCT relation and within-sentence temporal relations. For the i2b2 data, we also developed models and rule-based methods to recognize cross-sentence temporal relations. We used the official evaluation scripts of both challenges to make our results comparable with results of other participating systems. In addition, we conducted a feature ablation study to find out the contribution of various features to the system's performance. RESULTS: Our system achieved state-of-the-art performance on the Clinical TempEval corpus and was on par with the best systems on the i2b2 2012 corpus. Particularly, on the Clinical TempEval corpus, our system established a new F1 score benchmark, statistically significant as compared to the baseline and the best participating system. CONCLUSION: Presented here is the first open-source clinical temporal relation discovery system. It was built using a multilayered temporal modeling strategy and achieved top performance in 2 major shared tasks.

Pages