Groschel M, Owens M, Freschi L, Vargas R, Marin M, Phelan J, Iqbal Z, Dixit A, and Farhat MR. 8/30/2021. “GenTB: A user-friendly genome-based predictor of tuberculosis resistance powered by machine learning.” Genome Medicine.



Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools are needed to enable public health and clinical practitioners to rapidly diagnose resistance and inform treatment regimens.


We present Translational Genomics platform for Tuberculosis (GenTB), a free and open web-based application to predict antibiotic resistance from next-generation sequence data. The user can choose between two potential predictors, a Random Forest (RF) classifier and a Wide and Deep Neural Network (WDNN) to predict phenotypic resistance to 13 and 10 anti-tuberculosis drugs, respectively. We benchmark GenTB’s predictive performance along with leading TB resistance prediction tools (Mykrobe and TB-Profiler) using a ground truth dataset of 20,408 isolates with laboratory-based drug susceptibility data. All four tools reliably predicted resistance to first-line tuberculosis drugs but had varying performance for second-line drugs. The mean sensitivities for GenTB-RF and GenTB-WDNN across the nine shared drugs were 77.6% (95% CI 76.6–78.5%) and 75.4% (95% CI 74.5–76.4%), respectively, and marginally higher than the sensitivities of TB-Profiler at 74.4% (95% CI 73.4–75.3%) and Mykrobe at 71.9% (95% CI 70.9–72.9%). The higher sensitivities came at the expense of ≤ 1.5% lower specificity: Mykrobe 97.6% (95% CI 97.5–97.7%), TB-Profiler 96.9% (95% CI 96.7–97.0%), GenTB-WDNN 96.2% (95% CI 96.0–96.4%), and GenTB-RF 96.1% (95% CI 96.0–96.3%). Averaged across the four tools, genotypic resistance sensitivity was 11% and 9% lower for isoniazid and rifampicin, respectively, on isolates sequenced at low depth (< 10× across 95% of the genome), emphasizing the need to quality control input sequence data before prediction. We discuss differences between tools in reporting results to the user, including variants underlying the resistance calls and any novel or indeterminate variants.


GenTB is an easy-to-use online tool to rapidly and accurately predict resistance to anti-tuberculosis drugs. GenTB can be accessed online at, and the source code is available at
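The sensitivity and specificity figures quoted in the abstract above are proportions with 95% confidence intervals. As a minimal sketch of how such figures can be computed from paired genotypic predictions and phenotypic results, the Python below uses a Wilson score interval. The choice of interval method and the function names `wilson_interval` and `sensitivity_specificity` are assumptions made for illustration; the abstract does not state how the authors computed their intervals.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(predicted, phenotype):
    """Compare predicted resistance with phenotypic resistance.

    Both arguments are equal-length sequences of booleans (True = resistant).
    The phenotype is treated as the ground truth.
    """
    tp = fn = tn = fp = 0
    for pred, truth in zip(predicted, phenotype):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    predicted = [True, True, False, False, True, False]
    phenotype = [True, True, True, False, False, False]
    print(sensitivity_specificity(predicted, phenotype))
```

With only a handful of isolates the interval is very wide; the narrow intervals in the abstract reflect the 20,408-isolate benchmark.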



Vargas R, Freschi L, Spitaleri A, Tahseen S, Barilar I, Niemann S, Miotto P, Cirillo D, Köser C, and Farhat MR. 8/30/2021. “The role of epistasis in amikacin, kanamycin, bedaquiline, and clofazimine resistance in Mycobacterium tuberculosis complex.” Antimicrobial Agents and Chemotherapy.


Antibiotic resistance among bacterial pathogens poses a major global health threat. M. tuberculosis complex (MTBC) is estimated to have the highest resistance rates of any pathogen globally. Given the slow growth rate and the need for a biosafety level 3 laboratory, the only realistic avenue to scale up drug susceptibility testing (DST) for this pathogen is to rely on genotypic techniques. This raises the fundamental question of whether a mutation is a reliable surrogate for phenotypic resistance or whether the presence of a second mutation can completely counteract its effect, resulting in major diagnostic errors (i.e. systematic false resistance results). To date, such epistatic interactions have only been reported for streptomycin that is now rarely used. By analyzing more than 31,000 MTBC genomes, we demonstrated that the eis C-14T promoter mutation, which is interrogated by several genotypic DST assays endorsed by the World Health Organization, cannot confer resistance to amikacin and kanamycin if it coincides with loss-of-function (LoF) mutations in the coding region of eis. To our knowledge, this represents the first definitive example of antibiotic reversion in MTBC. Moreover, we raise the possibility that mmpR (Rv0678) mutations are not valid markers of resistance to bedaquiline and clofazimine if these coincide with a LoF mutation in the efflux pump encoded by mmpS5 (Rv0677c) and mmpL5 (Rv0676c).

Kaafarani H, Gaitanidis A, Farhat MR, Christensen M, Breen K, Mendoza A, Fagenholz P, and Velmahos G. 7/28/2021. “Association Between NEDD4L Variation and the Genetic Risk of Acute Appendicitis, A Multi-institutional Genome-Wide Association Study.” JAMA Surgery.

Importance  The familial aspect of acute appendicitis (AA) has been proposed, but its hereditary basis remains undetermined.

Objective  To identify genomic variants associated with AA.

Design, Setting, and Participants  This genome-wide association study, conducted from June 21, 2019, to February 4, 2020, used a multi-institutional biobank to retrospectively identify patients with AA across 8 single-nucleotide variation (SNV) genotyping batches. The study also examined differential gene expression in appendiceal tissue samples between patients with AA and controls using the GSE9579 data set in the National Institutes of Health’s Gene Expression Omnibus repository. Statistical analysis was conducted from October 1, 2019, to February 4, 2020.

Main Outcomes and Measures  Single-nucleotide variations with a minor allele frequency of 5% or higher were tested for association with AA using a linear mixed model. The significance threshold was set at P = 5 × 10⁻⁸.

Results  A total of 29 706 patients (15 088 women [50.8%]; mean [SD] age at enrollment, 60.1 [17.0] years) were included, 1743 of whom had a history of AA. The genomic inflation factor for the cohort was 1.003. A previously unknown SNV at chromosome 18q was found to be associated with AA (rs9953918: odds ratio, 0.99; 95% CI, 0.98-1.00; P = 4.48 × 10⁻⁸). This SNV is located in an intron of the NEDD4L gene. The heritability of appendicitis was estimated at 30.1%. Gene expression data from appendiceal tissue donors identified NEDD4L to be among the most differentially expressed genes (14 of 22 216 genes; β [SE] = −2.71 [0.44]; log fold change = −1.69; adjusted P = .04).

Conclusions and Relevance  This study identified SNVs within the NEDD4L gene as being associated with AA. Nedd4l is involved in the ubiquitination of intestinal ion channels and decreased Nedd4l activity may be implicated in the pathogenesis of AA. These findings can improve the understanding of the genetic predisposition to and pathogenesis of AA.

M Marin, R Vargas, M Harris, B Jeffrey, L.E. Epperson, D Durbin, M Strong, M Salfinger, Z Iqbal, I Akhundova, S Vashakidze, V Crudu, A Rosenthal, and MR Farhat. 4/8/2021. “Genomic sequence characteristics and the empiric accuracy of short-read sequencing.” bioRxiv.
Background: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias, reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. For the clonal pathogen Mycobacterium tuberculosis (Mtb), researchers frequently exclude 10.7% of the genome believed to be repetitive and prone to erroneous variant calls. To benchmark short-read variant calling, we used 36 diverse clinical Mtb isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically study the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias, and GC content. Results: Reference based Illumina variant calling had a recall ≥89.0% and precision ≥98.5% across parameters evaluated. The best balance between precision and recall was achieved by tuning the mapping quality (MQ) threshold, i.e. confidence of the read mapping (recall 85.8%, precision 99.1% at MQ ≥ 40). Masking repetitive sequence content is an alternative conservative approach to variant calling that maintains high precision (recall 70.2%, precision 99.6% at MQ≥40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS including 52 of the 168 PE/PPE genes (34.5%). We present a refined list of low confidence regions and examine the largest sources of variant calling error. Conclusions: Our improved approach to variant calling has broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems, and more generally for WGS applications in other organisms.
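The trade-off described in the abstract above, where raising the mapping quality (MQ) threshold raises precision and lowers recall, can be sketched in Python as follows. This is a toy illustration under stated assumptions: variant calls are represented as `(position, alt_base, mq)` tuples, the truth set as `(position, alt_base)` pairs (standing in for the long-read-derived ground truth), and the function name `precision_recall` is hypothetical. It is not the authors' benchmarking pipeline.

```python
def precision_recall(calls, truth, min_mq=40):
    """Precision and recall of short-read calls after an MQ filter.

    calls: iterable of (position, alt_base, mq) tuples.
    truth: set of (position, alt_base) pairs taken as ground truth.
    min_mq: calls with mq below this value are discarded.
    """
    kept = {(pos, alt) for pos, alt, mq in calls if mq >= min_mq}
    true_positives = len(kept & truth)
    precision = true_positives / len(kept) if kept else float("nan")
    recall = true_positives / len(truth) if truth else float("nan")
    return precision, recall


if __name__ == "__main__":
    # Toy data only; positions and qualities are invented for illustration.
    calls = [(100, "A", 60), (200, "T", 20), (300, "G", 15), (600, "C", 55)]
    truth = {(100, "A"), (200, "T"), (600, "C"), (700, "G")}
    for threshold in (0, 40):
        p, r = precision_recall(calls, truth, min_mq=threshold)
        print(f"MQ >= {threshold}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the stricter threshold removes the one false call but also discards a true variant that sits in a poorly mapped region.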
Groschel M and Farhat MR. 3/16/2021. “A Legacy of disease.” The Lancet, 21, 3.
Vargas R, Freschi L, Marin M, Epperson E, Smith M, Oussenko I, Durbin D, Strong M, Salfinger M, and Farhat MR. 2/1/2021. “In-host population dynamics of Mycobacterium tuberculosis complex during active disease.” eLife.

Tuberculosis (TB) is a leading cause of death globally. Understanding the population dynamics of TB’s causative agent Mycobacterium tuberculosis complex (Mtbc) in-host is vital for understanding the efficacy of antibiotic treatment. We use longitudinally collected clinical Mtbc isolates that underwent Whole-Genome Sequencing from the sputa of 200 patients to investigate Mtbc diversity during the course of active TB disease after excluding 107 cases suspected of reinfection, mixed infection or contamination. Of the 178/200 patients with persistent clonal infection >2 months, 27 developed new resistance mutations between sampling with 20/27 occurring in patients with pre-existing resistance. Low abundance resistance variants at a purity of ≥19% in the first isolate predict fixation in the subsequent sample. We identify significant in-host variation in 27 genes, including antibiotic resistance genes, metabolic genes and genes known to modulate host innate immunity and confirm several to be under positive selection by assessing phylogenetic convergence across a genetically diverse sample of 20,352 isolates.

Ektefaie Y, Dixit A, Freschi L, and Farhat MR. 1/27/2021. “Globally diverse Mycobacterium tuberculosis resistance acquisition: a retrospective geographical and temporal analysis of whole genome sequences.” The Lancet Microbe, 2, 3, Pp. e96 - e104.
Naomi Rankin, Matthias Groschel, and Maha Farhat. 10/22/2020. “When our internship went virtual we needed a new approach.” Science Magazine.
Freschi L, Vargas Jr R, Hussain A, Kamal SMM, Skrahina A, Tahseen S, Ismail N, Barbova A, Niemann S, Cirillo DM, Dean AS, Zignol M, and Farhat MR. 9/29/2020. “Population structure, biogeography and transmissibility of Mycobacterium tuberculosis.”
Mycobacterium tuberculosis is a clonal pathogen proposed to have co-evolved with its human host for millennia, yet our understanding of its genomic diversity and biogeography remains incomplete. Here we use a combination of phylogenetics and dimensionality reduction to reevaluate the population structure of M. tuberculosis, providing the first in-depth analysis of the ancient East African Indian Lineage 1 and the modern Central Asian Lineage 3 and expanding our understanding of Lineages 2 and 4. We assess sub-lineages using genomic sequences from 4,939 pan-susceptible strains and find 30 new genetically distinct clades that we validate in a dataset of 4,645 independent isolates. We characterize sub-lineage geographic distributions and demonstrate a consistent geographically restricted and unrestricted pattern for 20 groups, including three groups of Lineage 1. We assess the transmissibility of the four major lineages by examining the distribution of terminal branch lengths across the M. tuberculosis phylogeny and identify evidence supporting higher transmissibility in Lineages 2 and 4 than 3 and 1 on a global scale. We define a robust expanded barcode of 95 single nucleotide substitutions (SNS) that allows for the rapid identification of 69 Mtb sub-lineages and 26 additional internal groups. Our results paint a higher resolution picture of the Mtb phylogeny and biogeography.
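A SNS barcode of the kind described in the abstract above works as a lookup: each marker substitution identifies one sub-lineage or internal group, and an isolate is assigned every group whose marker it carries. The Python sketch below illustrates that idea only. The positions, bases, and group labels in `BARCODE` are invented placeholders, not the published 95-SNS barcode, and the function name `assign_groups` is hypothetical.

```python
# Hypothetical barcode: (1-based genome position, alternate base) -> group label.
# These entries are placeholders for illustration, NOT the published barcode.
BARCODE = {
    (1000, "T"): "lineage_A",
    (2500, "G"): "lineage_A.1",
    (4000, "C"): "lineage_B",
}


def assign_groups(variants):
    """Return the sorted group labels whose marker substitution is present.

    variants: iterable of (position, alt_base) pairs called in one isolate.
    """
    observed = set(variants)
    return sorted(label for marker, label in BARCODE.items() if marker in observed)


if __name__ == "__main__":
    isolate_variants = [(1000, "T"), (2500, "G"), (9999, "A")]
    print(assign_groups(isolate_variants))
```

Because sub-lineages are nested, an isolate typically matches a parent group and one or more of its descendants, which is why the function returns a list rather than a single label.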
El Moheb M, Naar L, Christensen MA, Kapoen C, Mauer LR, Farhat MR, and Kaafarani HMA. 9/24/2020. “Gastrointestinal Complications in Critically Ill Patients With and Without COVID-19.” JAMA.
Coronavirus disease 2019 (COVID-19) appears to have significant extrapulmonary complications affecting multiple organ systems.1-3 Critically ill patients with COVID-19 often develop gastrointestinal complications during their hospital stay, including bowel ischemia, transaminitis, gastrointestinal bleeding, pancreatitis, Ogilvie syndrome, and severe ileus.3 Whether the high incidence of gastrointestinal complications is a manifestation of critical illness in general or is specific to COVID-19 remains unclear. We compared the incidence of gastrointestinal complications of critically ill patients with COVID-19–induced acute respiratory distress syndrome (ARDS) vs comparably ill patients with non–COVID-19 ARDS using propensity score analysis.
El Halabi J, Palmer NP, Fox L, Goleb JE, Kohane I, and Farhat MR. 9/2020. “Measuring healthcare delays among privately insured tuberculosis patients in the United States: an observational cohort study.” The Lancet Infectious Diseases.
Lemieux J, Siddle KJ, Shaw BM, Loreth C, Schaffner S, Gladden-Young A, Adams G, Fink T, Tomkins-Tinch CH, Krasilnikova LA, Deruff KC, Rudy M, Bauer MR, Lagerborg KA, Normandin E, Chapman SB, Reilly SK, Anahtar MN, Lin AE, Carter A, Myhrvold C, Kemball M, Chaluvadi SR, Cusick C, Flowers K, Neumann A, Cerrato F, Farhat MR, Slater D, Harris JB, Branda J, Hooper D, Gaeta JM, Bagett TP, O'Connel J, Gnirke A, Lieberman TB, Philippakis A, Burns M, Brown C, Luban J, Ryan ET, Turbett SE, LaRocque RC, Hanage WP, Gallagher G, Madoff LC, Smole S, Pierce VM, Rosenburg ES, Sabeti S, Park DJ, and MacInnis BL. 8/25/2020. “Phylogenetic analysis of SARS-CoV-2 in the Boston area highlights the role of recurrent importation and superspreading events.”

SARS-CoV-2 has caused a severe, ongoing outbreak of COVID-19 in Massachusetts with 111,070 confirmed cases and 8,433 deaths as of August 1, 2020. To investigate the introduction, spread, and epidemiology of COVID-19 in the Boston area, we sequenced and analyzed 772 complete SARS-CoV-2 genomes from the region, including nearly all confirmed cases within the first week of the epidemic and hundreds of cases from major outbreaks at a conference, a nursing facility, and among homeless shelter guests and staff. The data reveal over 80 introductions into the Boston area, predominantly from elsewhere in the United States and Europe. We studied two superspreading events covered by the data, events that led to very different outcomes because of the timing and populations involved. One produced rapid spread in a vulnerable population but little onward transmission, while the other was a major contributor to sustained community transmission, including outbreaks in homeless populations, and was exported to several other domestic and international sites. The same two events differed significantly in the number of new mutations seen, raising the possibility that SARS-CoV-2 superspreading might encompass disparate transmission dynamics. Our results highlight the failure of measures to prevent importation into MA early in the outbreak, underscore the role of superspreading in amplifying an outbreak in a major urban area, and lay a foundation for contact tracing informed by genetic data.

Kadura S, King N, Nakhoul M, Zhu H, Theron G, Köser C, and Farhat MR. 5/3/2020. “Systematic review of mutations associated with resistance to the new and repurposed Mycobacterium tuberculosis drugs bedaquiline, clofazimine, linezolid, delamanid, and pretomanid.” Journal of Antimicrobial Chemotherapy.


Improved genetic understanding of Mycobacterium tuberculosis (MTB) resistance to novel and repurposed anti-tubercular agents can aid the development of rapid molecular diagnostics.


Adhering to PRISMA guidelines, in March 2018, we performed a systematic review of studies implicating mutations in resistance through sequencing and phenotyping before and/or after spontaneous resistance evolution, as well as allelic exchange experiments. We focused on the novel drugs bedaquiline, delamanid, pretomanid and the repurposed drugs clofazimine and linezolid. A database of 1373 diverse control MTB whole genomes, isolated from patients not exposed to these drugs, was used to further assess genotype–phenotype associations.


Of 2112 papers, 54 met the inclusion criteria. These studies characterized 277 mutations in the genes atpE, mmpR, pepQ, Rv1979c, fgd1, fbiABC and ddn and their association with resistance to one or more of the five drugs. The most frequent mutations for bedaquiline, clofazimine, linezolid, delamanid and pretomanid resistance were atpE A63P, mmpR frameshifts at nucleotides 192–198, rplC C154R, ddn W88* and ddn S11*, respectively. Frameshifts in the mmpR homopolymer region nucleotides 192–198 were identified in 52/1373 (4%) of the control isolates without prior exposure to bedaquiline or clofazimine. Of isolates resistant to one or more of the five drugs, 59/519 (11%) lacked a mutation explaining phenotypic resistance.


This systematic review supports the use of molecular methods for linezolid resistance detection. Resistance mechanisms involving non-essential genes show a diversity of mutations that will challenge molecular diagnosis of bedaquiline and nitroimidazole resistance. Combined phenotypic and genotypic surveillance is needed for these drugs in the short term.

Gröschel MI, Meehan CJ, Barilar I, Diricks M, Gonzaga A, Steglich M, Conchillo-Solé O, Scherer IC, Mamat U, Luz CF, Bruyne KD, Utpatel C, Yero D, Gibert I, Daura X, Kampmeier S, Rahman NA, Kresken M, Werf TS, Alio I, Streit WR, Zhou K, Schwartz T, Rossen JWA, Farhat MR, Schaible UE, Nübel U, Rupp J, Steinmann J, Niemann S, and Kohl TA. 4/27/2020. “The Global Phylogenetic Landscape and Nosocomial Spread of the Multidrug-Resistant Opportunist Stenotrophomonas Maltophilia.” Genomics.

Recent studies portend a rising global spread and adaptation of human- or healthcare-associated pathogens. Here, we analysed an international collection of the emerging, multidrug-resistant, opportunistic pathogen Stenotrophomonas maltophilia from 22 countries to infer population structure and clonality at a global level. We show that the S. maltophilia complex is divided into 23 monophyletic lineages, most of which harboured strains of all degrees of human virulence. Lineage Sm6 comprised the highest rate of human-associated strains, linked to key virulence and resistance genes. Transmission analysis identified potential outbreak events of genetically closely related strains isolated within days or weeks in the same hospitals.

One Sentence Summary The S. maltophilia complex comprises genetically diverse, globally distributed lineages with evidence for intra-hospital transmission.

Wilson DJ and The CRyPTIC Consortium. 3/13/2020. “GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes.” Molecular Biology and Evolution.
The dN/dS ratio provides evidence of adaptation or functional constraint in protein-coding genes by quantifying the relative excess or deficit of amino acid-replacing versus silent nucleotide variation. Inexpensive sequencing promises a better understanding of parameters such as dN/dS, but analysing very large datasets poses a major statistical challenge. Here I introduce genomegaMap for estimating within-species genome-wide variation in dN/dS, and I apply it to 3,979 genes across 10,209 tuberculosis genomes to characterize the selection pressures shaping this global pathogen. GenomegaMap is a phylogeny-free method that addresses two major problems with existing approaches: (i) it is fast no matter how large the sample size and (ii) it is robust to recombination, which causes phylogenetic methods to report artefactual signals of adaptation. GenomegaMap uses population genetics theory to approximate the distribution of allele frequencies under general, parent-dependent mutation models. Coalescent simulations show that substitution parameters are well-estimated even when genomegaMap’s simplifying assumption of independence among sites is violated. I demonstrate the ability of genomegaMap to detect genuine signatures of selection at antimicrobial resistance-conferring substitutions in M. tuberculosis and describe a novel signature of selection in the cold-shock DEAD-box protein A gene deaD/csdA. The genomegaMap approach helps accelerate the exploitation of big data for gaining new insights into evolution within species.
Gaitanidis A, Bonde A, Mendoza A, Sillesen MH, El Hechi M, Velmahos G, Kaafarani H, and Farhat MR. 2/28/2020. “Identification of a new genetic variant associated with cholecystitis: a multicenter genome-wide association study.” Journal of Trauma and Acute Care Surgery.


The genomic landscape of gallbladder disease remains poorly understood. We sought to examine the association between genetic variants and the development of cholecystitis.


The Biobank of a large multi-institutional healthcare system was utilized. All patients with cholecystitis were identified using ICD-10 codes and genotyped across 6 batches. To control for population stratification, data was restricted to that from individuals of European genomic ancestry using a multidimensional scaling (MDS) approach. The association between single nucleotide polymorphisms (SNPs) and cholecystitis was evaluated with a mixed linear model-based analysis, controlling for age, sex and obesity. The threshold for significance was set at 5 × 10⁻⁸.


Out of 24,635 patients (mean age 60.1 ± 16.7 years, 13,022 [52.9%] females), 900 had cholecystitis (mean age 65.4 ± 14.3 years, 496 [55.1%] females). After meta-analysis, 3 SNPs on chromosome 5p15 exceeded the threshold for significance (p < 5 × 10⁻⁸). The phenotypic variance of cholecystitis explained by genetics and controlling for gender and obesity was estimated to be 17.9%.


Using a multi-institutional genomic Biobank, we report that a region on chromosome 5p15 is associated with the development of cholecystitis and can be used to identify patients at risk.

Study type: Prognostic and Epidemiological

Level of Evidence: Level III, case-control study
Vargas R and Farhat MR. 2/19/2020. “Antibiotic treatment and selection for glpK mutations in patients with active tuberculosis disease.” Proceedings of the National Academy of Sciences of the United States of America, 117, 8, Pp. 3910-3912.
Hunt M, Bradley P, Grandjean Lapierre S, Heys S, Thomsit M, Hall M, Malone K, Wintringer P, Walker T, Cirillo D, Comas I, Farhat M, Fowler P, Gardy J, Ismail N, Kohl T, Mathys V, Merker M, Niemann S, Vally Omar S, Sintchenko V, Smith G, Soolingen van D, Supply P, Tahseen S, Wilcox M, Arandjelovic I, Peto T, Crook D, and Iqbal Z. 12/2/2019. “Antibiotic resistance prediction for Mycobacterium tuberculosis from genome sequence data with Mykrobe.” Wellcome Open Research.
Two billion people are infected with Mycobacterium tuberculosis, leading to 10 million new cases of active tuberculosis and 1.5 million deaths annually. Universal access to drug susceptibility testing (DST) has become a World Health Organization priority. We previously developed a software tool, Mykrobe predictor, which provided offline species identification and drug resistance predictions for M. tuberculosis from whole genome sequencing (WGS) data. Performance was insufficient to support the use of WGS as an alternative to conventional phenotype-based DST, due to mutation catalogue limitations. 

Here we present a new tool, Mykrobe, which provides the same functionality based on a new software implementation. Improvements include i) an updated mutation catalogue giving greater sensitivity to detect pyrazinamide resistance, ii) support for user-defined resistance catalogues, iii) improved identification of non-tuberculous mycobacterial species, and iv) an updated statistical model for Oxford Nanopore Technologies sequencing data. Mykrobe is released under MIT license at We incorporate mutation catalogues from the CRyPTIC consortium et al. (2018) and from Walker et al. (2015), and make improvements based on performance on an initial set of 3206 and an independent set of 5845 M. tuberculosis Illumina sequences. To give estimates of error rates, we use a prospectively collected dataset of 4362 M. tuberculosis isolates. Using culture based DST as the reference, we estimate Mykrobe to be 100%, 95%, 82%, 99% sensitive and 99%, 100%, 99%, 99% specific for rifampicin, isoniazid, pyrazinamide and ethambutol resistance prediction respectively. We benchmark against four other tools on 10207 (=5845+4362) samples, and also show that Mykrobe gives concordant results with nanopore data. 

We measure the ability of Mykrobe-based DST to guide personalized therapeutic regimen design in the context of complex drug susceptibility profiles, showing 94% concordance of implied regimen with that driven by phenotypic DST, higher than all other benchmarked tools.
Yang Y, Walker TM, Walker S, Wilson DJ, Peto T, Crook DW, Shamout F, CRyPTIC Consortium, Zhu T, and Clifton DA. 9/15/2019. “DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis.” Bioinformatics, 35, 18, Pp. 3240–3249.


Resistance co-occurrence within first-line anti-tuberculosis (TB) drugs is a common phenomenon. Existing methods based on genetic data analysis of Mycobacterium tuberculosis (MTB) have been able to predict resistance of MTB to individual drugs, but have not considered the resistance co-occurrence and cannot capture latent structure of genomic data that corresponds to lineages.


We used a large cohort of TB patients from 16 countries across six continents where whole-genome sequences for each isolate and associated phenotype to anti-TB drugs were obtained using drug susceptibility testing recommended by the World Health Organization. We then proposed an end-to-end multi-task model with deep denoising auto-encoder (DeepAMR) for multiple drug classification and developed DeepAMR_cluster, a clustering variant based on DeepAMR, for learning clusters in latent space of the data. The results showed that DeepAMR outperformed baseline model and four machine learning models with mean AUROC from 94.4% to 98.7% for predicting resistance to four first-line drugs [i.e. isoniazid (INH), ethambutol (EMB), rifampicin (RIF), pyrazinamide (PZA)], multi-drug resistant TB (MDR-TB) and pan-susceptible TB (PANS-TB: MTB that is susceptible to all four first-line anti-TB drugs). In the case of INH, EMB, PZA and MDR-TB, DeepAMR achieved its best mean sensitivity of 94.3%, 91.5%, 87.3% and 96.3%, respectively. While in the case of RIF and PANS-TB, it generated 94.2% and 92.2% sensitivity, which were lower than baseline model by 0.7% and 1.9%, respectively. t-SNE visualization shows that DeepAMR_cluster captures lineage-related clusters in the latent space.

Anderson SB, Shapiro JB, Vandenbroucke-Grauls C, and MGJ de Vos. 8/1/2019. “Microbial Evolutionary Medicine - from theory to clinical practice.” Lancet Infectious Diseases, 19, 8, Pp. e273-e283.
Bacteria and other microbes play a crucial role in human health and disease. Medicine and clinical microbiology have traditionally attempted to identify the etiological agents that cause disease, and how to eliminate them. Yet this traditional paradigm is becoming inadequate for dealing with a changing disease landscape. Major challenges to human health are noncommunicable chronic diseases, often driven by altered immunity and inflammation, and persistent communicable infections whose agents harbor antibiotic resistance. It is increasingly recognized that microbe-microbe interactions, as well as human-microbe interactions, are important. Here, we review the "Evolutionary Medicine" framework to study how microbial communities influence human health. This approach aims to predict and manipulate microbial influences on human health by integrating ecology, evolutionary biology, microbiology, bioinformatics and clinical expertise. We focus on the potential promise of evolutionary medicine to address three key challenges: 1) detecting microbial transmission; 2) predicting antimicrobial resistance; 3) understanding microbe-microbe and human-microbe interactions in health and disease, in the context of the microbiome.