Groschel M, Owens M, Freschi L, Vargas R, Marin M, Phelan J, Iqbal Z, Dixit A, and Farhat MR. 8/30/2021. “
GenTB: A user-friendly genome-based predictor of tuberculosis resistance powered by machine learning.” Genome Medicine.
Abstract
Background
Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools are needed to enable public health and clinical practitioners to rapidly diagnose resistance and inform treatment regimens.
Results
We present Translational Genomics platform for Tuberculosis (GenTB), a free and open web-based application to predict antibiotic resistance from next-generation sequence data. The user can choose between two predictors, a Random Forest (RF) classifier and a Wide and Deep Neural Network (WDNN), to predict phenotypic resistance to 13 and 10 anti-tuberculosis drugs, respectively. We benchmark GenTB’s predictive performance along with leading TB resistance prediction tools (Mykrobe and TB-Profiler) using a ground truth dataset of 20,408 isolates with laboratory-based drug susceptibility data. All four tools reliably predicted resistance to first-line tuberculosis drugs but had varying performance for second-line drugs. The mean sensitivities for GenTB-RF and GenTB-WDNN across the nine shared drugs were 77.6% (95% CI 76.6–78.5%) and 75.4% (95% CI 74.5–76.4%), respectively, marginally higher than the sensitivities of TB-Profiler at 74.4% (95% CI 73.4–75.3%) and Mykrobe at 71.9% (95% CI 70.9–72.9%). The higher sensitivities came at the expense of ≤ 1.5% lower specificity: Mykrobe 97.6% (95% CI 97.5–97.7%), TB-Profiler 96.9% (95% CI 96.7–97.0%), GenTB-WDNN 96.2% (95% CI 96.0–96.4%), and GenTB-RF 96.1% (95% CI 96.0–96.3%). Averaged across the four tools, genotypic resistance sensitivity was 11% and 9% lower for isoniazid and rifampicin, respectively, on isolates sequenced at low depth (< 10× across 95% of the genome), emphasizing the need to quality control input sequence data before prediction. We discuss differences between tools in reporting results to the user, including variants underlying the resistance calls and any novel or indeterminate variants.
Conclusions
GenTB is an easy-to-use online tool to rapidly and accurately predict resistance to anti-tuberculosis drugs. GenTB can be accessed online at https://gentb.hms.harvard.edu, and the source code is available at https://github.com/farhat-lab/gentb-site.
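The sensitivity and specificity figures in the Results above are proportions computed against laboratory drug-susceptibility results. The sketch below shows one way such values and a 95% confidence interval could be derived for a single drug. It is an illustration only: the abstract does not state how the intervals were computed, so the Wilson score interval is an assumption, and the function names and toy data are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(predicted_resistant, phenotypic_resistant):
    """Compare genotypic predictions with phenotypic results for one drug.

    Both arguments are equal-length sequences of booleans, one per isolate.
    Returns (sensitivity, sensitivity CI, specificity, specificity CI).
    """
    tp = fn = tn = fp = 0
    for pred, pheno in zip(predicted_resistant, phenotypic_resistant):
        if pheno:
            if pred:
                tp += 1  # resistant isolate correctly predicted resistant
            else:
                fn += 1  # resistant isolate missed
        else:
            if pred:
                fp += 1  # susceptible isolate wrongly predicted resistant
            else:
                tn += 1  # susceptible isolate correctly predicted susceptible
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (
        sensitivity,
        wilson_interval(tp, tp + fn),
        specificity,
        wilson_interval(tn, tn + fp),
    )


# Toy data (hypothetical): 8 isolates, 3 phenotypically resistant.
predicted = [True, True, False, True, False, False, False, False]
phenotype = [True, True, True, False, False, False, False, False]
sens, sens_ci, spec, spec_ci = sensitivity_specificity(predicted, phenotype)
print(f"sensitivity={sens:.3f} CI=({sens_ci[0]:.3f}, {sens_ci[1]:.3f})")
print(f"specificity={spec:.3f} CI=({spec_ci[0]:.3f}, {spec_ci[1]:.3f})")
```

With only a handful of isolates the intervals are very wide; the narrow intervals quoted in the abstract reflect the size of the 20,408-isolate benchmark.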
Vargas R, Freschi L, Spitaleri A, Tahseen S, Barilar I, Niemann S, Miotto P, Cirillo D, Koser C, and Farhat MR. 8/30/2021. “
The role of epistasis in amikacin, kanamycin, bedaquiline, and clofazimine resistance in Mycobacterium tuberculosis complex.” Antimicrobial Agents and Chemotherapy.
Abstract
Antibiotic resistance among bacterial pathogens poses a major global health threat. M. tuberculosis complex (MTBC) is estimated to have the highest resistance rates of any pathogen globally. Given the slow growth rate and the need for a biosafety level 3 laboratory, the only realistic avenue to scale up drug susceptibility testing (DST) for this pathogen is to rely on genotypic techniques. This raises the fundamental question of whether a mutation is a reliable surrogate for phenotypic resistance or whether the presence of a second mutation can completely counteract its effect, resulting in major diagnostic errors (i.e., systematic false resistance results). To date, such epistatic interactions have only been reported for streptomycin, which is now rarely used. By analyzing more than 31,000 MTBC genomes, we demonstrated that the eis C-14T promoter mutation, which is interrogated by several genotypic DST assays endorsed by the World Health Organization, cannot confer resistance to amikacin and kanamycin if it coincides with loss-of-function (LoF) mutations in the coding region of eis. To our knowledge, this represents the first definitive example of antibiotic reversion in MTBC. Moreover, we raise the possibility that mmpR (Rv0678) mutations are not valid markers of resistance to bedaquiline and clofazimine if these coincide with a LoF mutation in the efflux pump encoded by mmpS5 (Rv0677c) and mmpL5 (Rv0676c).
Kaafarani H, Gaitanidis A, Farhat MR, Christensen M, Breen K, Mendoza A, Fagenholz P, and Velmahos G. 7/28/2021. “
Association Between NEDD4L Variation and the Genetic Risk of Acute Appendicitis, A Multi-institutional Genome-Wide Association Study.” JAMA Surgery.
Importance The familial aspect of acute appendicitis (AA) has been proposed, but its hereditary basis remains undetermined.
Objective To identify genomic variants associated with AA.
Design, Setting, and Participants This genome-wide association study, conducted from June 21, 2019, to February 4, 2020, used a multi-institutional biobank to retrospectively identify patients with AA across 8 single-nucleotide variation (SNV) genotyping batches. The study also examined differential gene expression in appendiceal tissue samples between patients with AA and controls using the GSE9579 data set in the National Institutes of Health’s Gene Expression Omnibus repository. Statistical analysis was conducted from October 1, 2019, to February 4, 2020.
Main Outcomes and Measures Single-nucleotide variations with a minor allele frequency of 5% or higher were tested for association with AA using a linear mixed model. The significance threshold was set at P = 5 × 10⁻⁸.
Results A total of 29 706 patients (15 088 women [50.8%]; mean [SD] age at enrollment, 60.1 [17.0] years) were included, 1743 of whom had a history of AA. The genomic inflation factor for the cohort was 1.003. A previously unknown SNV at chromosome 18q was found to be associated with AA (rs9953918: odds ratio, 0.99; 95% CI, 0.98-1.00; P = 4.48 × 10⁻⁸). This SNV is located in an intron of the NEDD4L gene. The heritability of appendicitis was estimated at 30.1%. Gene expression data from appendiceal tissue donors identified NEDD4L to be among the most differentially expressed genes (14 of 22 216 genes; β [SE] = −2.71 [0.44]; log fold change = −1.69; adjusted P = .04).
Conclusions and Relevance This study identified SNVs within the NEDD4L gene as being associated with AA. Nedd4l is involved in the ubiquitination of intestinal ion channels and decreased Nedd4l activity may be implicated in the pathogenesis of AA. These findings can improve the understanding of the genetic predisposition to and pathogenesis of AA.
M Marin, R Vargas, M Harris, B Jeffrey, L.E Epperson, D Durbin, M Strong, M Salfinger, Z Iqbal, I Akhundova, S Vashakidze, V Crudu, A Rosenthal, and MR Farhat. 4/8/2021. “
Genomic sequence characteristics and the empiric accuracy of short-read sequencing.” bioRxiv.
Background: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. For the clonal pathogen Mycobacterium tuberculosis (Mtb), researchers frequently exclude 10.7% of the genome believed to be repetitive and prone to erroneous variant calls. To benchmark short-read variant calling, we used 36 diverse clinical Mtb isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied short-read variant calling accuracy and the influence of sequence uniqueness, reference bias, and GC content.
Results: Reference-based Illumina variant calling had a recall ≥89.0% and precision ≥98.5% across parameters evaluated. The best balance between precision and recall was achieved by tuning the mapping quality (MQ) threshold, i.e., the confidence of the read mapping (recall 85.8%, precision 99.1% at MQ ≥ 40). Masking repetitive sequence content is an alternative, conservative approach to variant calling that maintains high precision (recall 70.2%, precision 99.6% at MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS, including 52 of the 168 PE/PPE genes (34.5%). We present a refined list of low confidence regions and examine the largest sources of variant calling error.
Conclusions: Our improved approach to variant calling has broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems, and more generally for WGS applications in other organisms.
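The precision and recall trade-off at a mapping quality threshold, as described in the Results above, can be illustrated with a small sketch. It assumes each short-read variant call carries a single MQ value and that a truth set (for example, one derived from long-read assemblies) is available as a set of (position, alternate allele) pairs. The data structure, function name, and toy values are hypothetical and are not taken from the preprint's pipeline.

```python
from typing import NamedTuple


class VariantCall(NamedTuple):
    position: int         # 1-based reference coordinate
    alt: str              # alternate allele
    mapping_quality: int  # assumed per-variant MQ summary


def precision_recall(calls, truth, min_mq):
    """Precision and recall of calls with MQ >= min_mq against a truth set."""
    kept = {(c.position, c.alt) for c in calls if c.mapping_quality >= min_mq}
    tp = len(kept & truth)   # called and present in the truth set
    fp = len(kept - truth)   # called but absent from the truth set
    fn = len(truth - kept)   # in the truth set but not called (or filtered)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy example (hypothetical values): one false call sits at low MQ.
truth = {(100, "A"), (200, "T"), (300, "G"), (400, "C")}
calls = [
    VariantCall(100, "A", 60),
    VariantCall(200, "T", 45),
    VariantCall(300, "G", 20),  # true variant in a poorly mapped region
    VariantCall(500, "C", 10),  # false call in a poorly mapped region
]

for threshold in (0, 40):
    p, r = precision_recall(calls, truth, threshold)
    print(f"MQ >= {threshold}: precision={p:.2f} recall={r:.2f}")
```

In this toy case, raising the threshold removes the false call but also discards a true variant, which mirrors the direction of the trade-off the abstract reports.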