Danya J Martell, Robert Ietswaart, Brendan M Smalec, and L. Stirling Churchman. 6/23/2021. “Profiling metazoan transcription genome-wide with nucleotide resolution using NET-seq (native elongating transcript sequencing).”
Quantifying crucial steps in gene regulation during transcription elongation, such as promoter-proximal pausing, requires high-resolution methods to map the transcription machinery across the genome. Native Elongating Transcript sequencing (NET-seq) interrogates the 3' ends of nascent RNA through sequencing, providing a direct visualization of RNA Polymerase II (Pol II) positions genome-wide with strand specificity and single-nucleotide resolution. NET-seq applied to human cells has uncovered regions of Pol II pausing at the boundaries of retained exons and convergent antisense transcription near transcription start sites (Mayer et al. 2015). It has also been used to investigate regulators of productive elongation (Winter et al. 2017), and the directionality of promoter regions (Jin et al. 2017). Here, we describe the experimental protocol for metazoan cells that includes a spike-in control enabling normalization across samples. We also report on an improved bioinformatics pipeline for NET-seq. Together, the protocol yields a fast and non-perturbative method to map Pol II transcription genome-wide, revealing complex and global transcriptional events.
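As an illustration of the spike-in normalization mentioned in the abstract above, the following is a minimal sketch, not the pipeline described in the paper. It assumes that each sample's coverage is rescaled so that the spike-in read counts become equal across samples. The function names, sample names, and read counts are all hypothetical.

```python
def spike_in_scale_factors(spike_in_counts):
    """Return per-sample factors that equalize spike-in read counts.

    Assumption: the sample with the fewest spike-in reads is the reference,
    so every factor is <= 1.
    """
    reference = min(spike_in_counts.values())
    return {sample: reference / n for sample, n in spike_in_counts.items()}


def normalize_coverage(coverage, factor):
    """Scale a list of per-position read counts by a single factor."""
    return [count * factor for count in coverage]


# Hypothetical spike-in read counts and per-position coverage for two samples.
spike_in_counts = {"control": 20000, "treated": 40000}
coverage = {"control": [4, 10, 2], "treated": [8, 20, 4]}

factors = spike_in_scale_factors(spike_in_counts)
normalized = {
    sample: normalize_coverage(coverage[sample], factors[sample])
    for sample in coverage
}
print(factors)
print(normalized)
```

In this toy case the treated sample has twice the spike-in reads and twice the raw coverage, so the two profiles coincide after scaling. The raw difference is therefore attributed to sequencing depth rather than to a change in Pol II occupancy.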
Robert Ietswaart, Benjamin M Gyori, John A Bachman, Peter K Sorger, and L. Stirling Churchman. 2021. “GeneWalk identifies relevant gene functions for a biological context using network representation learning.” Genome Biol, 22, 1, Pp. 55.
A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk, which identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.
Robert Ietswaart, Seda Arat, Amanda X Chen, Saman Farahmand, Bumjun Kim, William DuMouchel, Duncan Armstrong, Alexander Fekete, Jeffrey J Sutherland, and Laszlo Urban. 2020. “Machine learning guided association of adverse drug reactions with in vitro target-based pharmacology.” EBioMedicine, 57, Pp. 102837.
BACKGROUND: Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. METHODS: Here, we analyse in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. FINDINGS: By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Amongst these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. INTERPRETATION: These associations provide a comprehensive resource to support drug development and human biology studies. FUNDING: This study was not supported by any formal funding bodies.
Robert Ietswaart, Stefanie Rosa, Zhe Wu, Caroline Dean, and Martin Howard. 2017. “Cell-Size-Dependent Transcription of FLC and Its Antisense Long Non-coding RNA COOLAIR Explain Cell-to-Cell Expression Variation.” Cell Syst, 4, 6, Pp. 622-635.e9.
Single-cell quantification of transcription kinetics and variability promotes a mechanistic understanding of gene regulation. Here, using single-molecule RNA fluorescence in situ hybridization and mathematical modeling, we dissect cellular RNA dynamics for Arabidopsis FLOWERING LOCUS C (FLC). FLC expression quantitatively determines flowering time and is regulated by antisense (COOLAIR) transcription. In cells without observable COOLAIR expression, we quantify FLC transcription initiation, elongation, intron processing, and lariat degradation, as well as mRNA release from the locus and degradation. In these heterogeneously sized cells, FLC mRNA number increases linearly with cell size, resulting in a large cell-to-cell variability in transcript level. This variation is accounted for by cell-size-dependent, Poissonian FLC mRNA production, but not by large transcriptional bursts. In COOLAIR-expressing cells, however, antisense transcription increases with cell size and contributes to FLC transcription decreasing with cell size. Our analysis therefore reveals an unexpected role for antisense transcription in modulating the scaling of transcription with cell size.
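The claim above that cell-size-dependent Poissonian production can account for large cell-to-cell variability can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the model from the paper. It assumes that mRNA counts are Poisson distributed with a mean proportional to cell size and that cell sizes are uniformly distributed. The rate constant, size range, and function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson sample with Knuth's algorithm (adequate for small means)."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fano_factor(counts):
    """Variance divided by mean; close to 1 for a pure Poisson process."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return variance / mean


rng = random.Random(0)
RATE_PER_SIZE = 2.0  # hypothetical mean mRNA number per unit of cell size

# Heterogeneously sized cells: the mean mRNA number scales linearly with size.
sizes = [rng.uniform(1.0, 10.0) for _ in range(5000)]
counts = [sample_poisson(RATE_PER_SIZE * size, rng) for size in sizes]

# Cells of one fixed size, for comparison.
fixed_size_counts = [sample_poisson(RATE_PER_SIZE * 5.5, rng) for _ in range(5000)]

print("Fano factor, mixed sizes:", fano_factor(counts))
print("Fano factor, fixed size:", fano_factor(fixed_size_counts))
print("mRNA per unit size:", sum(counts) / sum(sizes))
```

Pooling cells of different sizes inflates the variance well beyond the Poisson expectation, while cells of a single size stay near a Fano factor of 1. No transcriptional bursts are needed to produce the extra variability in this toy setting.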
Zhe Wu, Robert Ietswaart, Fuquan Liu, Hongchun Yang, Martin Howard, and Caroline Dean. 2016. “Quantitative regulation of FLC via coordinated transcriptional initiation and elongation.” Proc Natl Acad Sci U S A, 113, 1, Pp. 218-23.
The basis of quantitative regulation of gene expression is still poorly understood. In Arabidopsis thaliana, quantitative variation in expression of FLOWERING LOCUS C (FLC) influences the timing of flowering. In ambient temperatures, FLC expression is quantitatively modulated by a chromatin silencing mechanism involving alternative polyadenylation of antisense transcripts. Investigation of this mechanism unexpectedly showed that RNA polymerase II (Pol II) occupancy changes at FLC did not reflect RNA fold changes. Mathematical modeling of these transcriptional dynamics predicted a tight coordination of transcriptional initiation and elongation. This prediction was validated by detailed measurements of total and chromatin-bound FLC intronic RNA, a methodology appropriate for analyzing elongation rate changes in a range of organisms. Transcription initiation was found to vary ∼25-fold with elongation rate varying ∼8- to 12-fold. Premature sense transcript termination contributed very little to expression differences. This quantitative variation in transcription was coincident with variation in H3K36me3 and H3K4me2 over the FLC gene body. We propose different chromatin states coordinately influence transcriptional initiation and elongation rates and that this coordination is likely to be a general feature of quantitative gene regulation in a chromatin context.
Robert Ietswaart, Florian Szardenings, Kenn Gerdes, and Martin Howard. 2014. “Competing ParA structures space bacterial plasmids equally over the nucleoid.” PLoS Comput Biol, 10, 12, Pp. e1004009.
Low copy number plasmids in bacteria require segregation for stable inheritance through cell division. This is often achieved by a parABC locus, comprising an ATPase ParA, DNA-binding protein ParB and a parC region, encoding ParB-binding sites. These minimal components space plasmids equally over the nucleoid, yet the underlying mechanism is not understood. Here we investigate a model where ParA-ATP can dynamically associate to the nucleoid and is hydrolyzed by plasmid-associated ParB, thereby creating nucleoid-bound, self-organizing ParA concentration gradients. We show mathematically that differences between competing ParA concentrations on either side of a plasmid can specify regular plasmid positioning. Such positioning can be achieved regardless of the exact mechanism of plasmid movement, including plasmid diffusion with ParA-mediated immobilization or directed plasmid motion induced by ParB/parC-stimulated ParA structure disassembly. However, we find experimentally that parABC from Escherichia coli plasmid pB171 increases plasmid mobility, inconsistent with diffusion/immobilization. Instead our observations favor directed plasmid motion. Our model predicts less oscillatory ParA dynamics than previously believed, a prediction we verify experimentally. We also show that ParA localization and plasmid positioning depend on the underlying nucleoid morphology, indicating that the chromosomal architecture constrains ParA structure formation. Our directed motion model unifies previously contradictory models for plasmid segregation and provides a robust mechanistic basis for self-organized plasmid spacing that may be widely applicable.
Robert Ietswaart, Zhe Wu, and Caroline Dean. 2012. “Flowering time control: another window to the connection between antisense RNA and chromatin.” Trends Genet, 28, 9, Pp. 445-53.
A high proportion of all eukaryotic genes express antisense RNA (asRNA), which accumulates to varying degrees at different loci. Whether there is a general function for asRNA is unknown, but its widespread occurrence and frequent regulation by stress suggest an important role. The best-characterized plant gene exhibiting a complex antisense transcript pattern is the Arabidopsis floral regulator FLOWERING LOCUS C (FLC). Changes occur in the accumulation, splicing, and polyadenylation of this antisense transcript, termed COOLAIR, in different environments and genotypes. These changes are associated with altered chromatin regulation and differential FLC expression, provoking mechanistic comparisons with many well-studied loci in yeast and mammals. Detailed analysis of these specific examples may shed light on the complex interplay between asRNA and chromatin modifications in different genomes.