Tan, J. ; Rodriguez-Hernaez, J. ; Sakellaropoulos, T. ; Boccalatte, F. ; Aifantis, I. ; Skok, J. ; Fenyö, D. ; Xia#, B. ; Tsirigos#, A. Cell type-specific prediction of 3D chromatin architecture. BioRxiv Submitted.

The mammalian genome is spatially organized in the nucleus to enable cell type-specific gene expression. Investigating how chromatin architecture determines this specificity remains a big challenge. Methods for measuring the 3D chromatin architecture, such as Hi-C, are costly and bear strong technical limitations, restricting their widespread application, particularly when concerning genetic perturbations. In this study, we present C.Origami, a deep neural network model for predicting de novo cell type-specific chromatin architecture. By incorporating DNA sequence, CTCF binding, and chromatin accessibility profiles, C.Origami achieves accurate cell type-specific prediction. C.Origami enables in silico experiments that examine the impact of genetic perturbations on chromatin interactions, and moreover, leads to the identification of a compendium of cell type-specific regulators of 3D chromatin architecture. We expect Origami – the underlying model architecture of C.Origami – to be generalizable for future genomics studies in discovering novel regulatory mechanisms of the genome.

Xia#, B. ; Zhang, W. ; Wudzinska, A. ; Huang, E. ; Brosh, R. ; Pour, M. ; Miller, A. ; Dasen, J. S. ; Maurano, M. T. ; Kim, S. Y. ; et al. The genetic basis of tail-loss evolution in humans and apes. BioRxiv Submitted.

The loss of the tail is one of the main anatomical evolutionary changes to have occurred along the lineage leading to humans and to the “anthropomorphous apes” [1,2]. This morphological reprogramming in the ancestral hominoids has been long considered to have accommodated a characteristic style of locomotion and contributed to the evolution of bipedalism in humans [3–5]. Yet, the precise genetic mechanism that facilitated tail-loss evolution in hominoids remains unknown. Primate genome sequencing projects have made possible the identification of causal links between genotypic and phenotypic changes [6–8], and enable the search for hominoid-specific genetic elements controlling tail development [9]. Here, we present evidence that tail-loss evolution was mediated by the insertion of an individual Alu element into the genome of the hominoid ancestor. We demonstrate that this Alu element – inserted into an intron of the TBXT gene (also called T or Brachyury [10–12]) – pairs with a neighboring ancestral Alu element encoded in the reverse genomic orientation and leads to a hominoid-specific alternative splicing event. To study the effect of this splicing event, we generated a mouse model that mimics the expression of human TBXT products by expressing both full-length and exon-skipped isoforms of the mouse TBXT ortholog. We found that mice with this genotype exhibit the complete absence of a tail or a shortened tail, supporting the notion that the exon-skipped transcript is sufficient to induce a tail-loss phenotype, albeit with incomplete penetrance. We further noted that mice homozygous for the exon-skipped isoforms exhibited embryonic spinal cord malformations, resembling a neural tube defect condition, which affects ∼1/1000 human neonates [13]. We propose that selection for the loss of the tail along the hominoid lineage was associated with an adaptive cost of potential neural tube defects and that this ancient evolutionary trade-off may thus continue to affect human health today.

Xia, B. ; Yanai, I. Gene expression levels modulate germline mutation rates through the compound effects of transcription-coupled repair and damage. Hum Genet 2022, 141, 1211-1222.
Of all mammalian organs, the testis has long been observed to have the most diverse gene expression profile. To account for this widespread gene expression, we have proposed a mechanism termed 'transcriptional scanning', which reduces germline mutation rates through transcription-coupled repair (TCR). Our hypothesis contrasts with an earlier observation that mutation rates are overall positively correlated with gene expression levels in yeast, implying that transcription is mutagenic due to effects dominated by transcription-coupled damage (TCD). Here we report evidence that the compound effects of both TCR and TCD during spermatogenesis modulate human germline mutation rates, with TCR dominating in most genes, thus supporting the transcriptional scanning hypothesis. Our analyses address potentially confounding factors, distinguish the differential mutagenic effects acting on the highly expressed genes and the low-to-moderately expressed genes, and resolve concerns relating to the validation of the results using a de novo mutation dataset. We also discuss the theoretical possibility of the transcriptional scanning hypothesis from an evolutionary perspective. Together, these analyses support a model by which the coupling of transcription-coupled repair and damage establishes the pattern of germline mutation rates and provide an evolutionary explanation for widespread gene expression during spermatogenesis.
Yi, C. ; Zhu, C. ; Xia, B. Method for marking 5-formyl cytosine and use thereof in single base resolution sequencing, 2020.
Xia, B. ; Yan, Y. ; Baron, M. ; Wagner, F. ; Barkley, D. ; Chiodin, M. ; Kim, S. Y. ; Keefe, D. L. ; Alukal, J. P. ; Boeke, J. D. ; et al. Widespread Transcriptional Scanning in the Testis Modulates Gene Evolution Rates. Cell 2020, 180, 248-262.

The testis expresses the largest number of genes of any mammalian organ, a finding that has long puzzled molecular biologists. Our single-cell transcriptomic data of human and mouse spermatogenesis provide evidence that this widespread transcription maintains DNA sequence integrity in the male germline by correcting DNA damage through a mechanism we term transcriptional scanning. We find that genes expressed during spermatogenesis display lower mutation rates on the transcribed strand and have low diversity in the population. Moreover, this effect is fine-tuned by the level of gene expression during spermatogenesis. The unexpressed genes, which in our model do not benefit from transcriptional scanning, diverge faster over evolutionary timescales and are enriched for sensory and immune-defense functions. Collectively, we propose that transcriptional scanning shapes germline mutation signatures and modulates mutation rates in a gene-specific manner, maintaining DNA sequence integrity for the bulk of genes but allowing for faster evolution in a specific subset.

Xia, B. ; Yanai, I. A periodic table of cell types. Development 2019, 146.
Single cell biology is currently revolutionizing developmental and evolutionary biology, revealing new cell types and states in an impressive range of biological systems. With the accumulation of data, however, the field is grappling with a central unanswered question: what exactly is a cell type? This question is further complicated by the inherently dynamic nature of developmental processes. In this Hypothesis article, we propose that a 'periodic table of cell types' can be used as a framework for distinguishing cell types from cell states, in which the periods and groups correspond to developmental trajectories and stages along differentiation, respectively. The different states of the same cell type are further analogous to 'isotopes'. We also highlight how the concept of a periodic table of cell types could be useful for predicting new cell types and states, and for recognizing relationships between cell types throughout development and evolution.
Zeng, H. ; Mondal, M. ; Song, R. ; Zhang, J. ; Xia, B. ; Liu, M. ; Zhu, C. ; He, B. ; Gao, Y. Q. ; Yi, C. Unnatural Cytosine Bases Recognized as Thymines by DNA Polymerases by the Formation of the Watson-Crick Geometry. Angew Chem Int Ed Engl 2019, 58, 130-133.
The emergence of unnatural DNA bases provides opportunities to demystify the mechanisms by which DNA polymerases faithfully decode chemical information on the template. It was previously shown that two unnatural cytosine bases (termed "M-fC" and "I-fC"), which are chemical labeling adducts of the epigenetic base 5-formylcytosine, can induce C-to-T transition during DNA amplification. However, how DNA polymerases recognize such unnatural cytosine bases remains enigmatic. Herein, crystal structures of unnatural cytosine bases pairing to dA/dG in the KlenTaq polymerase-host-guest complex system and pairing to dATP in the KlenTaq polymerase active site were determined. Both M-fC and I-fC base pair with dA/dATP, but not with dG, in a Watson-Crick geometry. This study reveals that the formation of the Watson-Crick geometry, which may be enabled by the A-rule, is important for the recognition of unnatural cytosines.
Zeng, H. ; He, B. ; Xia, B. ; Bai, D. ; Lu, X. ; Cai, J. ; Chen, L. ; Zhou, A. ; Zhu, C. ; Meng, H. ; et al. Bisulfite-Free, Nanoscale Analysis of 5-Hydroxymethylcytosine at Single Base Resolution. J Am Chem Soc 2018, 140, 13190-13194.
High-resolution detection of genome-wide 5-hydroxymethylcytosine (5hmC) sites of small-scale samples remains challenging. Here, we present hmC-CATCH, a bisulfite-free, base-resolution method for the genome-wide detection of 5hmC. hmC-CATCH is based on selective 5hmC oxidation, chemical labeling and subsequent C-to-T transition during PCR. Requiring only nanoscale input genomic DNA samples, hmC-CATCH enabled us to detect genome-wide hydroxymethylome of human embryonic stem cells in a cost-effective manner. Further application of hmC-CATCH to cell-free DNA (cfDNA) of healthy donors and cancer patients revealed base-resolution hydroxymethylome in the human cfDNA for the first time. We anticipate that our chemical biology approach will find broad applications in hydroxymethylome analysis of limited biological and clinical samples.
Cimmino, L. ; Dolgalev, I. ; Wang, Y. ; Yoshimi, A. ; Martin, G. H. ; Wang, J. ; Ng, V. ; Xia, B. ; Witkowski, M. T. ; Mitchell-Flack, M. ; et al. Restoration of TET2 Function Blocks Aberrant Self-Renewal and Leukemia Progression. Cell 2017, 170, 1079-1095.e20.
Loss-of-function mutations in TET2 occur frequently in patients with clonal hematopoiesis, myelodysplastic syndrome (MDS), and acute myeloid leukemia (AML) and are associated with a DNA hypermethylation phenotype. To determine the role of TET2 deficiency in leukemia stem cell maintenance, we generated a reversible transgenic RNAi mouse to model restoration of endogenous Tet2 expression. Tet2 restoration reverses aberrant hematopoietic stem and progenitor cell (HSPC) self-renewal in vitro and in vivo. Treatment with vitamin C, a co-factor of Fe2+ and α-KG-dependent dioxygenases, mimics TET2 restoration by enhancing 5-hydroxymethylcytosine formation in Tet2-deficient mouse HSPCs and suppresses human leukemic colony formation and leukemia progression of primary human leukemia PDXs. Vitamin C also drives DNA hypomethylation and expression of a TET2-dependent gene signature in human leukemia cell lines. Furthermore, TET-mediated DNA oxidation induced by vitamin C treatment in leukemia cells enhances their sensitivity to PARP inhibition and could provide a safe and effective combination strategy to selectively target TET deficiency in cancer.
Zhu*, C. ; Gao*, Y. ; Guo*, H. ; Xia*, B. ; Song, J. ; Wu, X. ; Zeng, H. ; Kee, K. ; Tang, F. ; Yi, C. Single-Cell 5-Formylcytosine Landscapes of Mammalian Early Embryos and ESCs at Single-Base Resolution. Cell Stem Cell 2017, 20, 720-731.e5.
Active DNA demethylation in mammals involves ten-eleven translocation (TET) family protein-mediated oxidation of 5-methylcytosine (5mC). However, base-resolution landscapes of 5-formylcytosine (5fC) (an oxidized derivative of 5mC) at the single-cell level remain unexplored. Here, we present "CLEVER-seq" (chemical-labeling-enabled C-to-T conversion sequencing), which is a single-cell, single-base resolution 5fC-sequencing technology, based on biocompatible, selective chemical labeling of 5fC and subsequent C-to-T conversion during amplification and sequencing. CLEVER-seq shows intrinsic 5fC heterogeneity in mouse early embryos, Epi stem cells (EpiSCs), and embryonic stem cells (ESCs). CLEVER-seq of mouse early embryos also reveals the highly patterned genomic distribution and parental-specific dynamics of 5fC during mouse early pre-implantation development. Integrated analysis demonstrates that promoter 5fC production precedes the expression upregulation of a clear set of developmentally and metabolically critical genes. Collectively, our work reveals the dynamics of active DNA demethylation during mouse pre-implantation development and provides an important resource for further functional studies of epigenetic reprogramming in single cells.
Yi, C. ; Xia, B. ; Zhou, A. 5-formylcytosine specific chemical labeling method and related applications, 2016.
Peng, J. ; Xia, B. ; Yi, C. Single-base resolution analysis of DNA epigenome via high-throughput sequencing. Sci China Life Sci 2016, 59, 219-26.
Epigenetic changes caused by DNA methylation and histone modifications play important roles in the regulation of various cellular processes and development. Recent discoveries of 5-methylcytosine (5mC) oxidation derivatives, including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxycytosine (5caC), in the mammalian genome further expand our understanding of epigenetic regulation. Analysis of DNA modification patterns relies increasingly on sequencing-based profiling methods. A number of different approaches have been established to map DNA epigenomes with single-base resolution, as represented by the bisulfite-based methods, such as classical bisulfite sequencing (BS-seq), TAB-seq (TET-assisted bisulfite sequencing), oxBS-seq (oxidative bisulfite sequencing) and others. These methods have been used to generate base-resolution maps of 5mC and its oxidation derivatives in genomic samples. The focus of this review will be to discuss the chemical methodologies that have been developed to detect the cytosine derivatives in genomic DNA.
Xia*, B. ; Han*, D. ; Lu*, X. ; Sun, Z. ; Zhou, A. ; Yin, Q. ; Zeng, H. ; Liu, M. ; Jiang, X. ; Xie, W. ; et al. Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale. Nature Methods 2015, 12, 1047-50.
Active DNA demethylation in mammals involves oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). However, genome-wide detection of 5fC at single-base resolution remains challenging. Here we present fC-CET, a bisulfite-free method for whole-genome analysis of 5fC based on selective chemical labeling of 5fC and subsequent C-to-T transition during PCR. Base-resolution 5fC maps showed limited overlap with 5hmC, with 5fC-marked regions more active than 5hmC-marked ones.
Lu, L. ; Zhu, C. ; Xia, B. ; Yi, C. Oxidative demethylation of DNA and RNA mediated by non-heme iron-dependent dioxygenases. Chem Asian J 2014, 9, 2018-29.
DNA/RNA methylation can be generated by methyltransferases and thus plays a critical role in regulating cellular processes; alternatively, nucleic acid methylation can be produced by methylation agents and is cytotoxic/mutagenic if left unrepaired. Oxidative demethylation mediated by non-heme iron-dependent dioxygenases is an efficient way to reverse either the cellular roles of regulatory methylation or the cytotoxic/mutagenic effects of methylation damage. In this Focus Review we summarize recent advances in the study of nucleic acid dioxygenases exemplified by the TET and AlkB family proteins, with an emphasis on chemical insights from the recent literature. Comparison of the chemical mechanisms of these dioxygenases revealed that differences in the mechanism also contribute significantly to their distinct biological functions.