Publications

2013
Zhou X, Qiu W, Sathirapongsasuti FJ, Cho MH, Mancini JD, Lao T, Thibault DM, Litonjua AA, Bakke PS, Gulsvik A, et al. Gene expression analysis uncovers novel hedgehog interacting protein (HHIP) effects in human bronchial epithelial cells. Genomics. 2013;101 (5) :263-72.Abstract
Hedgehog interacting protein (HHIP) was implicated in chronic obstructive pulmonary disease (COPD) by genome-wide association studies (GWAS). However, it remains unclear how HHIP contributes to COPD pathogenesis. To identify genes regulated by HHIP, we performed gene expression microarray analysis in a human bronchial epithelial cell line (Beas-2B) stably infected with HHIP shRNAs. HHIP silencing led to differential expression of 296 genes; enrichment for variants nominally associated with COPD was found. Eighteen of the differentially expressed genes were validated by real-time PCR in Beas-2B cells. Seven of 11 validated genes tested in human COPD and control lung tissues demonstrated significant gene expression differences. Functional annotation indicated enrichment for extracellular matrix and cell growth genes. Network modeling demonstrated that the extracellular matrix and cell proliferation genes influenced by HHIP tended to be interconnected. Thus, we identified potential HHIP targets in human bronchial epithelial cells that may contribute to COPD pathogenesis.
2012
A framework for human microbiome research. Nature. 2012;486 (7402) :215-21.Abstract
A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies.
Faust K*, Sathirapongsasuti FJ*, Izard J, Segata N, Gevers D, Raes J, Huttenhower C. Microbial co-occurrence relationships in the human microbiome. PLoS Comput Biol. 2012;8 (7) :e1002606.Abstract
The healthy microbiota show remarkable variability within and among individuals. In addition to external exposures, ecological relationships (both oppositional and symbiotic) between microbial inhabitants are important contributors to this variation. It is thus of interest to assess what relationships might exist among microbes and determine their underlying reasons. The initial Human Microbiome Project (HMP) cohort, comprising 239 individuals and 18 different microbial habitats, provides an unprecedented resource to detect, catalog, and analyze such relationships. Here, we applied an ensemble method based on multiple similarity measures in combination with generalized boosted linear models (GBLMs) to taxonomic marker (16S rRNA gene) profiles of this cohort, resulting in a global network of 3,005 significant co-occurrence and co-exclusion relationships between 197 clades occurring throughout the human microbiome. This network revealed strong niche specialization, with most microbial associations occurring within body sites and a number of accompanying inter-body site relationships. Microbial communities within the oropharynx grouped into three distinct habitats, which themselves showed no direct influence on the composition of the gut microbiota. Conversely, niches such as the vagina demonstrated little to no decomposition into region-specific interactions. Diverse mechanisms underlay individual interactions, with some such as the co-exclusion of Porphyromonaceae family members and Streptococcus in the subgingival plaque supported by known biochemical dependencies. These differences varied among broad phylogenetic groups as well, with the Bacilli and Fusobacteria, for example, both enriched for exclusion of taxa from other clades. Comparing phylogenetic versus functional similarities among bacteria, we show that dominant commensal taxa (such as Prevotellaceae and Bacteroides in the gut) often compete, while potential pathogens (e.g. Treponema and Prevotella in the dental plaque) are more likely to co-occur in complementary niches. This approach thus serves to open new opportunities for future targeted mechanistic studies of the microbial ecology of the human microbiome.
Structure, function and diversity of the healthy human microbiome. Nature. 2012;486 (7402) :207-14.Abstract
Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat's signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81-99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome.
2011
Sathirapongsasuti JF, Lee H, Horst BAJ, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics. 2011;27 (19) :2648-54.Abstract
MOTIVATION: The ability to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data extends the utility of this powerful approach that has mainly been used for point or small insertion/deletion detection. RESULTS: We present ExomeCNV, a statistical method to detect CNV and LOH using depth-of-coverage and B-allele frequencies, from mapped short sequence reads, and we assess both the method's power and the effects of confounding variables. We apply our method to a cancer exome resequencing dataset. As expected, accuracy and resolution are dependent on depth-of-coverage and capture probe design. AVAILABILITY: CRAN package 'ExomeCNV'. CONTACT: fsathira@fas.harvard.edu; snelson@ucla.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Lin CM, Sathirapongsasuti FJ, Kerner B. Propensity score analysis in the Genetic Analysis Workshop 17 simulated data set on independent individuals. BMC Proc. 2011;5 Suppl 9 :S71.Abstract
Genetic Analysis Workshop 17 provided simulated phenotypes and exome sequence data for 697 independent individuals (209 case subjects and 488 control subjects). The disease liability in these data was influenced by multiple quantitative traits. We addressed the lack of statistical power in this small data set by limiting the genomic variants included in the study to those with potential disease-causing effect, thereby reducing the problem of multiple testing. After this adjustment, we could readily detect two common variants that were strongly associated with the quantitative trait Q1 (C13S523 and C13S522). However, we found no significant associations with the affected status or with any of the other quantitative traits, and the relationship between disease status and genomic variants remained obscure. To address the challenge of the multivariate phenotype, we used propensity scores to combine covariates with genetic risk factors into a single risk factor and created a new phenotype variable, the probability of being affected given the covariates. Using the propensity score as a quantitative trait in the case-control analysis, we again could identify the two common single-nucleotide polymorphisms (C13S523 and C13S522). In addition, this analysis captured the correlation between Q1 and the affected status and reduced the problem of multiple testing. Although the propensity score was useful for capturing and clarifying the genetic contributions of common variants to the disease phenotype and the mediating role of the quantitative trait Q1, the analysis did not increase power to detect rare variants.
Sathirapongsasuti FJ, Sathira N, Suzuki Y, Huttenhower C, Sugano S. Ultraconserved cDNA segments in the human transcriptome exhibit resistance to folding and implicate function in translation and alternative splicing. Nucleic Acids Res. 2011;39 (6) :1967-79.Abstract
Ultraconservation, defined as perfect human-to-rodent sequence identity at least 200-bp long, is a strong indicator of evolutionary and functional importance and has been explored extensively at the genome level. However, it has not been investigated at the transcript level, where such extreme conservation might highlight loci with important post-transcriptional regulatory roles. We present 96 ultraconserved cDNA segments (UCSs), stretches of human mature mRNAs that match identically with orthologous regions in the mouse and rat genomes. UCSs can span multiple exons, a feature we leverage here to elucidate the role of ultraconservation in post-transcriptional regulation. UCS sites are implicated in functions at essentially every post-transcriptional stage: pre-mRNA splicing and degradation through alternative splicing and nonsense-mediated decay (AS-NMD), mature mRNA silencing by miRNA, fast mRNA decay rate and translational repression by upstream AUGs. We also found UCSs to exhibit resistance to formation of RNA secondary structure. These multiple layers of regulation underscore the importance of the UCS-containing genes as key global RNA processing regulators, including members of the serine/arginine-rich protein and heterogeneous nuclear ribonucleoprotein (hnRNP) families of essential splicing regulators. The discovery of UCSs shed new light on the multifaceted, fine-tuned and tight post-transcriptional regulation of gene families as conserved through the majority of the mammalian lineage.