Dan Knights, Mark S Silverberg, Rinse K Weersma, Dirk Gevers, Gerard Dijkstra, Hailiang Huang, Andrea D Tyler, Suzanne van Sommeren, Floris Imhann, Joanne M Stempak, Hu Huang, Pajau Vangay, Gabriel A Al-Ghalith, Caitlin Russell, Jenny Sauk, Jo Knight, Mark J Daly, Curtis Huttenhower, and Ramnik J Xavier. 2014. “Complex host genetics influence the microbiome in inflammatory bowel disease.” Genome Med, 6, 12, Pp. 107.
BACKGROUND: Human genetics and host-associated microbial communities have been associated independently with a wide range of chronic diseases. One of the strongest associations in each case is inflammatory bowel disease (IBD), but disease risk cannot be explained fully by either factor individually. Recent findings point to interactions between host genetics and microbial exposures as important contributors to disease risk in IBD. These include evidence of the partial heritability of the gut microbiota and the conferral of gut mucosal inflammation by microbiome transplant even when the dysbiosis was initially genetically derived. Although there have been several tests for association of individual genetic loci with bacterial taxa, there has been no direct comparison of complex genome-microbiome associations in large cohorts of patients with an immunity-related disease. METHODS: We obtained 16S ribosomal RNA (rRNA) gene sequences from intestinal biopsies as well as host genotype via Immunochip in three independent cohorts totaling 474 individuals. We tested for correlation between relative abundance of bacterial taxa and number of minor alleles at known IBD risk loci, including fine mapping of multiple risk alleles in the Nucleotide-binding oligomerization domain-containing protein 2 (NOD2) gene exon. We identified host polymorphisms whose associations with bacterial taxa were conserved across two or more cohorts, and we tested related genes for enrichment of host functional pathways. RESULTS: We identified and confirmed in two cohorts a significant association between NOD2 risk allele count and increased relative abundance of Enterobacteriaceae, with directionality of the effect conserved in the third cohort. 
Forty-eight additional IBD-related SNPs have directionality of their associations with bacterial taxa significantly conserved across two or three cohorts, implicating genes enriched for regulation of innate immune response, the JAK-STAT cascade, and other immunity-related pathways. CONCLUSIONS: These results suggest complex interactions between genetically altered host functional pathways and the structure of the microbiome. Our findings demonstrate the ability to uncover novel associations from paired genome-microbiome data, and they suggest a complex link between host genetics and microbial dysbiosis in subjects with IBD across independent cohorts.
Ashwin N Ananthakrishnan, Hailiang Huang, Deanna D Nguyen, Jenny Sauk, Vijay Yajnik, and Ramnik J Xavier. 2014. “Differential effect of genetic burden on disease phenotypes in Crohn's disease and ulcerative colitis: analysis of a North American cohort.” Am J Gastroenterol, 109, 3, Pp. 395-400.
OBJECTIVES: Crohn's disease (CD) and ulcerative colitis (UC) are chronic immunologically mediated diseases with a progressive relapsing remitting course. There is considerable heterogeneity in disease course and accurate prediction of natural history has been challenging. The phenotypic implication of increasing genetic predisposition to CD or UC is unknown. METHODS: The data source for our study was a prospective cohort of CD and UC patients recruited from a tertiary referral center. All patients underwent genotyping on the Illumina Immunochip. A genetic risk score (GRS) incorporating strength of association (log odds ratio) and allele dose for each of the 163 inflammatory bowel disease (IBD) risk loci was calculated and phenotypic associations examined across GRS quartiles. RESULTS: Our study cohort included 1,105 patients (697 CD, 408 UC). Increasing genetic burden was associated with earlier age of diagnosis of CD (Ptrend=0.008). Patients in the highest GRS quartile were likely to develop disease 5 years earlier than those in the lowest quartile. Increasing genetic burden was also associated with ileal involvement in CD (Ptrend <0.0001). The effect of genetic burden was independent of the NOD2 locus and was stronger among those with no NOD2 variants, and in never smokers. UC patients with an involved first-degree relative had a higher genetic burden, but GRS was not associated with disease phenotype in UC. CONCLUSIONS: Increasing genetic burden is associated with early age of diagnosis in CD, but not UC. The expanded panel of IBD risk loci explains only a fraction of variance of disease phenotype, suggesting limited clinical utility of genetics in predicting natural history.
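The genetic risk score described in this abstract combines a log odds ratio and an allele dose for each risk locus. A minimal sketch of that kind of score follows, assuming the conventional weighted-sum form (log odds ratio times dose, summed over loci). The abstract does not spell out the exact formula, so the function names, the quartile helper, and the toy numbers are all illustrative assumptions rather than the authors' implementation.

```python
import math
import statistics


def genetic_risk_score(odds_ratios, allele_doses):
    """Sum of allele doses (0, 1, or 2 risk alleles), each weighted by log(OR).

    Assumed form: GRS = sum_i log(OR_i) * dose_i. Hypothetical helper.
    """
    if len(odds_ratios) != len(allele_doses):
        raise ValueError("need one allele dose per locus")
    return sum(math.log(o) * d for o, d in zip(odds_ratios, allele_doses))


def grs_quartile(score, cohort_scores):
    """Return 1..4, the quartile of `score` within the cohort's scores."""
    cuts = statistics.quantiles(cohort_scores, n=4)  # three cut points
    return 1 + sum(score > c for c in cuts)


# Toy example with two loci (made-up odds ratios and doses).
example_score = genetic_risk_score([2.0, 1.5], [1, 2])
```

Grouping patients by `grs_quartile` is one way to examine phenotypes across score quartiles, as the abstract describes; the quartile boundaries here come from Python's default `statistics.quantiles` method, which is a choice made for this sketch.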
Dan E Arking, Sara L Pulit, Lia Crotti, Pim van der Harst, Patricia B Munroe, Tamara T Koopmann, Nona Sotoodehnia, Elizabeth J Rossin, Michael Morley, Xinchen Wang, Andrew D Johnson, Alicia Lundby, Daníel F Gudbjartsson, Peter A Noseworthy, Mark Eijgelsheim, Yuki Bradford, Kirill V Tarasov, Marcus Dörr, Martina Müller-Nurasyid, Annukka M Lahtinen, Ilja M Nolte, Albert Vernon Smith, Joshua C Bis, Aaron Isaacs, Stephen J Newhouse, Daniel S Evans, Wendy S Post, Daryl Waggott, Leo-Pekka Lyytikäinen, Andrew A Hicks, Lewin Eisele, David Ellinghaus, Caroline Hayward, Pau Navarro, Sheila Ulivi, Toshiko Tanaka, David J Tester, Stéphanie Chatel, Stefan Gustafsson, Meena Kumari, Richard W Morris, Åsa T Naluai, Sandosh Padmanabhan, Alexander Kluttig, Bernhard Strohmer, Andrie G Panayiotou, Maria Torres, Michael Knoflach, Jaroslav A Hubacek, Kamil Slowikowski, Soumya Raychaudhuri, Runjun D Kumar, Tamara B Harris, Lenore J Launer, Alan R Shuldiner, Alvaro Alonso, Joel S Bader, Georg Ehret, Hailiang Huang, Linda WH Kao, James B Strait, Peter W Macfarlane, Morris Brown, Mark J Caulfield, Nilesh J Samani, Florian Kronenberg, Johann Willeit, Gustav J Smith, Karin H Greiser, Henriette Meyer Zu Schwabedissen, Karl Werdan, Massimo Carella, Leopoldo Zelante, Susan R Heckbert, Bruce M Psaty, Jerome I Rotter, Ivana Kolcic, Ozren Polašek, Alan F Wright, Maura Griffin, Mark J Daly, David O Arnar, Hilma Hólm, Unnur Thorsteinsdottir, Joshua C Denny, Dan M Roden, Rebecca L Zuvich, Valur Emilsson, Andrew S Plump, Martin G Larson, Christopher J O'Donnell, Xiaoyan Yin, Marco Bobbo, Adamo P D'Adamo, Annamaria Iorio, Gianfranco Sinagra, Angel Carracedo, Steven R Cummings, Michael A Nalls, Antti Jula, Kimmo K Kontula, Annukka Marjamaa, Lasse Oikarinen, Markus Perola, Kimmo Porthan, Raimund Erbel, Per Hoffmann, Karl-Heinz Jöckel, Hagen Kälsch, Markus M Nöthen, Marcel den Hoed, Ruth JF Loos, Dag S Thelle, Christian Gieger, Thomas Meitinger, Siegfried Perz, Annette Peters, Hanna Prucha, Moritz F 
Sinner, Melanie Waldenberger, Rudolf A de Boer, Lude Franke, Pieter A van der Vleuten, Britt Maria Beckmann, Eimo Martens, Abdennasser Bardai, Nynke Hofman, Arthur AM Wilde, Elijah R Behr, Chrysoula Dalageorgou, John R Giudicessi, Argelia Medeiros-Domingo, Julien Barc, Florence Kyndt, Vincent Probst, Alice Ghidoni, Roberto Insolia, Robert M Hamilton, Stephen W Scherer, Jeffrey Brandimarto, Kenneth Margulies, Christine E Moravec, Fabiola Del Greco M, Christian Fuchsberger, Jeffrey R O'Connell, Wai K Lee, Graham CM Watt, Harry Campbell, Sarah H Wild, Nour E El Mokhtari, Norbert Frey, Folkert W Asselbergs, Irene Mateo Leach, Gerjan Navis, Maarten P van den Berg, Dirk J van Veldhuisen, Manolis Kellis, Bouwe P Krijthe, Oscar H Franco, Albert Hofman, Jan A Kors, André G Uitterlinden, Jacqueline CM Witteman, Lyudmyla Kedenko, Claudia Lamina, Ben A Oostra, Gonçalo R Abecasis, Edward G Lakatta, Antonella Mulas, Marco Orrú, David Schlessinger, Manuela Uda, Marcello RP Markus, Uwe Völker, Harold Snieder, Timothy D Spector, Johan Ärnlöv, Lars Lind, Johan Sundström, Ann-Christine Syvänen, Mika Kivimaki, Mika Kähönen, Nina Mononen, Olli T Raitakari, Jorma S Viikari, Vera Adamkova, Stefan Kiechl, Maria Brion, Andrew N Nicolaides, Bernhard Paulweber, Johannes Haerting, Anna F Dominiczak, Fredrik Nyberg, Peter H Whincup, Aroon D Hingorani, Jean-Jacques Schott, Connie R Bezzina, Erik Ingelsson, Luigi Ferrucci, Paolo Gasparini, James F Wilson, Igor Rudan, Andre Franke, Thomas W Mühleisen, Peter P Pramstaller, Terho J Lehtimäki, Andrew D Paterson, Afshin Parsa, Yongmei Liu, Cornelia M van Duijn, David S Siscovick, Vilmundur Gudnason, Yalda Jamshidi, Veikko Salomaa, Stephan B Felix, Serena Sanna, Marylyn D Ritchie, Bruno H Stricker, Kari Stefansson, Laurie A Boyer, Thomas P Cappola, Jesper V Olsen, Kasper Lage, Peter J Schwartz, Stefan Kääb, Aravinda Chakravarti, Michael J Ackerman, Arne Pfeufer, Paul IW de Bakker, and Christopher Newton-Cheh. 2014. 
“Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization.” Nat Genet, 46, 8, Pp. 826-36.
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal mendelian long-QT syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant loci associated with QT interval that collectively explain ∼8-10% of QT-interval variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 new QT interval-associated loci in 298 unrelated probands with LQTS identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies new candidate genes for ventricular arrhythmias, LQTS and SCD.
Yen-Ling Chiu, Liang Shan, Hailiang Huang, Carl Haupt, Catherine Bessell, David H Canaday, Hao Zhang, Ya-Chi Ho, Jonathan D Powell, Mathias Oelke, Joseph B Margolick, Joel N Blankson, Diane E Griffin, and Jonathan P Schneck. 2014. “Sprouty-2 regulates HIV-specific T cell polyfunctionality.” J Clin Invest, 124, 1, Pp. 198-208.
The ability of individual T cells to perform multiple effector functions is crucial for protective immunity against viruses and cancer. This polyfunctionality is frequently lost during chronic infections; however, the molecular mechanisms driving T cell polyfunctionality are poorly understood. We found that human T cells stimulated by a high concentration of antigen lacked polyfunctionality and expressed a transcription profile similar to that of exhausted T cells. One specific pathway implicated by the transcription profile in control of T cell polyfunctionality was the MAPK/ERK pathway. This pathway was altered in response to different antigen concentrations, and polyfunctionality correlated with upregulation of phosphorylated ERK. T cells that were stimulated with a high concentration of antigen upregulated sprouty-2 (SPRY2), a negative regulator of the MAPK/ERK pathway. The clinical relevance of SPRY2 was confirmed by examining SPRY2 expression in HIV-specific T cells, where high levels of SPRY2 were seen in HIV-specific T cells and inhibition of SPRY2 expression enhanced the HIV-specific polyfunctional response independently of the PD-1 pathway. Our findings indicate that increased SPRY2 expression during chronic viral infection reduces T cell polyfunctionality and identify SPRY2 as a potential target for immunotherapy.
Hailiang Huang, Sandeep Tata, and Robert J Prill. 2013. “BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters.” Bioinformatics, 29, 1, Pp. 135-6.
SUMMARY: Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype-phenotype datasets. AVAILABILITY AND IMPLEMENTATION:
Pritam Chanda, Hailiang Huang, Dan E Arking, and Joel S Bader. 2013. “Fast association tests for genes with FAST.” PLoS One, 8, 7, Pp. e68585.
Gene-based tests of association can increase the power of a genome-wide association study by aggregating multiple independent effects across a gene or locus into a single stronger signal. Recent gene-based tests have distinct approaches to selecting which variants to aggregate within a locus, modeling the effects of linkage disequilibrium, representing fractional allele counts from imputation, and managing permutation tests for p-values. Implementing these tests in a single, efficient framework has great practical value. Fast ASsociation Tests (Fast) addresses this need by implementing leading gene-based association tests together with conventional SNP-based univariate tests and providing a consolidated, easily interpreted report. Fast scales readily to genome-wide SNP data with millions of SNPs and tens of thousands of individuals, provides implementations that are orders of magnitude faster than original literature reports, and provides a unified framework for performing several gene-based association tests concurrently and efficiently on the same data. AVAILABILITY:, with documentation at
Richard J Early, Hong Yu, Xueyan Peter Mu, Haiyan Xu, Lei Guo, Qingxue Kong, Junma Zhou, Bruce He, Xiuying Yang, Hailiang Huang, Edward Hu, and Ying Jiang. 2013. “Repeat oral dose toxicity studies of melamine in rats and monkeys.” Arch Toxicol, 87, 3, Pp. 517-27.
Melamine is an important and widely used organic industrial chemical. Recently, clinical findings of renal failure and kidney stones in infants have been associated with ingestion of melamine-contaminated infant formula. To understand the toxicity and clinical outcome of melamine exposure, repeated oral dose studies in rats and monkeys were performed to characterize the subchronic toxicity of melamine. Assessment of toxicity was based on mortality, clinical signs, body weights, ophthalmic findings, clinical pathology, gross pathology, organ weights, and microscopic observations. The first rat study was intended to be a 14-day oral study followed by an 8-day recovery period. The dose levels were 140, 700, and 1,400 mg/kg/day (lowered to 1,000 mg/kg/day subsequently due to mortality). Oral administration of melamine at 700 mg/kg/day for 14 consecutive days in rats produced compound-related clinical signs (red urine), decreased body weights, and changes in clinical pathology (increased serum urea nitrogen and creatinine) and anatomical pathology (renal tubular cell debris, crystal deposition, and hyperactive regeneration of renal tubular epithelium). The kidney was identified as the target organ. Oral administration at 1,400 mg/kg/day (subsequently lowered to 1,000 mg/kg/day) resulted in animal death and moribundity. There were no treatment-related findings in the 140 mg/kg/day group. There were no compound-related findings in the high-dose recovery animals. The second rat study was a 5-day oral toxicity study with genomic biomarkers assayed in the kidney tissues. At the top dose of 1,050 mg/kg/day, similar clinical and anatomical pathology findings as described above were observed. The genes measured, Kim-1, Clu, Spp1, A2m, Lcn2, Tcfrsf12a, Gpnmb, and CD44, were significantly up-regulated (fivefold to 550-fold), while Tff3 was significantly down-regulated (fivefold). 
These results indicated that genomic markers could sensitively diagnose melamine-induced kidney injury. A 3-month oral study with 4-week recovery in monkeys was also conducted. In this monkey study, the animals were treated with melamine at doses of 60, 200, or 700 mg/kg/day. The administration of 700 mg/kg/day melamine by nasal-gastric gavage to monkeys resulted in test article-related clinical signs including turbid and whitish urine, urine crystals, red blood cell changes, increased serum alanine aminotransferase and kidney and/or liver weights, and microscopic findings including nephrotoxicity, pericarditis, and increased hematopoiesis. Nephrotoxicity was also noted at 200 mg/kg/day. It was concluded that the kidney is the primary target organ and the NOAEL was estimated to be 140 mg/kg/day in rats following a 14-day oral administration and 60 mg/kg/day in the monkey study.
Hailiang Huang, Pritam Chanda, Alvaro Alonso, Joel S Bader, and Dan E Arking. 2011. “Gene-based tests of association.” PLoS Genet, 7, 7, Pp. e1002177.
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%-50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible for pathway-based meta-analysis.
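The abstract above relies on permutation tests to obtain p-values. As a rough illustration of the general idea only, the sketch below computes an empirical p-value for a single SNP by shuffling phenotypes. It is not the GWiS procedure: the greedy Bayesian model selection and the genome-wide and within-locus corrections are not reproduced, and the test statistic (absolute Pearson correlation), the function names, and the toy data are assumptions made for this example.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return abs(cov) / (var_x * var_y) ** 0.5


def permutation_p_value(genotypes, phenotypes, n_permutations=999, seed=0):
    """Empirical p-value: how often a shuffled phenotype matches or beats
    the observed statistic. The +1 terms count the observed data itself,
    so the estimate is never exactly zero."""
    rng = random.Random(seed)
    observed = abs_correlation(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if abs_correlation(genotypes, shuffled) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_permutations)


# Toy data: minor-allele counts and a trait that tracks them closely.
toy_genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
toy_phenotypes = [0.1, 0.2, 0.0, 0.1, 1.1, 0.9, 1.0, 1.2, 2.1, 1.9, 2.0, 2.2]
toy_p = permutation_p_value(toy_genotypes, toy_phenotypes)
```

With 999 permutations the smallest attainable value is 0.001, which is why permutation-based p-values become computationally expensive at genome-wide significance thresholds.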
Jun Zhong, Sarah A Krawczyk, Raghothama Chaerkady, Hailiang Huang, Renu Goel, Joel S Bader, William G Wong, Barbara E Corkey, and Akhilesh Pandey. 2010. “Temporal profiling of the secretome during adipogenesis in humans.” J Proteome Res, 9, 10, Pp. 5228-38.
Adipose tissue plays a key role as a fat-storage depot and as an endocrine organ. Although mouse adipogenesis has been studied extensively, limited studies have been conducted to characterize this process in humans. We carried out a temporal proteomic analysis to interrogate the dynamic changes in the secretome of primary human preadipocytes as they differentiate into mature adipocytes. Using iTRAQ-based quantitative proteomics, we identified and quantified 420 proteins from the secretome of differentiated human adipocytes. Our results revealed that the majority of proteins showed differential expression during the course of differentiation. In addition to adipokines known to be differentially secreted in the course of adipocyte differentiation, we identified a number of proteins whose dynamic expression in this process has not been previously documented. They include collagen triple helix repeat containing 1, cytokine receptor-like factor 1, glypican-1, hepatoma-derived growth factor, SPARC related modular calcium binding protein 1, SPOCK 1, and sushi repeat-containing protein. A bioinformatics analysis using Human Protein Reference Database and Human Proteinpedia revealed that of the 420 proteins identified, 164 proteins possess signal peptides and 148 proteins are localized to the extracellular compartment. Additionally, we employed antibody arrays to quantify changes in the levels of 182 adipokines during human adipogenesis. This is the first large-scale quantitative proteomic study that combines two platforms, mass spectrometry and antibody arrays, to analyze the changes in the secretome during the course of adipogenesis in humans.
Hailiang Huang, Alexandra M Maertens, Edel M Hyland, Junbiao Dai, Anne Norris, Jef D Boeke, and Joel S Bader. 2009. “HistoneHits: a database for histone mutations and their phenotypes.” Genome Res, 19, 4, Pp. 674-81.
Histones are the basic protein components of nucleosomes. They are among the most conserved proteins and are subject to a plethora of post-translational modifications. Specific histone residues are important in establishing chromatin structure, regulating gene expression and silencing, and responding to DNA damage. Here we present HistoneHits, a database of phenotypes for systematic collections of histone mutants. This database combines assay results (phenotypes) with information about sequences, structures, post-translational modifications, and evolutionary conservation. The web interface presents the information through dynamic tables and figures. It calculates the availability of data for specific mutants and for nucleosome surfaces. The database currently includes 42 assays on 677 mutants multiply covering 405 of the 498 residues across yeast histones H3, H4, H2A, and H2B. We also provide an interface with an extensible controlled vocabulary for research groups to submit new data. Preliminary analyses confirm that mutations at highly conserved residues and modifiable residues are more likely to generate phenotypes. Buried residues and residues on the lateral surface tend to generate more phenotypes, while tail residues generate significantly fewer phenotypes than other residues. Yeast mutants are cross referenced with known human histone variants, identifying a position where a yeast mutant causes loss of ribosomal silencing and a human variant increases breast cancer susceptibility. All data sets are freely available for download.
Hailiang Huang and Joel S Bader. 2009. “Precision and recall estimates for two-hybrid screens.” Bioinformatics, 25, 3, Pp. 372-8.
MOTIVATION: Yeast two-hybrid screens are an important method to map pairwise protein interactions. This method can generate spurious interactions (false discoveries), and true interactions can be missed (false negatives). Previously, we reported a capture-recapture estimator for bait-specific precision and recall. Here, we present an improved method that better accounts for heterogeneity in bait-specific error rates. RESULT: For yeast, worm and fly screens, we estimate the overall false discovery rates (FDRs) to be 9.9%, 13.2% and 17.0% and the false negative rates (FNRs) to be 51%, 42% and 28%. Bait-specific FDRs and the estimated protein degrees are then used to identify protein categories that yield more (or fewer) false positive interactions and more (or fewer) interaction partners. While membrane proteins have been suggested to have elevated FDRs, the current analysis suggests that intrinsic membrane proteins may actually have reduced FDRs. Hydrophobicity is positively correlated with decreased error rates and fewer interaction partners. These methods will be useful for future two-hybrid screens, which could use ultra-high-throughput sequencing for deeper sampling of interacting bait-prey pairs. AVAILABILITY: All software (C source) and datasets are available as supplemental files and at under the Lesser GPL v. 3 license.
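The capture-recapture reasoning mentioned in this abstract can be illustrated with the classic two-sample Lincoln-Petersen estimator. The sketch below is only that textbook starting point, not the authors' bait-specific method: it assumes two independent screens with no false positives, which is exactly the simplification their model relaxes. The function names and the toy interaction sets are hypothetical.

```python
def estimate_total_interactions(screen_a, screen_b):
    """Lincoln-Petersen estimate of the number of true interactions.

    If screen B recovers a fraction |A & B| / |A| of what screen A found,
    and that fraction is taken as B's overall recall, then the total is
    roughly |A| * |B| / |A & B|.
    """
    overlap = len(screen_a & screen_b)
    if overlap == 0:
        raise ValueError("no shared interactions; estimate is undefined")
    return len(screen_a) * len(screen_b) / overlap


def estimated_recall(screen, total_estimate):
    """Fraction of the estimated true interactions that one screen found."""
    return len(screen) / total_estimate


# Toy screens: each interaction is an unordered protein pair.
toy_screen_a = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]}
toy_screen_b = {frozenset(p) for p in [("B", "A"), ("D", "B"), ("E", "F")]}

toy_total = estimate_total_interactions(toy_screen_a, toy_screen_b)
toy_false_negative_rate = 1 - estimated_recall(toy_screen_a, toy_total)
```

Storing each pair as a `frozenset` makes ("A", "B") and ("B", "A") count as the same interaction, which matters when comparing screens that report bait and prey in different orders.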
Junbiao Dai, Edel M Hyland, Daniel S Yuan, Hailiang Huang, Joel S Bader, and Jef D Boeke. 2008. “Probing nucleosome function: a highly versatile library of synthetic histone H3 and H4 mutants.” Cell, 134, 6, Pp. 1066-78.
Nucleosome structural integrity underlies the regulation of DNA metabolism and transcription. Using a synthetic approach, a versatile library of 486 systematic histone H3 and H4 substitution and deletion mutants that probes the contribution of each residue to nucleosome function was generated in Saccharomyces cerevisiae. We probed fitness contributions of each residue to perturbations of chromosome integrity and transcription, mapping global patterns of chemical sensitivities and requirements for transcriptional silencing onto the nucleosome surface. Each histone mutant was tagged with unique molecular barcodes, facilitating identification of histone mutant pools through barcode amplification, labeling, and TAG microarray hybridization. Barcodes were used to score complex phenotypes such as competitive fitness in a chemostat, DNA repair proficiency, and synthetic genetic interactions, revealing new functions for distinct histone residues and new interdependencies among nucleosome components and their modifiers.
LM Stuart, J Boulais, GM Charriere, EJ Hennessy, S Brunet, I Jutras, G Goyette, C Rondeau, S Letarte, H Huang, P Ye, F Morales, C Kocks, JS Bader, M Desjardins, and RAB Ezekowitz. 2007. “A systems biology analysis of the Drosophila phagosome.” Nature, 445, 7123, Pp. 95-101.
Phagocytes have a critical function in remodelling tissues during embryogenesis and thereafter are central effectors of immune defence. During phagocytosis, particles are internalized into 'phagosomes', organelles from which immune processes such as microbial destruction and antigen presentation are initiated. Certain pathogens have evolved mechanisms to evade the immune system and persist undetected within phagocytes, and it is therefore evident that a detailed knowledge of this process is essential to an understanding of many aspects of innate and adaptive immunity. However, despite the crucial role of phagosomes in immunity, their components and organization are not fully defined. Here we present a systems biology analysis of phagosomes isolated from cells derived from the genetically tractable model organism Drosophila melanogaster and address the complex dynamic interactions between proteins within this organelle and their involvement in particle engulfment. Proteomic analysis identified 617 proteins potentially associated with Drosophila phagosomes; these were organized by protein-protein interactions to generate the 'phagosome interactome', a detailed protein-protein interaction network of this subcellular compartment. These networks predicted both the architecture of the phagosome and putative biomodules. The contribution of each protein and complex to bacterial internalization was tested by RNA-mediated interference and identified known components of the phagocytic machinery. In addition, the prediction and validation of regulators of phagocytosis such as the 'exocyst', a macromolecular complex required for exocytosis but not previously implicated in phagocytosis, validates this strategy. 
In generating this 'systems-based model', we show the power of applying this approach to the study of complex cellular processes and organelles and expect that this detailed model of the phagosome will provide a new framework for studying host-pathogen interactions and innate immunity.
Hailiang Huang, Bruno M Jedynak, and Joel S Bader. 2007. “Where have all the interactions gone? Estimating the coverage of two-hybrid protein interaction maps.” PLoS Comput Biol, 3, 11, Pp. e214.
Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture-recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens. All software and datasets are available in and , -, and -, and are also available from our Web site,