%0 Journal Article %J Bioinformatics %D 2023 %T Prediction and Curation of Missing Biomedical Identifier Mappings with Biomappings %A Hoyt, Charles Tapley %A Amelia Hoyt %A Benjamin M. Gyori %B Bioinformatics %P btad130 %8 2022/12/02 %G eng %U https://doi.org/10.1093/bioinformatics/btad130 %0 Journal Article %J Bioinformatics %D 2023 %T NDEx IQuery: a multi-method network gene set analysis leveraging the Network Data Exchange %A Rudolf T Pillich %A Jing Chen %A Christopher Churras %A Dylan Fong %A Ideker, Trey %A Sophie N Liu %A Gyori, Benjamin M %A Karis, Klas %A Keiichiro Ono %A Pico, Alexander %A Dexter Pratt %B Bioinformatics %P btad118 %G eng %U https://doi.org/10.1093/bioinformatics/btad118 %0 Journal Article %J Molecular Systems Biology %D 2023 %T Automated assembly of molecular mechanisms at scale from text mining and curated databases %A John A. Bachman %A Gyori, Benjamin M %A Peter K. Sorger %X The analysis of omic data depends on machine-readable information about protein interactions, modifications, and activities as found in protein interaction networks, databases of post-translational modifications, and curated models of gene and protein function. These resources typically depend heavily on human curation. Natural language processing systems that read the primary literature have the potential to substantially extend knowledge resources while reducing the burden on human curators. However, machine-reading systems are limited by high error rates and commonly generate fragmentary and redundant information. Here, we describe an approach to precisely assemble molecular mechanisms at scale using multiple natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA). 
INDRA identifies full and partial overlaps in information extracted from published papers and pathway databases, uses predictive models to improve the reliability of machine reading, and thereby assembles individual pieces of information into non-redundant and broadly usable mechanistic knowledge. Using INDRA to create high-quality corpora of causal knowledge we show it is possible to extend protein–protein interaction databases and explain co-dependencies in the Cancer Dependency Map. %B Molecular Systems Biology %I Cold Spring Harbor Laboratory %P e11325 %G eng %U https://doi.org/10.15252/msb.202211325 %R 10.1101/2022.08.30.505688 %0 Journal Article %J arXiv preprint %D 2023 %T Democratising Knowledge Representation with BioCypher %A Lobentanzer, Sebastian %A Aloy, Patrick %A Baumbach, Jan %A Bohar, Balazs %A Charoentong, Pornpimol %A Danhauser, Katharina %A Doğan, Tunca %A Dreo, Johann %A Dunham, Ian %A Fernandez-Torras, Adrià %A Benjamin M. Gyori %A Hartung, Michael %A Hoyt, Charles Tapley %A Klein, Christoph %A Korcsmaros, Tamas %A Andreas Maier %A Mann, Matthias %A Ochoa, David %A Pareja-Lorente, Elena %A Popp, Ferdinand %A Preusse, Martin %A Probul, Niklas %A Schwikowski, Benno %A Sen, Bünyamin %A Strauss, Maximilian T. %A Turei, Denes %A Ulusoy, Erva %A Wodke, Judith Andrea Heidrun %A Saez-Rodriguez, Julio %K FOS: Biological sciences %K Molecular Networks (q-bio.MN) %B arXiv preprint %I arXiv %G eng %U https://arxiv.org/abs/2212.13543 %R 10.48550/ARXIV.2212.13543 %0 Journal Article %J bioRxiv %D 2023 %T Nociceptor neuroimmune interactomes reveal cell type- and injury-specific inflammatory pain pathways %A Jain, Aakanksha %A Benjamin M. 
Gyori %A Hakim, Sara %A Bunga, Samuel %A Taub, Daniel G %A Ruiz-Cantero, Mari Carmen %A Tong-Li, Candace %A Andrews, Nicholas %A Sorger, Peter K %A Woolf, Clifford J %B bioRxiv %I Cold Spring Harbor Laboratory %G eng %U https://www.biorxiv.org/content/early/2023/02/03/2023.02.01.526526 %R 10.1101/2023.02.01.526526 %0 Journal Article %J Scientific Data %D 2022 %T Unifying the Identification of Biomedical Entities with the Bioregistry %A Hoyt, Charles Tapley %A Meghan Balk %A Callahan, Tiffany J %A Domingo-Fernández, Daniel %A Melissa A. Haendel %A Harshad B. Hegde %A Daniel S. Himmelstein %A Karis, Klas %A John Kunze %A Tiago Lubiana %A Nicolas Matentzoglu %A Julie McMurry %A Sierra Moxon %A Christopher J. Mungall %A Adriano Rutz %A Deepak R. Unni %A Egon Willighagen %A Donald Winston %A Benjamin M. Gyori %B Scientific Data %V 9 %P 714 %8 July 2022 %G eng %U https://www.nature.com/articles/s41597-022-01807-3 %N 1 %0 Journal Article %J eLife %D 2022 %T Integrating multi-omics data reveals function and therapeutic potential of deubiquitinating enzymes %A Laura M. Doherty %A Caitlin E. Mills %A Sarah A. Boswell %A Xiaoxi Liu %A Hoyt, Charles Tapley %A Benjamin M. Gyori %A Sara J. Buhrlage %A Peter K. Sorger %B eLife %I Cold Spring Harbor Laboratory %V 11 %G eng %U https://doi.org/10.7554/eLife.72879 %N e72879 %R 10.1101/2021.08.06.455458 %0 Journal Article %J Bioinformatics Advances %D 2022 %T Gilda: biomedical entity text normalization with machine-learned disambiguation as a service %A Benjamin M. Gyori %A Hoyt, Charles Tapley %A Steppi, Albert %X Summary: Gilda is a software tool and web service that implements a scored string matching algorithm for names and synonyms across entries in biomedical ontologies covering genes, proteins (and their families and complexes), small molecules, biological processes and diseases.
Gilda integrates machine-learned disambiguation models to choose between ambiguous strings given relevant surrounding text as context, and supports species prioritization in case of ambiguity. Availability: The Gilda web service is available at http://grounding.indra.bio; source code, documentation and tutorials are available via https://github.com/indralab/gilda. Contact: benjamin_gyori@hms.harvard.edu. Competing Interest Statement: The authors have declared no competing interest. %B Bioinformatics Advances %I Cold Spring Harbor Laboratory %G eng %U https://doi.org/10.1093/bioadv/vbac034 %N vbac034 %R 10.1101/2021.09.10.459803 %0 Journal Article %J Environmental Health Perspectives %D 2022 %T Automated Network Assembly of Mechanistic Literature for Informed Evidence Identification to Support Cancer Risk Assessment %A Bernice Scholten %A Laura Guerrero Simón %A Shaji Krishnan %A Vermeulen, Roel %A Anjoeka Pronk %A Benjamin M. Gyori %A John A. Bachman %A Jelle Vlaanderen %A Rob Stierum %B Environmental Health Perspectives %V 130 %P 037002 %G eng %U https://doi.org/10.1289/EHP9112 %N 3 %0 Journal Article %J Journal of Open Source Software %D 2022 %T PyBioPAX: biological pathway exchange in Python %A Benjamin M.
Gyori %A Hoyt, Charles Tapley %B Journal of Open Source Software %I The Open Journal %V 7 %P 4136 %G eng %U https://doi.org/10.21105/joss.04136 %N 71 %R 10.21105/joss.04136 %0 Journal Article %J Database %D 2022 %T A roadmap for the functional annotation of protein families: a community perspective %A de Crécy-lagard, Valérie %A Amorin de Hegedus, Rocio %A Arighi, Cecilia %A Babor, Jill %A Bateman, Alex %A Blaby, Ian %A Blaby-Haas, Crysten %A Bridge, Alan J %A Burley, Stephen K %A Cleveland, Stacey %A Colwell, Lucy J %A Conesa, Ana %A Christian Dallago %A Danchin, Antoine %A de Waard, Anita %A Deutschbauer, Adam %A Dias, Raquel %A Ding, Yousong %A Fang, Gang %A Friedberg, Iddo %A Gerlt, John %A Goldford, Joshua %A Gorelik, Mark %A Gyori, Benjamin M %A Henry, Christopher %A Hutinet, Geoffrey %A Jaroch, Marshall %A Karp, Peter D %A Kondratova, Liudmyla %A Lu, Zhiyong %A Marchler-Bauer, Aron %A Martin, Maria-Jesus %A McWhite, Claire %A Moghe, Gaurav D %A Monaghan, Paul %A Morgat, Anne %A Mungall, Christopher J %A Natale, Darren A %A Nelson, William C %A O’Donoghue, Seán %A Orengo, Christine %A O’Toole, Katherine H %A Radivojac, Predrag %A Reed, Colbie %A Roberts, Richard J %A Rodionov, Dmitri %A Rodionova, Irina A %A Rudolf, Jeffrey D %A Saleh, Lana %A Sheynkman, Gloria %A Thibaud-Nissen, Francoise %A Thomas, Paul D %A Uetz, Peter %A Vallenet, David %A Carter, Erica Watson %A Weigele, Peter R %A Wood, Valerie %A Wood-Charlson, Elisha M %A Xu, Jin %X Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. 
Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward. %B Database %V 2022 %8 08 %G eng %U https://doi.org/10.1093/database/baac062 %R 10.1093/database/baac062 %0 Conference Paper %B NAACL HCI+NLP %D 2022 %T Taxonomy Builder: a Data-driven and User-centric Tool for Streamlining Taxonomy Construction %A John Hungerford %A Yee Seng Chan %A Jessica MacBride %A Benjamin M. Gyori %A Andrew Zupon %A Zheng Tang %A Egoitz Laparra %A Haoling Qiu %A Bonan Min %A Yan Zverev %A Caitlin Hilverman %A Max Thomas %A Walt Andrews %A Keith Alcock %A Zeyu Zhang %A Reynolds, Michael %A Mihai Surdeanu %A Steve Bethard %A Rebecca Sharp %B NAACL HCI+NLP %C Seattle, Washington %G eng %0 Generic %D 2022 %T A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs %A Hoyt, Charles Tapley %A Berrendorf, Max %A Galkin, Mikhail %A Tresp, Volker %A Benjamin M. Gyori %K Artificial Intelligence (cs.AI) %K FOS: Computer and information sciences %K Machine Learning (cs.LG) %I arXiv %G eng %U https://arxiv.org/abs/2203.07544 %R 10.48550/ARXIV.2203.07544 %0 Journal Article %J bioRxiv %D 2022 %T Assembling a phosphoproteomic knowledge base using ProtMapper to normalize phosphosite information from databases and text mining %A John A. Bachman %A Peter K. Sorger %A Benjamin M. 
Gyori %B bioRxiv %I Cold Spring Harbor Laboratory %G eng %U https://www.biorxiv.org/content/10.1101/822668v4 %R 10.1101/822668 %0 Conference Paper %B KDD 2022 %D 2022 %T ChemicalX: A Deep Learning Library for Drug Pair Scoring %A Rozemberczki, Benedek %A Hoyt, Charles Tapley %A Gogleva, Anna %A Grabowski, Piotr %A Karis, Klas %A Lamov, Andrej %A Nikolov, Andriy %A Nilsson, Sebastian %A Ughetto, Michael %A Yu Wang %A Derr, Tyler %A Gyori, Benjamin M %K Artificial Intelligence (cs.AI) %K FOS: Computer and information sciences %K Machine Learning (cs.LG) %B KDD 2022 %G eng %U https://doi.org/10.1145/3534678.3539023 %R 10.48550/ARXIV.2202.05240 %0 Conference Paper %B SWAT4HCLS 2022 %D 2022 %T ProtSTonKGs: A Sophisticated Transformer Trained on Protein Sequences, Text, and Knowledge Graphs %A Balabin, Helena %A Hoyt, Charles Tapley %A Benjamin M. Gyori %A John Bachman %A Kodamullil, Alpha Tom %A Martin Hofmann-Apitius %A Daniel Domingo-Fernández %X While most approaches individually exploit unstructured data from the biomedical literature or structured data from biomedical knowledge graphs, their union can better exploit the advantages of such approaches, ultimately improving representations of biology. Using multimodal transformers for such purposes can improve performance on context-dependent classification tasks, as demonstrated by our previous model, the Sophisticated Transformer Trained on Biomedical Text and Knowledge Graphs (STonKGs). In this work, we introduce ProtSTonKGs, a transformer aimed at learning all-encompassing representations of protein-protein interactions. ProtSTonKGs presents an extension to our previous work by adding textual protein descriptions and amino acid sequences (i.e., structural information) to the text- and knowledge graph-based input sequence used in STonKGs.
We benchmark ProtSTonKGs against STonKGs, resulting in improved F1 scores by up to 0.066 (i.e., from 0.204 to 0.270) in several tasks such as predicting protein interactions in several contexts. Our work demonstrates how multimodal transformers can be used to integrate heterogeneous sources of information, paving the foundation for future approaches that use multiple modalities for biomedical applications. %B SWAT4HCLS 2022 %P 103 – 107 %G eng %U https://nbn-resolving.org/urn:nbn:de:hbz:1044-opus-62113 %0 Journal Article %J Database %D 2022 %T A Simple Standard for Sharing Ontological Mappings (SSSOM) %A Nicolas Matentzoglu %A James P. Balhoff %A Susan M. Bello %A Chris Bizon %A Matthew H. Brush %A Tiffany J. Callahan %A Chute, Christopher G. %A William D. Duncan %A Chris T. A. Evelo %A Gabriel, Davera %A John Graybeal %A Alasdair J. G. Gray %A Benjamin M. Gyori %A Melissa A. Haendel %A Henriette Harmse %A Nomi L. Harris %A Ian Harrow %A Harshad Hegde %A Amelia L. Hoyt %A Hoyt, Charles Tapley %A Jiao, Dazhi %A Ernesto Jiménez-Ruiz %A Simon Jupp %A Hyeongsik Kim %A Sebastian Köhler %A Thomas Liener %A Qinqin Long %A Malone, James %A James A. McLaughlin %A Julie A. McMurry %A Sierra A. T. Moxon %A Monica C. Munoz-Torres %A David Osumi-Sutherland %A James A. Overton %A Peters, Bjoern %A Tim E. Putman %A Núria Queralt-Rosinach %A Kent A. Shefchek %A Solbrig, Harold %A Anne E. Thessen %A Tania Tudorache %A Nicole A. Vasilevsky %A Alex H. Wagner %A Christopher J. 
Mungall %B Database %V 2022 %G eng %U https://doi.org/10.1093/database/baac035 %N baac035 %0 Journal Article %J Bioinformatics %D 2022 %T STonKGs: A Sophisticated Transformer Trained on Biomedical Text and Knowledge Graphs %A Balabin, Helena %A Hoyt, Charles Tapley %A Birkenbihl, Colin %A Gyori, Benjamin M %A John Bachman %A Kodamullil, Alpha Tom %A Plöger, Paul G %A Martin Hofmann-Apitius %A Domingo-Fernández, Daniel %X The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models (KGEMs). However, representations based on a single modality are inherently limited. To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs. This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA) consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against two baseline models trained on either one of the modalities (i.e., text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.083. Additionally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. 
Finally, the source code and pre-trained STonKGs models are available at https://github.com/stonkgs/stonkgs and https://huggingface.co/stonkgs/stonkgs-150k. Competing Interest Statement: Daniel Domingo-Fernandez received salary from Enveda Biosciences. %B Bioinformatics %I Cold Spring Harbor Laboratory %G eng %U https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac001/6497782 %R 10.1101/2021.08.17.456616 %0 Journal Article %J bioRxiv %D 2022 %T A versatile and interoperable computational framework for the analysis and modeling of COVID-19 disease mechanisms %A Anna Niarakis %A Marek Ostaszewski %A Mazein, Alexander %A ... %A Gyori, Benjamin M %A ... %A Schneider, Reinhard %A the COVID-19 Disease Map Community %X The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdisciplinary, the project is collaborative and supports community standards, open access, and the FAIR data principles. The coordination of community work allowed for an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework links key molecules highlighted from broad omics data analysis and computational modeling to dysregulated pathways in a cell-, tissue- or patient-specific manner. We also employ text mining and AI-assisted analysis to identify potential drugs and drug targets and use topological analysis to reveal interesting structural features of the map. The proposed framework is versatile and expandable, offering a significant upgrade in the arsenal used to understand virus-host interactions and other complex pathologies. Competing Interest Statement: A.
Niarakis collaborates with SANOFI-AVENTIS R&D via a public private partnership grant (CIFRE contract, no 2020/0766). D. Maier and A. Bauch are employed at Biomax Informatics AG and will be affected by any effect of this publication on the commercial version of the AILANI software. J.A. Bachman and B. Gyori received consulting fees from Two Six Labs, LLC. T. Helikar has served as a shareholder and has consulted for Discovery Collective, Inc. R. Balling and R. Schneider are founders and shareholders of MEGENO S.A. and ITTM S.A. J. Saez-Rodriguez receives funding from GSK and Sanofi and consultant fees from Travere Therapeutics. Janet Pinero and Laura I. Furlong are employees and shareholders of MedBioinformatics Solutions SL. The remaining authors have declared that they have no Conflict of interest. %B bioRxiv %I Cold Spring Harbor Laboratory %G eng %U https://www.biorxiv.org/content/early/2022/12/19/2022.12.17.520865 %R 10.1101/2022.12.17.520865 %0 Journal Article %J eLife %D 2021 %T Author-sourced capture of pathway knowledge in computable form using Biofactoid %A Jeffrey Wong %A Max Franz %A Metin Can Siper %A Dylan Fong %A Funda Durupinar %A Christian Dallago %A Augustin Luna %A John M. Giorgi %A Igor Rodchenkov %A Özgün Babur %A John A. Bachman %A Benjamin M. Gyori %A Emek Demir %A Gary Bader %A Sander, Chris %B eLife %V 10 %P e68292 %G eng %U https://elifesciences.org/articles/68292 %0 Conference Paper %B Proceedings of the BioCreative VII Challenge Evaluation Workshop %D 2021 %T A self-updating causal model of COVID-19 mechanisms built from the scientific literature %A Benjamin M. 
Gyori %A Bachman, John A %A Diana Kolusheva %B Proceedings of the BioCreative VII Challenge Evaluation Workshop %G eng %U https://biocreative.bioinformatics.udel.edu/media/store/files/2021/Track4_pos_5_BC7_submission_195-3.pdf %0 Journal Article %J Molecular Systems Biology %D 2021 %T COVID-19 Disease Map, a computational knowledge repository of SARS-CoV-2 virus-host interaction mechanisms %A Marek Ostaszewski %A Anna Niarakis %A ... %A Gyori, Benjamin M %A ... %A Schneider, Reinhard %B Molecular Systems Biology %V 17 %P e10387 %G eng %U https://doi.org/10.1101/2020.10.26.356014 %N 10 %0 Journal Article %J Current Opinion in Systems Biology %D 2021 %T From knowledge to models: Automated modeling in systems and synthetic biology %A Benjamin M. Gyori %A John A. Bachman %K Automated modeling %K Dynamical modeling %K modeling %K Rule-based modeling %K synthetic biology %K Systems Biology %K Text mining %X Building computational models of biological mechanisms involves collecting and synthesizing knowledge about the underlying system and encoding it in an appropriate mathematical form. While this process typically requires substantial manual effort from human experts, key aspects of the modeling process are increasingly being automated or augmented by software tools, allowing for the efficient creation of large models or model ensembles. In this review, we introduce a framework for discussing modeling automation by positioning recent work into three ‘levels’, with the human and the machine taking on different responsibilities at each level. We outline the strengths and weaknesses of current modeling approaches at the different levels and discuss the prospect of fully automated fit-to-purpose modeling of biological systems. 
%B Current Opinion in Systems Biology %V 28 %P 100362 %G eng %U https://www.sciencedirect.com/science/article/pii/S2452310021000561 %R 10.1016/j.coisb.2021.100362 %0 Journal Article %J Genome Biology %D 2021 %T GeneWalk identifies relevant gene functions for a biological context using network representation learning %A Ietswaart, Robert %A Benjamin M. Gyori %A John A. Bachman %A Peter K. Sorger %A L. Stirling Churchman %X

A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk (github.com/churchmanlab/genewalk) that identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.

%B Genome Biology %I Cold Spring Harbor Laboratory %V 22 %G eng %U https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02264-8 %N 55 %R 10.1101/755579 %0 Journal Article %J Journal of Open Source Software %D 2020 %T Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature %A Steppi, Albert %A Benjamin Gyori %A John Bachman %B Journal of Open Source Software %I The Open Journal %V 5 %P 1708 %G eng %U https://doi.org/10.21105/joss.01708 %N 45 %R 10.21105/joss.01708 %0 Journal Article %J bioRxiv %D 2020 %T Exploring the understudied human kinome for research and therapeutic opportunities %A Moret, Nienke %A Liu, Changchang %A Benjamin M. Gyori %A John A. Bachman %A Steppi, Albert %A Taujale, Rahil %A Huang, Liang-Chin %A Hug, Clemens %A Berginski, Matt %A Gomez, Shawn %A Kannan, Natarajan %A Peter K. Sorger %X The functions of protein kinases have been heavily studied and inhibitors for many human kinases have been developed into FDA-approved therapeutics. A substantial fraction of the human kinome is nonetheless understudied. In this paper, members of the NIH Understudied Kinome Consortium mine public data on “dark” kinases to estimate the likelihood that they are functional. We start with a re-analysis of the human kinome and describe the criteria for creation of an inclusive set of 710 kinase domains and a curated set of 557 protein kinase like (PKL) domains. Nearly all PKLs are expressed in one or more CCLE cell lines and a substantial number are also essential in the Cancer Dependency Map. Dark kinases are frequently differentially expressed or mutated in The Cancer Genome Atlas and other disease databases and investigational and approved kinase inhibitors appear to inhibit them as off-target activities. Thus, it seems likely that the dark human kinome contains multiple biologically important genes, a subset of which may be viable drug targets. 
%B bioRxiv %I Cold Spring Harbor Laboratory %G eng %U https://www.biorxiv.org/content/early/2020/04/02/2020.04.02.022277 %R 10.1101/2020.04.02.022277 %0 Journal Article %J PLOS Computational Biology %D 2020 %T Robustness and parameter geography in post-translational modification systems %A Nam, Kee-Myoung %A Benjamin M. Gyori %A Amethyst, Silviana V. %A Bates, Daniel J. %A Jeremy Gunawardena %X Author summary Biological organisms are often said to have robust properties but it is difficult to understand how such robustness arises from molecular interactions. Here, we use a mathematical model to study how the molecular mechanism of protein modification exhibits the property of multiple internal states, which has been suggested to underlie memory and decision making. The robustness of this property is revealed by the size and shape, or “geography,” of the parametric region in which the property holds. We use advances in reducing model complexity and in rapidly solving the underlying equations, to extensively sample parameter points in an 8-dimensional space. We find that under realistic molecular assumptions the size of the region is surprisingly small, suggesting that generating multiple internal states with such a mechanism is much harder than expected. While the shape of the region appears straightforward, we find surprising complexity in how the region grows with increasing amounts of the modified substrate. Our approach uses statistical analysis of data generated from a model, rather than from experiments, but leads to precise mathematical conjectures about parameter geography and biological robustness. %B PLOS Computational Biology %I Public Library of Science %V 16 %P 1-50 %8 05 %G eng %U https://doi.org/10.1371/journal.pcbi.1007573 %N 5 %R 10.1371/journal.pcbi.1007573 %0 Conference Paper %B NAACL %D 2019 %T Eidos, INDRA & Delphi: From Free Text to Executable Causal Models %A Rebecca Sharp %A Adarsh Pyarelal %A Benjamin M. Gyori %A et al. 
%X Building causal models of complicated phenomena such as food insecurity is currently a slow and labor-intensive manual process. In this paper, we introduce an approach that builds executable probabilistic models from raw, free text. The proposed approach is implemented through three systems: Eidos, INDRA, and Delphi. Eidos is an open-domain machine reading system designed to extract causal relations from natural language. It is rule-based, allowing for rapid domain transfer, customizability, and interpretability. INDRA aggregates multiple sources of causal information and performs assembly to create a coherent knowledge base and assess its reliability. This assembled knowledge serves as the starting point for modeling. Delphi is a modeling framework that assembles quantified causal fragments and their contexts into executable probabilistic models that respect the semantics of the original text, and can be used to support decision making. %B NAACL %I Association for Computational Linguistics %P 42-47 %G eng %U https://www.aclweb.org/anthology/N19-4008 %0 Journal Article %J Bioinformatics %D 2019 %T INDRA-IPM: interactive pathway modeling using natural language with automated assembly %A Petar V. Todorov %A Benjamin M. Gyori %A John A. Bachman %A Peter K. Sorger %X

Summary

INDRA-IPM (Interactive Pathway Map) is a web-based pathway map modeling tool that combines natural language processing with automated model assembly and visualization. INDRA-IPM contextualizes models with expression data and exports them to standard formats.

Availability and implementation

INDRA-IPM is available at: http://pathwaymap.indra.bio. Source code is available at http://github.com/sorgerlab/indra_pathway_map. The underlying web service API is available at http://api.indra.bio:8000.

Supplementary information

Supplementary data are available at Bioinformatics online.

%B Bioinformatics %P btz289 %G eng %U https://doi.org/10.1093/bioinformatics/btz289 %0 Journal Article %J Database (in print) %D 2019 %T Re-curation and Rational Enrichment of Knowledge Graphs in Biological Expression Language %A Charles Hoyt %A Daniel Domingo-Fernandez %A Rana Aldisi %A Lingling Xu %A Kristian Kolpeja %A Sandra Spalek %A Esther Wollert %A John Bachman %A Benjamin Gyori %A Patrick Greene %A Martin Hofmann-Apitius %X The rapid accumulation of new biomedical literature not only causes curated knowledge graphs (KGs) to become outdated and incomplete, but also makes manual curation an impractical and unsustainable solution. Automated or semi-automated workflows are necessary to assist in prioritizing and curating the literature to update and enrich KGs. We have developed two workflows: one for re-curating a given KG to assure its syntactic and semantic quality and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the KGs encoded in Biological Expression Language from the NeuroMMSig database using content that was pre-extracted from MEDLINE abstracts and PubMed Central full-text articles using text mining output integrated by INDRA. We have made this workflow freely available at https://github.com/bel-enrichment/bel-enrichment. %B Database (in print) %V 2019 %P baz068 %G eng %U https://doi.org/10.1093/database/baz068 %0 Book Section %B Automated Reasoning for Systems Biology and Medicine %D 2019 %T Statistical Model Checking based Analysis of Biological Networks %A Liu, Bing %A Benjamin M. Gyori %A P. S. Thiagarajan %X We introduce a framework for analyzing ordinary differential equation  (ODE) models of biological networks using statistical model checking (SMC). A key aspect of our work is the modeling of single-cell variability by assigning a probability distribution to intervals of initial concentration values and kinetic rate constants. 
We propagate this distribution through the system dynamics to obtain a distribution over the set of trajectories of the ODEs. This in turn opens the door for performing statistical analysis of the ODE system’s behavior. To illustrate this, we first encode quantitative data and qualitative trends as bounded linear time temporal logic (BLTL) formulas. Based on this, we construct a parameter estimation method using an SMC-driven evaluation procedure applied to the stochastic version of the behavior of the ODE system. We then describe how this SMC framework can be generalized to hybrid automata by exploiting the given distribution over the initial states and the—much more sophisticated—system dynamics to associate a Markov chain with the hybrid automaton. We then establish a strong relationship between the behaviors of the hybrid automaton and its associated Markov chain. Consequently, we sample trajectories from the hybrid automaton in a way that mimics the sampling of the trajectories of the Markov chain. This enables us to verify approximately that the Markov chain meets a BLTL specification with high probability. We have applied these methods to ODE-based models of Toll-like receptor signaling and the crosstalk between autophagy and apoptosis, as well as to systems exhibiting hybrid dynamics including the circadian clock pathway and cardiac cell physiology. We present an overview of these applications and summarize the main empirical results. These case studies demonstrate that our methods can be applied in a variety of practical settings. %B Automated Reasoning for Systems Biology and Medicine %I Springer %P 63-92 %G eng %U https://doi.org/10.1007/978-3-030-17297-8_3 %0 Journal Article %J Cell Systems %D 2018 %T Encoding Growth Factor Identity in the Temporal Dynamics of FOXO3 under the Combinatorial Control of ERK and AKT Kinases %A Somponnat Sampattavanich %A Bernhard Steiert %A Bernhard A. Kramer %A Benjamin M. Gyori %A John G. Albeck %A Peter K. 
Sorger %B Cell Systems %V 6 %P 664-678 %G eng %N 6 %0 Journal Article %J BMC Bioinformatics %D 2018 %T FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining %A Bachman, John A %A Gyori, Benjamin M %A Sorger, Peter K %X

Background: For automated reading of scientific publications to
extract useful information about molecular mechanisms it is critical that
genes, proteins and other entities be correctly associated with uniform
identifiers, a process known as named entity linking or "grounding." Correct
grounding is essential for resolving relationships among mined information,
curated interaction databases, and biological datasets. The accuracy of this
process is largely dependent on the availability of machine-readable resources
associating synonyms and abbreviations commonly found in biomedical literature
with uniform identifiers.

Results: In a task involving automated reading of ~215,000
articles using the REACH event extraction software we found that grounding was
disproportionately inaccurate for multi-protein families (e.g., "AKT") and
complexes with multiple subunits (e.g., "NF-kappaB"). To address this
problem we constructed FamPlex, a manually curated resource defining protein
families and complexes as they are commonly encountered in biomedical text. In
FamPlex the gene-level constituents of families and complexes are defined in a
flexible format allowing for multi-level, hierarchical membership. To create
FamPlex, text strings corresponding to entities were identified empirically
from literature and linked manually to uniform identifiers; these identifiers
were also mapped to equivalent entries in multiple related databases. FamPlex
also includes curated prefix and suffix patterns that improve named entity
recognition and event extraction. Evaluation of REACH extractions on a test
corpus of ~54,000 articles showed that FamPlex significantly increased
grounding accuracy for families and complexes (from 15% to 71%). The
hierarchical organization of entities in FamPlex also made it possible to
integrate otherwise unconnected mechanistic information across families,
subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM
reading system and the Biocreative VI Bioentity Normalization Task dataset
demonstrated the utility of FamPlex in other settings.

Conclusion: FamPlex is an effective resource for improving named
entity recognition, grounding, and relationship resolution in automated reading
of biomedical text. The content in FamPlex is available in both tabular and
Open Biomedical Ontology formats at
https://github.com/sorgerlab/famplex under the Creative Commons CC0
license and has been integrated into the TRIPS/DRUM and REACH reading systems.

%B BMC Bioinformatics %I Cold Spring Harbor Laboratory %V 19 %G eng %N 248 %R 10.1101/225698 %0 Journal Article %J Molecular Systems Biology %D 2017 %T From word models to executable models of signaling networks using automated assembly %A Benjamin M. Gyori %A John A. Bachman %A Subramanian, Kartik %A Muhlich, Jeremy L. %A Lucian Galescu %A Peter K. Sorger %B Molecular Systems Biology %V 13 %P 954 %G eng %U http://msb.embopress.org/content/13/11/954 %N 11 %0 Journal Article %J Statistics and Computing %D 2015 %T Hypothesis testing for Markov chain Monte Carlo %A Gyori, Benjamin M %A Daniel Paulin %B Statistics and Computing %I Springer %P 1–12 %G eng %N http://dx.doi.org/10.1007/s11222-015-959 %0 Journal Article %J Hybrid Systems Biology %D 2015 %T Parallelized Parameter Estimation of Biological Pathway Models %A R. Ramanathan %A Zhang, Yan %A Zhou, Jun %A Benjamin M. Gyori %A Weng-Fai Wong %A P. S. Thiagarajan %B Hybrid Systems Biology %V 9271 %P 37-57 %G eng %0 Conference Paper %B Proceedings of the IEEE/ACM International Conference on Computer-Aided Design %D 2015 %T Simulation-Guided Parameter Synthesis for Chance-Constrained Optimization of Control Systems %A Zhang, Yan %A Sriram Sankaranarayanan %A Gyori, Benjamin M %B Proceedings of the IEEE/ACM International Conference on Computer-Aided Design %I IEEE Press %P 208–215 %G eng %0 Journal Article %J Hybrid Systems Biology %D 2015 %T Approximate probabilistic verification of hybrid systems %A Gyori, Benjamin M %A Liu, Bing %A Soumya Paul %A Ramanathan, R %A Thiagarajan, PS %B Hybrid Systems Biology %V 9271 %P 96–116 %G eng %0 Journal Article %J arXiv preprint arXiv:1212.2016 %D 2015 %T Non-asymptotic confidence intervals for MCMC in practice %A Gyori, Benjamin M %A Daniel Paulin %B arXiv preprint arXiv:1212.2016 %G eng %0 Journal Article %J Redox Biology %D 2014 %T OpenComet: an automated tool for comet assay image analysis %A Benjamin M. Gyori %A Gireedhar Venkatachalam %A P. S. 
Thiagarajan %A David Hsu %A Marie-Veronique Clement %B Redox Biology %V 2 %P 457-465 %G eng %0 Journal Article %J arXiv preprint arXiv:1411.0976 %D 2014 %T Probabilistic verification of partially observable dynamical systems %A Gyori, Benjamin M %A Daniel Paulin %A Palaniappan, Sucheendra K %B arXiv preprint arXiv:1411.0976 %G eng %0 Thesis %B National University of Singapore %D 2014 %T Probabilistic Approaches to Modeling Uncertainty in Biological Pathway Dynamics %A Benjamin M. Gyori %B National University of Singapore %G eng %9 PhD %0 Conference Proceedings %B In Proceedings of the 11th International Conference on Computational Methods in Systems Biology, CMSB 13 %D 2013 %T Statistical model checking based calibration and analysis of bio-pathway models %A Sucheendra K. Palaniappan %A Benjamin M. Gyori %A Liu, Bing %A David Hsu %A P. S. Thiagarajan %B In Proceedings of the 11th International Conference on Computational Methods in Systems Biology, CMSB 13 %C Austria %G eng