Rebecca Sharp, Adarsh Pyarelal, Benjamin M. Gyori, et al. 6/2019. “Eidos, INDRA & Delphi: From Free Text to Executable Causal Models.” In NAACL, Pp. 42-47. Association for Computational Linguistics.
Building causal models of complicated phenomena such as food insecurity is currently a slow and labor-intensive manual process. In this paper, we introduce an approach that builds executable probabilistic models from raw, free text. The proposed approach is implemented through three systems: Eidos, INDRA, and Delphi. Eidos is an open-domain machine reading system designed to extract causal relations from natural language. It is rule-based, allowing for rapid domain transfer, customizability, and interpretability. INDRA aggregates multiple sources of causal information and performs assembly to create a coherent knowledge base and assess its reliability. This assembled knowledge serves as the starting point for modeling. Delphi is a modeling framework that assembles quantified causal fragments and their contexts into executable probabilistic models that respect the semantics of the original text, and can be used to support decision making.
Petar V. Todorov, Benjamin M. Gyori, John A. Bachman, and Peter K. Sorger. 2019. “INDRA-IPM: interactive pathway modeling using natural language with automated assembly.” Bioinformatics, Pp. btz289.


INDRA-IPM (Interactive Pathway Map) is a web-based pathway map modeling tool that combines natural language processing with automated model assembly and visualization. INDRA-IPM contextualizes models with expression data and exports them to standard formats.

Availability and implementation

INDRA-IPM is available at: Source code is available at The underlying web service API is available at

Supplementary information

Supplementary data are available at Bioinformatics online.

Charles Hoyt, Daniel Domingo-Fernandez, Rana Aldisi, Lingling Xu, Kristian Kolpeja, Sandra Spalek, Esther Wollert, John Bachman, Benjamin Gyori, Patrick Greene, and Martin Hofmann-Apitius. 2019. “Re-curation and Rational Enrichment of Knowledge Graphs in Biological Expression Language.” Database (in print), 2019, Pp. baz068.
The rapid accumulation of new biomedical literature not only causes curated knowledge graphs (KGs) to become outdated and incomplete, but also makes manual curation an impractical and unsustainable solution. Automated or semi-automated workflows are necessary to assist in prioritizing and curating the literature to update and enrich KGs. We have developed two workflows: one for re-curating a given KG to assure its syntactic and semantic quality and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the KGs encoded in Biological Expression Language from the NeuroMMSig database using content that was pre-extracted from MEDLINE abstracts and PubMed Central full-text articles using text mining output integrated by INDRA. We have made this workflow freely available at
Bing Liu, Benjamin M. Gyori, and P. S. Thiagarajan. 2019. “Statistical Model Checking based Analysis of Biological Networks.” In Automated Reasoning for Systems Biology and Medicine, Pp. 63-92. Springer.
We introduce a framework for analyzing ordinary differential equation (ODE) models of biological networks using statistical model checking (SMC). A key aspect of our work is the modeling of single-cell variability by assigning a probability distribution to intervals of initial concentration values and kinetic rate constants. We propagate this distribution through the system dynamics to obtain a distribution over the set of trajectories of the ODEs. This in turn opens the door for performing statistical analysis of the ODE system’s behavior. To illustrate this, we first encode quantitative data and qualitative trends as bounded linear time temporal logic (BLTL) formulas. Based on this, we construct a parameter estimation method using an SMC-driven evaluation procedure applied to the stochastic version of the behavior of the ODE system. We then describe how this SMC framework can be generalized to hybrid automata by exploiting the given distribution over the initial states and the (much more sophisticated) system dynamics to associate a Markov chain with the hybrid automaton. We then establish a strong relationship between the behaviors of the hybrid automaton and its associated Markov chain. Consequently, we sample trajectories from the hybrid automaton in a way that mimics the sampling of the trajectories of the Markov chain. This enables us to verify approximately that the Markov chain meets a BLTL specification with high probability. We have applied these methods to ODE-based models of Toll-like receptor signaling and the crosstalk between autophagy and apoptosis, as well as to systems exhibiting hybrid dynamics including the circadian clock pathway and cardiac cell physiology. We present an overview of these applications and summarize the main empirical results. These case studies demonstrate that our methods can be applied in a variety of practical settings.
Somponnat Sampattavanich, Bernhard Steiert, Bernhard A. Kramer, Benjamin M. Gyori, John G. Albeck, and Peter K. Sorger. 2018. “Encoding Growth Factor Identity in the Temporal Dynamics of FOXO3 under the Combinatorial Control of ERK and AKT Kinases.” Cell Systems, 6, 6, Pp. 664-678.
John A Bachman, Benjamin M Gyori, and Peter K Sorger. 2018. “FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.” BMC Bioinformatics, 19, 248.

Background: For automated reading of scientific publications to extract useful information about molecular mechanisms, it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers.

Results: In a task involving automated reading of ~215,000 articles using the REACH event extraction software, we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-kappaB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ~54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15% to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the BioCreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings.

Conclusion: FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.

Benjamin M. Gyori, John A. Bachman, Kartik Subramanian, Jeremy L. Muhlich, Lucian Galescu, and Peter K. Sorger. 11/2017. “From word models to executable models of signaling networks using automated assembly.” Molecular Systems Biology, 13, 11, Pp. 954.
Benjamin M Gyori and Daniel Paulin. 2015. “Hypothesis testing for Markov chain Monte Carlo.” Statistics and Computing, Pp. 1–12.
R. Ramanathan, Yan Zhang, Jun Zhou, Benjamin M. Gyori, Weng-Fai Wong, and P. S. Thiagarajan. 2015. “Parallelized Parameter Estimation of Biological Pathway Models.” Hybrid Systems Biology, 9271, Pp. 37-57.
Yan Zhang, Sriram Sankaranarayanan, and Benjamin M Gyori. 2015. “Simulation-Guided Parameter Synthesis for Chance-Constrained Optimization of Control Systems.” In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, Pp. 208–215. IEEE Press.
Benjamin M Gyori, Bing Liu, Soumya Paul, R Ramanathan, and PS Thiagarajan. 2015. “Approximate probabilistic verification of hybrid systems.” Hybrid Systems Biology, 9271, Pp. 96–116.
Benjamin M. Gyori, Gireedhar Venkatachalam, P. S. Thiagarajan, David Hsu, and Marie-Veronique Clement. 2014. “OpenComet: an automated tool for comet assay image analysis.” Redox Biology, 2, Pp. 457-465.
Benjamin M Gyori, Daniel Paulin, and Sucheendra K Palaniappan. 2014. “Probabilistic verification of partially observable dynamical systems.” arXiv preprint arXiv:1411.0976.
Benjamin M. Gyori. 2014. “Probabilistic Approaches to Modeling Uncertainty in Biological Pathway Dynamics.” National University of Singapore.
Sucheendra K. Palaniappan, Benjamin M. Gyori, Bing Liu, David Hsu, and P. S. Thiagarajan. 2013. “Statistical model checking based calibration and analysis of bio-pathway models.” In Proceedings of the 11th International Conference on Computational Methods in Systems Biology, CMSB '13. Austria.
Benjamin M Gyori and Daniel Paulin. 2012. “Non-asymptotic confidence intervals for MCMC in practice.” arXiv preprint arXiv:1212.2016.