Machine learning for drug safety

schematic drug target ADR

Adverse drug reactions are one of the leading causes of morbidity and mortality in health care. We used machine learning to identify which drug target genes are associated with adverse drug reactions.

target ADR associations

Drugs often hit many different targets, and some side effects become clear only after approval. Machine learning can systematically connect laboratory characterizations of drugs with their effects in humans, using adverse event reports from a large population. We analyzed an in vitro secondary pharmacology database, provided by Urban and colleagues at the Novartis Institutes for BioMedical Research, that covers common (off-)targets for 2134 marketed drugs.

Figure 1

To associate these drugs with human adverse drug reactions (ADRs), we queried the US FDA Adverse Event Reporting System (FAERS) database.

Figure 2

Next, we developed random forest models that predict ADR occurrences from in vitro pharmacological profiles.

Random Forest Model

By evaluating the Gini importance scores of model features, we identified 221 target-ADR associations.
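As a rough illustration of what a Gini importance score measures, the sketch below computes the Gini impurity decrease for a single split on a single feature. A random forest's Gini importance accumulates such decreases over all splits and trees. The data, the feature names (`activity_a`, `activity_b`) and the threshold are hypothetical and are not taken from the study.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(feature_values, labels, threshold):
    """Impurity decrease obtained by splitting on feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children

# Hypothetical toy data, one entry per drug (values are made up).
activity_a = [0.1, 0.2, 0.8, 0.9, 0.7, 0.15]  # assay readout for target A
activity_b = [0.5, 0.9, 0.4, 0.6, 0.1, 0.8]   # assay readout for target B
has_adr = [0, 0, 1, 1, 1, 0]                  # 1 if the ADR was reported

decrease_a = gini_decrease(activity_a, has_adr, threshold=0.5)
decrease_b = gini_decrease(activity_b, has_adr, threshold=0.5)

# Target A separates the two classes perfectly, so it gets the larger score.
print(f"target A: {decrease_a:.3f}, target B: {decrease_b:.3f}")
```

In this toy case the split on target A removes all impurity, while the split on target B barely helps, so target A would rank as the more important feature.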

Gini importance scores

We performed several systematic validation analyses. Our associations co-occur in PubMed abstracts to a greater extent than expected by chance, indicating that many predictions have supporting evidence from previous clinical, animal model or in vitro studies. We also predicted, for individual drugs, new adverse events that were observed in chronologically later event reports, benchmarked on OMOP. For drugs that did not have reports, we correctly predicted the adverse events listed on their drug labels.
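One generic way to ask whether predicted pairs co-occur in abstracts "more than expected by chance" is a one-sided hypergeometric test. The sketch below is only an illustration of that idea: the choice of test and all counts are assumptions, not the procedure or numbers used in the study.

```python
from math import comb

def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from
    `total` items, of which `successes` are marked."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(total - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total, draws)

# Hypothetical counts (made up for illustration):
total_pairs = 1000       # all candidate target-ADR pairs
cooccurring_pairs = 100  # pairs that co-occur in at least one abstract
predicted_pairs = 20     # pairs predicted by the model
predicted_cooccur = 8    # predicted pairs that also co-occur

p_value = hypergeom_sf(
    predicted_cooccur, total_pairs, cooccurring_pairs, predicted_pairs
)
print(f"enrichment p-value: {p_value:.2e}")
```

With these made-up counts, about 2 of the 20 predicted pairs would be expected to co-occur by chance, so observing 8 yields a small p-value.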

Model performance

We recover established causal relations, e.g. in vitro hERG binding with cardiac arrhythmia. Evidence on bile acid metabolism supports our identification of associations between BSEP and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Our model suggests that PDE3 is associated with 40 ADRs. Understanding which drug targets are linked to ADRs can ultimately lead to the development of safer medicines. Altogether, we provide a comprehensive resource to support drug development and future human biology studies.


Robert Ietswaart*,#, Seda Arat*,#, Amanda X. Chen*, Saman Farahmand*, Bumjun Kim, William DuMouchel, Duncan Armstrong, Alexander Fekete, Jeffrey J. Sutherland#, Laszlo Urban#
Machine learning guided association of adverse drug reactions with in vitro target-based pharmacology,
EBioMedicine 57, 102837 (2020).

# Corresponding author; * these authors contributed equally. Data are freely available as part of the code repository on GitHub:


Ever had a list of gene hits and not known what to do? Maybe GO enrichment analysis left you wondering what your favorite gene is doing in your experiment? Check out GeneWalk, our machine learning method that determines the relevant functions for each gene.

GeneWalk schematic

GeneWalk assembles an experiment-specific gene regulatory network from a knowledge base (e.g. Pathway Commons or INDRA) with genes and GO terms as nodes. Neural network-based representation learning then converts the network nodes into vectors that describe the network topology around each node.
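A common first step in network representation learning (as in DeepWalk-style methods) is to sample random walks over the network. The walks then serve as "sentences" for a neural embedding model. The sketch below shows only that sampling step on a hypothetical toy network. The node names, edges and parameters are made up, and this is not GeneWalk's actual code.

```python
import random

# Hypothetical toy network with gene and GO-term nodes (undirected edges).
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GO:term_1"),
    ("GENE_B", "GO:term_1"),
    ("GENE_B", "GO:term_2"),
    ("GENE_C", "GO:term_2"),
]

def build_adjacency(edge_list):
    """Map each node to the list of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def random_walk(adjacency, start, length, rng):
    """Uniform random walk visiting `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adjacency[walk[-1]]))
    return walk

def sample_walks(adjacency, walks_per_node, length, seed=0):
    """Sample several walks from every node, with a fixed seed."""
    rng = random.Random(seed)
    return [
        random_walk(adjacency, node, length, rng)
        for node in sorted(adjacency)
        for _ in range(walks_per_node)
    ]

adjacency = build_adjacency(edges)
walks = sample_walks(adjacency, walks_per_node=3, length=5)
print(walks[0])
```

Nodes that often appear close together in such walks end up with similar vectors once an embedding model is trained on them, which is how the vectors come to reflect network topology.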

Comparison of gene and GO term vectors identifies which genes are highly connected and which GO terms are most relevant for each gene in the experiment. GeneWalk thus reveals which genes are central to the specific biological context and what their roles are.
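A simple way to compare a gene vector with GO term vectors is cosine similarity. The sketch below ranks GO terms for one gene using hypothetical three-dimensional vectors. It illustrates the vector comparison idea only, and does not reproduce GeneWalk's implementation or any statistics it applies to the similarities.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embedding vectors (made up for illustration).
gene_vector = [1.0, 0.0, 1.0]
go_vectors = {
    "GO:term_1": [1.0, 0.0, 1.0],  # same direction as the gene vector
    "GO:term_2": [1.0, 1.0, 0.0],  # partially aligned
    "GO:term_3": [0.0, 1.0, 0.0],  # orthogonal
}

similarities = {
    term: cosine_similarity(gene_vector, vector)
    for term, vector in go_vectors.items()
}

# GO terms ordered from most to least similar to the gene.
ranked_terms = sorted(similarities, key=similarities.get, reverse=True)
print(ranked_terms)
```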


Further documentation

To get started with GeneWalk, check out our tutorial at the GeneWalk website.
For more details on how to run GeneWalk on your data, see our GitHub page.
For code documentation, see our readthedocs page.
For a full description of the GeneWalk methodology and applications, see our publication.



Robert Ietswaart, Benjamin M. Gyori, John A. Bachman, Peter K. Sorger, and L. Stirling Churchman
GeneWalk identifies relevant gene functions for a biological context using network representation learning,
Genome Biology 22, 55 (2021).