Adams, C.D. & Boutwell, B.B. Shared genomic architectures of COVID-19 and antisocial behavior. Translational Psychiatry 12, 1, 193 (2022).
Little is known about the genetics of norm violation and aggression in relation to coronavirus disease 2019 (COVID-19). To investigate this, we used summary statistics from genome-wide association studies and linkage disequilibrium score regression to calculate a matrix of genetic correlations (rgs) for antisocial behavior (ASB), COVID-19, and various health and behavioral traits. After false-discovery rate correction, ASB was genetically correlated with COVID-19 (rg = 0.51; P = 1.54E-02) and 19 other traits. ASB and COVID-19 were both positively genetically correlated with having a noisy workplace, doing heavy manual labor, chronic obstructive pulmonary disease, and genitourinary diseases. ASB and COVID-19 were both inversely genetically correlated with average income, education years, healthspan, verbal reasoning, lifespan, cheese intake, and being breastfed as a baby. But keep in mind that rgs are not necessarily causal. And, if causal, their prevailing directions of effect (which causes which) are indiscernible from rgs alone. Moreover, the SNP-heritability (h2g) estimates for two measures of COVID-19 were very small, restricting the overlap of genetic variance in absolute terms between ASB and COVID-19. Nonetheless, our findings suggest that those with antisocial tendencies possibly have a higher risk of exposure to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) than those without antisocial tendencies. This may have been especially true early in the pandemic before vaccines against SARS-CoV-2 were available and before the emergence of the highly transmissible Omicron variant.
Adams, C.D. Election Fraud? Medium (2021).

Trump lost the 2020 presidential election, but he has refused to accept that he lost. Instead of conceding the race to Biden and initiating a smooth and peaceful transfer of power, Trump has baselessly claimed that there was systematic voting fraud against him. In doing this, he has shaken trust in democracy. In the days following the election, anonymous coders and a prominent ex-academic (Bret Weinstein) spread the idea that votes time stamped (that is, counted) after November 4th show proof of fraud, because more of the votes in later batches went to Biden. The anonymous coders leaked a file they claim to have scraped from the NYT/”Edison” data on voting counts. In a tweet thread by “APhilosophae”, they provide scatter plots that show what they claim are “randomly” spread ratios of Biden to Trump votes that then become uniform after November 4th. They insinuate that this so-called pattern of “random” ratios becoming more even is evidence of voter fraud.

Since the stakes are high — our democracy is under siege — it is irresponsible and unethical to let claims of voting fraud go uncontested. As such, I have performed my own analysis, also using the NYT/”Edison” data, even though I can’t be sure about the file’s integrity. I found evidence that APhilosophae misinterpreted the data. Biden had more than three million more votes than Trump on November 4th, when ~96% of the country’s votes had been tallied. This alone suggests that the momentum behind Biden’s lead would continue as later-arriving mail-in votes got tallied. In addition to ignoring that Biden had the popular vote on November 4th, APhilosophae appears to have fundamentally misunderstood and misconstrued the nature of the data set. Each observation in the NYT/”Edison” file represents a time when the votes were cumulatively tallied. The percent shares of votes for each candidate were derived from the cumulative counts. Earlier times/batches showed greater differences in the percent shares of votes between the candidates, since earlier time points contained less cumulative information. In scatter plots of the ratios of Biden to Trump votes (y-axis) by time (x-axis), this phenomenon looks like a random spread of ratios on November 4th. But it isn’t. Earlier batches (on November 4th) display as bigger differences in the percent share of votes between the candidates. As vote batches from the end of the day were woven in, the differences between the percent shares of votes for the candidates were smaller. These entries counted later in the day display as ratios closer to 1. Given this, we wouldn’t expect the differences in the percent shares of votes between candidates to skyrocket on subsequent days, since the percent share of votes depends on the cumulative number of votes already tallied. The range for the ratios on subsequent days was constrained by the information for the candidates at the end of the night on November 4th. It is crucial to see this.

APhilosophae further claimed that Trump should get more votes than Biden over time, arguing that mail-in ballots would be expected to come from rural areas that are far from polling centers and take longer to arrive. Implicit in this are the assumptions that rural voters are more likely to vote for Trump and more likely to vote by mail. However, the latter is incorrect for two reasons: 1) the trend in the popular vote favored Biden early on, and 2) years of research on election trends show that Democrats are more likely to vote last-minute by mail. Given this, the Biden-ward tallies in Pennsylvania and Georgia are ordinary and unremarkable. It would have been a surprise had some states Trump led in initially not gone to Biden.

I found no evidence of voter fraud, but I did see an error in how APhilosophae portrayed the nature of cumulative data and conclude that he/she spread a conspiracy theory.


Trump has refused to accept the outcome of the 2020 presidential election and has sown widespread distrust about voting counts without presenting any evidence to support his claims. Adding to the confusion and distrust, anonymous coders have claimed they can prove election fraud using data they allegedly scraped from the NYT/”Edison” data. A Twitter user named “APhilosophae” reported and interpreted the findings in a long Twitter thread that alleges the following:

  1. “irregularities” exist in the ratio of number of new votes per candidate, and
  2. these so-called “irregularities” mean later-counted mail-in ballots prove voter fraud.

Specifically, APhilosophae claimed states that went Biden-ward reflected fraud, alleging mail-in votes should be random and slightly favor Trump; therefore, if mail-in ballots reflect a win for Biden, they are fraudulent. The tweet thread is viewable here.

Twitter user (handle: “cb_miller_”) posted a partial rebuttal pointing out that the columns for the percent share of votes per candidate are rounded and that rounding errors may have misled the anonymous coders.

So that you can see the data, the table below includes the most relevant columns. “vote_share_rep” and “vote_share_dem” are the percent of votes for each candidate for the given time stamp and state. They are based on the “votes” column, which is cumulative. The cumulative number of votes for each candidate (what I call “num_Trump” and “num_Biden”) can be obtained by multiplying “votes” by the variables for the percent shares for each candidate. Note that the point by cb_miller_ about rounding has to do with how the percent of votes for each candidate are rounded. It’s easy to see, by adding together the percent shares for each candidate, that they don’t total 1. This may be partly explained by rounding but may also partly reflect the “votes” variable including votes for Jo Jorgensen and Howie Hawkins. Since no data dictionary came with the file describing the variables, your guess is as good as mine. Whatever the case with the rounding, APhilosophae made interpretative claims that go beyond possible rounding errors. I will address these.

Of note is that APhilosophae advised his/her audience to download the file and back it up. When I went to the weblink he/she provided for the file, the file was no longer available, which is suspicious. The version I acquired was provided in the thread by another member of the public.

Relevant Variables in the NYT/”Edison” Data Set
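The derivation of the cumulative counts can be sketched as follows. This is a minimal Python/pandas illustration with invented numbers; the column names mirror my descriptions above, not necessarily the file’s exact spellings.

```python
import pandas as pd

# Invented rows mimicking the file's layout: "votes" is the cumulative
# total at each time stamp; the shares are rounded percent shares.
df = pd.DataFrame({
    "timestamp": ["2020-11-04T01:00:00Z", "2020-11-04T02:00:00Z"],
    "votes": [100000, 150000],
    "vote_share_rep": [0.480, 0.470],
    "vote_share_dem": [0.500, 0.515],
})

# Reconstruct cumulative counts per candidate from the rounded shares.
df["num_Trump"] = (df["votes"] * df["vote_share_rep"]).round().astype(int)
df["num_Biden"] = (df["votes"] * df["vote_share_dem"]).round().astype(int)

# The shares need not sum to 1: rounding, plus third-party candidates
# presumably included in "votes".
df["share_gap"] = 1 - (df["vote_share_rep"] + df["vote_share_dem"])
```

Because the shares are rounded, the reconstructed counts carry some error, which is part of cb_miller_’s point.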

I became aware of APhilosophae’s Twitter thread with these claims of voter fraud because Bret Weinstein (Twitter handle: “BretWeinstein”), who has over 420,000 followers, promoted APhilosophae’s tweet thread by saying he didn’t detect any flaws in the logic.

This promotion is striking on multiple accounts, not the least of which is that APhilosophae ended his/her tweet thread with a hashtag about Epstein not killing himself!

Thus, the analysis was promoted and spread by someone who either espouses conspiracy theories or jokes about them, while claiming the US presidential election was rigged in favor of Biden. That should’ve been a red flag for Bret Weinstein. As should’ve been the fact that the analysis was reported behind a pseudonym, supposedly scraped from the NYT’s proprietary data, and distributed widely with scatter plots mixed with conspiracy theory. In data science, the norm is transparency; it’s not kosher to popularize analyses anonymously with data that may or may not be fabricated, especially on matters of worldwide public interest, safety, and policy.

Weinstein is a former evolutionary biology professor. Forced out of academia in 2017 by mobs of students at Evergreen State College, he could be described as counter-counter-elite, a description that builds on Peter Turchin’s understanding that academia now produces a counter-elite class (Wood 2020). Academics are the elites, though most of us rarely see ourselves that way. As with others who are more-or-less centrists in their thinking, Weinstein has opposed the counter-elites (the “regressive leftists”), making him counter-counter-elite. However, because he is no longer a professor and is making a living on academia’s culture wars, he is not objective. His audience is largely right-wingers, who, like most of the general population, lack statistical training. In the name of purportedly understanding “existential risk”, Weinstein, who is not trained in data science, appears to have converted his incomprehension into support for unverified “anomalies” at a time when lies about voter fraud and irregularities are undermining belief in our democracy. Although Weinstein acknowledged the argument about rounding, the end result, as Uri Harris has pointed out, is that Weinstein increased suspicion.

Weinstein’s original tweet saying he didn’t see any flaws in APhilosophae’s logic spread (as we shall see) an unfounded tale to many who would not have otherwise seen it, myself included. Further, Weinstein’s partial retraction is buried in the thread, which means many will not have seen the issue about rounding, leaving them to think they got valid insider knowledge about election fraud. Also, unfortunately, as helpful as the argument is about rounding, it doesn’t fully address the errors APhilosophae made. Thus, for the few who did see the buried tweet acknowledging rounding issues, it likely didn’t ring as a credible or complete refutation.

APhilosophae masqueraded conspiracy theory as data science, presumably because few would take the time to refute a statistical analysis. Although there is no way of knowing whether the data APhilosophae provided isn’t in some way faked, I treated the leaked NYT/”Edison” data as if it were real and did my own analysis. I was motivated to do this because letting a potential conspiracy theory about voting fraud go unchallenged at this time is irresponsible and unethical, especially as I have some training in data science that could help quell the confusion. I did this for the public to prevent further damage to our democracy.

Methods and Results

What I’d like to do now is walk you through what I’ve done and show you what I see. I did NOT attempt to replicate what APhilosophae did. For instance, APhilosophae appears to have transformed the cumulative counts per candidate into variables for the number of new counts added at each time. Since this was a derivation of variables, I stayed with the cumulative counts and did the analysis I would do if I had written a grant to examine the data.

First, I looked at the data in both Excel and R (R is statistical software). I removed duplicate entries and time points with zero votes. (Data cleaning is normal.) Second, I constructed models. In particular, I made multivariable linear regression models that looked at the number of Biden votes predicted by the number of Trump votes, adjusting for time in days. I took the “time stamp” variable and collapsed it into a categorical variable for days 11–3, 11–4, 11–5, 11–6, and 11–7 (“Day” in the scatterplot below on the right). In the regression models, each time point is compared to 11–3 as the reference (at least in the models including all states). The figure has two panels. The panel on the left gives us a non-parametric curve through the plots (in blue) and a linear predicted line in red. The panel on the right shows the same scatter plot but identifies the day on which the observations were time stamped.

Scatter Plots of Biden and Trump Votes (left panel with loess and linear prediction lines; right panel with time stamped observations by day)

Relevant here is that APhilosophae ran a median-based regression and does not appear to have dealt with time in his/her models, at least from what I can deduce from the scatter plots posted online, which are labeled with the “Theil-Senn Estimator”. They don’t appear to have considered the ratio of votes for the candidates when adjusting for time. But doing so is important, since some states, such as Florida, only have time stamps for 11–4. As such, it is important to equalize time and see how the shares of votes relate to each other.

Below are the results for the model of cumulative Trump votes predicting cumulative Biden votes, accounting for time in days. The coefficient for “num_Trump” shows that there were 1.03 Biden votes per Trump vote across the country.

Since some may ask or be thinking about this:

  • I also ran the simple linear regression without time and compared the simple and multivariable models. In doing this, I saw that the ratios changed only negligibly. This means that time is not a big driver of the relationship between the shares of votes for the candidates.
  • I also added an interaction term to the multivariable model to assess whether the ratio of candidate votes depends on time. It doesn’t, when looking across all states.
  • I considered using robust linear regression to give a penalized weight to each potential outlier. However, that defeats the purpose, since I want to look at observations that might be weird. (Robust linear regression would be similar to the median-based estimator used by APhilosophae, only it permits adjustment by covariates.)

Model of Biden Votes with Trump Votes as Independent Variable and Day as a Covariate
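This kind of model can be sketched in Python with pandas and statsmodels (my analysis was done in R). The numbers below are invented, but constructed so the slope lands near the 1.03 reported above; the day labels follow my collapsed categorical variable, with 11-3 as the reference.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the cleaned file: cumulative counts per time stamp,
# with "day" collapsed from the time stamp. Numbers are invented, built
# so that Biden counts ~ 1.03 x Trump counts.
df = pd.DataFrame({
    "num_Trump": [100, 200, 400, 800, 1600, 3200, 240, 480, 960],
    "num_Biden": [103, 206, 412, 824, 1648, 3296, 247, 494, 989],
    "day": ["11-3", "11-3", "11-4", "11-4", "11-5",
            "11-5", "11-6", "11-6", "11-7"],
})

# Multivariable model: Biden counts predicted by Trump counts,
# adjusting for day as a categorical covariate (11-3 is the reference).
model = smf.ols("num_Biden ~ num_Trump + C(day)", data=df).fit()
slope = model.params["num_Trump"]
```

The coefficient on `num_Trump` is the ratio of interest: Biden votes per additional Trump vote, holding day constant.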

Cook’s distance

Whenever we fit linear regression models, we look for outliers. This is especially needed in this case, since the cumulative Trump and Biden vote counts are non-normally distributed and may not conform to the assumptions required for linear models. We can exploit these diagnostics to see if something wonky is going on in our data.

One way to do this is to look at the residuals (the vertical distances between the observed values and the regression line). We could also surmise that there might be some outliers by looking back again at the scatterplot above: see the points in olive and teal that wander upwards away from the others? Whether or not these adversely affect the model or are true outliers is something we’ll look at now.

An extension of looking at the residuals is to calculate “Cook’s distance”. Cook’s distance is a combination of each observation’s leverage and residual values: the higher the leverage and residuals, the higher the Cook’s distance. An observation with a Cook’s distance greater than 1 is generally considered to be an influential outlier (Cook and Weisberg 1982). None of the observations in the “Edison” data fit this criterion. The maximum Cook’s distance is 0.015 (with a mean of 0.0001).

A Cook’s distance greater than 4/n (where n=number of observations in the data; here it’s 8213) can also be used, however, to investigate possible outliers. I did that. There were 341 observations to visually eyeball.

What states contain observations with potential outliers? APhilosophae pointed to Pennsylvania and Wisconsin as having likely mail-in voter fraud, with allegedly “weird” observations in the later days of the voting counts. However, see below: these states don’t even populate as potential wannabe outliers! Moreover, the lion’s share of these interesting observations occurs early on, likely representing either in-person votes or ballots mailed far in advance and counted on November 4th.

“Outlier” States by Day of Time Stamp

What can we learn about this? First, these observations aren’t true outliers, since their Cook’s distances were small. Second, since a time stamp on or after November 5th might be a reasonable proxy for mail-in ballots, there is no evidence of mail-in voter fraud: the bulk of whatever is special about these observations happened on November 4th. (While there were a few, there was no enrichment of “weird” observations on November 6th and 7th.)

Different metric: ratio of Biden/Trump votes

Further to this point, if we look, not at the cumulative votes, but at the ratio of the percent shares of Biden/Trump votes by day, the biggest gains in Biden’s favor occurred on November 4th. The red line in the scatter plot below is set at 1. The points below 1 represent those observations for which Trump had a greater percent share of the votes. Be careful when looking at the plot: note that the y-axis represents the magnitude of the ratios, not the number of votes.

We can’t see this in the plot, but on November 4th, Trump was ahead in 27 states (Biden was ahead in 23), and there were 1463 more ratios in favor of Trump than Biden. (The Trump-favoring ratios are tightly packed between 0 and 1 and, thus, their density can’t be readily seen in the plot.)

On November 5th, the pattern reverses: there were 92 more ratios for Biden than Trump. And on November 6th and 7th, respectively, there were 161 and 66 more ratios in Biden’s favor. By the end of the election, two states, Pennsylvania and Georgia, had more overall votes for Biden. The candidates had each won 25 states by popular vote by the end of November 7th (I will return to this in the Discussion).

Three important points:

  1. The seemingly restricted range for ratios after November 4th is due to smaller differences in the percent shares between candidates at the end of the day on November 4th.
  2. That there were more ratios favoring Trump on November 4th does not mean he had more overall votes that day (he didn’t).
  3. Although Trump had won more states on November 4th, he had lost the popular vote. This means that the country’s trend was for Biden and that, as more votes got counted, it was more likely that Biden would win.
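The per-day bookkeeping of ratio directions can be sketched like this, with invented percent shares:

```python
import numpy as np
import pandas as pd

# Invented observations: percent shares per time stamp, by day.
df = pd.DataFrame({
    "day": ["11-4", "11-4", "11-4", "11-5", "11-5", "11-6"],
    "vote_share_dem": [0.40, 0.45, 0.48, 0.51, 0.52, 0.53],
    "vote_share_rep": [0.55, 0.50, 0.49, 0.47, 0.46, 0.45],
})

# Ratio of percent shares: > 1 favors Biden, < 1 favors Trump.
df["ratio"] = df["vote_share_dem"] / df["vote_share_rep"]

# Net count per day: Biden-favoring minus Trump-favoring ratios.
df["direction"] = np.where(df["ratio"] > 1, 1, -1)
net = df.groupby("day")["direction"].sum()
```

A negative net value means more Trump-favoring ratios that day; a positive value means more Biden-favoring ratios. Counting directions this way says nothing about vote totals, which is exactly point 2 above.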

Biden/Trump Ratios by Day


The figure below displays the data for Pennsylvania. November 4th had big differences in the percent shares of votes between the candidates and, on subsequent days, the differences were less extreme.

Biden/Trump Ratios by Day for Pennsylvania

APhilosophae claimed that it was an “irregularity” for the ratios to have gotten more moderate over time. He/she also claimed that the ratios early on were due to randomness. That’s an incorrect interpretation of the data. At some time points (the earlier ones), the difference in the percent shares of votes was big! This doesn’t show randomness. This reflects having less information. For instance, on November 4th, earlier time stamps were more likely to have greater percent differences in the vote shares between the candidates. As more votes were tallied, the differences were less extreme, since the percent shares are based on the cumulative votes. That makes sense, right? By the end of the day on November 4th, most of the information for the country was in. Given that the data are cumulative, we’d expect the magnitude of the differences in vote shares to be smaller and less messy than for the earliest time stamps.

Dichotomized Differences in Percent Vote Shares by Time Stamp (Early or Late on November 4th) in Pennsylvania

What if we stopped the counting on November 4th?

If we ignore all votes time stamped after November 4th, who would win? Since APhilosophae used a median-based tool to project about the election, let’s look first at a test of the medians for November 4th. The median cumulative number of votes for Trump was 740,463 (IQR = 1,274,340), whereas the median cumulative number of votes for Biden was 670,044 (IQR = 1,369,323). The Wilcoxon test, which uses the median instead of the mean, showed that the difference was significant (P = 0.0318, effect size r = 0.0182) in Trump’s favor. But this is wildly misleading. Biden got more total votes across the country at the end of November 4th! See the plot below. The statistics for the Wilcoxon (median-based) test are provided above it for pedagogical purposes, which I will soon discuss. The graph itself shows that there were a lot of votes for Biden — into the sky. The medians for both candidates were similar, with Trump’s slightly larger, but Biden’s maximum count was far higher. In the end, it is the final cumulative count (the maximum) that decides an election.

Spread of Trump and Biden votes for November 4th.

At the end of November 4th, ~96% of the votes across the country had been tallied (138,431,520/144,427,503), and Biden had 3,169,221 more votes than Trump. You can get a sense of that in the plot above.
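The median-versus-total contrast is easy to reproduce with a toy example. Here I use scipy’s `mannwhitneyu` (the two-sample Wilcoxon rank-sum test) on invented state-level counts constructed so the median favors one candidate while the total favors the other:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Invented cumulative counts per state: Trump's median is higher, but
# Biden's tail (a few huge states) dominates the national total.
trump = np.array([900, 800, 700, 600, 500]) * 1000.0
biden = np.array([5000, 450, 400, 350, 300]) * 1000.0

# Rank-based (median-oriented) comparison of the two distributions.
stat, p = mannwhitneyu(trump, biden, alternative="two-sided")

median_gap = np.median(trump) - np.median(biden)  # > 0: favors Trump
total_gap = biden.sum() - trump.sum()             # > 0: favors Biden
```

A rank-based test is insensitive to the magnitude of the largest counts, which is precisely why it can point the opposite way from the totals.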

Ignoring electoral college criteria, Biden was ahead by a lot on November 4th. Biden had the popular vote early on! We don’t need to run predictions to see this. We can just tally the votes. However, for the sake of consistency, see how similar the linear regression results are to the multivariable model including time (scroll above a few paragraphs). The ratio for the number of Biden votes predicted by the number of Trump votes is 1.03 for November 4th. This is the same as in the model when time in days is included.

Univariable Model of Biden Votes by Trump Votes on November 4th

A median-based tool, such as the Wilcoxon test or APhilosophae’s Theil-Sen estimator, gives the false impression that Trump should have won on November 4th. In fact, Biden was beating Trump! You might be picking up on this, but a reason one might want to use the median might be to ignore/mask/remove votes that favored the candidate getting more! Given Biden’s country-wide lead on November 4th, it isn’t surprising that a few states (Pennsylvania and Georgia) went to Biden as more votes were counted.

Back to the flagged observations and the comparisons of the cumulative votes

Now that we have established that Biden was actually ahead on November 4th, I’d like to return to the discussion about potential “outliers”. It occurred to me that California and Texas (the “outliers” for November 4th) were likely to go Blue and Red, respectively. To examine this, I used the variable that captured the difference between the share of Republican and Democratic votes for each observation. Even without performing a statistical test, it is easy to see that the flagged observations for California and Texas have big differences in the share of votes between candidates, favoring Biden and Trump, respectively. For all states, I dichotomized the differences in the percent shares at the mean and tabulated the results by whether an observation was one of these interesting (“outlier”) observations (chi-squared = 6905.6, df = 1, P < 2.2e-16).

Difference in Candidates’ Percent Votes by “Outlier” Status
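The dichotomize-and-tabulate step can be sketched as follows, with invented values; note that scipy’s `chi2_contingency` applies Yates’ continuity correction by default for a 2x2 table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented per-observation values: absolute difference in the
# candidates' percent shares, and whether the row was screened as an
# "outlier" by the Cook's-distance step.
diff = np.array([0.30, 0.25, 0.28, 0.02, 0.03, 0.05, 0.27, 0.01])
flagged = np.array([True, True, True, False, False, False, True, False])

big = diff > diff.mean()  # dichotomize the differences at the mean

# 2x2 table: big-vs-small difference by "outlier" status.
table = np.array([
    [int(np.sum(big & flagged)), int(np.sum(big & ~flagged))],
    [int(np.sum(~big & flagged)), int(np.sum(~big & ~flagged))],
])
chi2, p, dof, expected = chi2_contingency(table)
```

In this toy data the flagged rows line up perfectly with the big differences, mirroring the pattern described above for California and Texas.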

Recall that there were no smoking-gun outliers and that the “outliers” category referred to in the table captures observations that had larger Cook’s distance values. None had truly large Cook’s distances, and there was no enrichment of “outliers” for later-arriving mail-in votes. The closest thing we get to outliers in our data are the observations that captured big differences between the candidates. Most of this happened early on. The differences in the share of votes between the candidates were smaller over time. Formally testing the relationship of the differences in the percent shares between candidates across time shows it’s a significant relationship (tabulation below: chi-squared = 23.087, df = 4, P = 0.0001). I’ve provided the residuals below, which can aid the interpretation. When the value of the standardized residual is lower than -2, the cell contains fewer observations than expected. When the value is higher than 2, the cell contains more observations than expected. The residuals demonstrate the nature of cumulative data. As more counts were added, the differences between the candidates became less extreme, as can be seen by comparing the residuals for November 3rd and November 7th.

Difference in Candidates’ Percent Share of Votes by Day (residuals in parentheses)
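The day-by-difference tabulation with its Pearson (standardized) residuals can be sketched with an invented table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented tabulation: rows = big vs. small difference in percent
# shares, columns = day of the time stamp (11-3 through 11-7).
obs = np.array([
    [120, 300, 40, 30, 20],   # big difference
    [ 30, 500, 90, 80, 60],   # small difference
])

chi2, p, dof, expected = chi2_contingency(obs)

# Pearson (standardized) residuals: |value| > 2 marks cells with more
# or fewer observations than expected under independence.
residuals = (obs - expected) / np.sqrt(expected)
```

In this toy table the early-day/big-difference cell is heavily over-represented, the same qualitative pattern as in the real tabulation.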

APhilosophae’s metric

As mentioned above, APhilosophae (or whoever did the original analysis) had looked at a slightly different metric than the cumulative votes per candidate. He/she appears to have transformed the cumulative number of counts per candidate, using subtraction, to derive the number of new votes per candidate at each time period by state. I also created these variables. In the process of doing this, I noticed some observations that eluded my initial data cleaning. I had removed two duplicates and observations with zero total votes, but I didn’t remove observations for which later time stamps showed fewer cumulative votes than the just-previous time stamp. If there are “irregularities” in the “Edison” data, these are the candidates! Notably, they were not detected with statistics but by looking closely at the time stamp and votes columns.

The observations that have a time stamp later in time but have smaller numbers of cumulative votes (the “votes” variable) are, indeed, strange. There were 52 such observations.

Take a look. Here are four entries for Florida. The third has the asynchronous time stamp.

Example of Asynchronous Time Stamp Entry
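Flagging such rows takes only a grouped difference on the cumulative “votes” column. A minimal sketch with invented Florida-like rows:

```python
import pandas as pd

# Invented rows; the third has an "asynchronous" time stamp: later in
# time, but with fewer cumulative votes than the previous entry.
df = pd.DataFrame({
    "state": ["FL"] * 4,
    "timestamp": pd.to_datetime([
        "2020-11-04 01:00", "2020-11-04 02:00",
        "2020-11-04 03:00", "2020-11-04 04:00",
    ]),
    "votes": [1_000_000, 2_000_000, 1_500_000, 2_500_000],
})

df = df.sort_values(["state", "timestamp"])

# Flag rows whose cumulative "votes" dropped relative to the previous
# time stamp within the same state.
df["async"] = df.groupby("state")["votes"].diff() < 0
```

No statistics are needed; a within-state first difference that goes negative is the whole signal.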

The fact that these exist in this data set is not, however, proof of fraud. For that to be likely, these asynchronous time stamps would need to differentially benefit one of the candidates and possibly change the fate of the election. As it happens, if we tally the number of new votes each candidate lost at these asynchronous time stamps, Biden lost 1,410,477 and Trump lost only 500,496! These are, of course, nonsensical numbers; the votes aren’t, in reality, missing. But if they were real (they aren’t), the “irregularities” would’ve disproportionately injured Biden (chi-squared = 180.88, df = 1, P < 2.2e-16).

Asynchronous Time Stamp Status by Who “Lost” More Votes

What does this mean? Bupkis! Nothing. APhilosophae created variables for the new votes added for each candidate based on the order of the time stamps. This means that APhilosophae’s variables are nonsensical for these entries, if he/she didn’t catch this first.

The asynchronous time stamps appear to be artefacts of how the “Edison” data were worked with at the NYT or invented by the anonymous coders. Whatever the case, they undermine APhilosophae’s case that the election was rigged for Biden.


Discussion

APhilosophae claimed that mail-in ballots should make the ratios of votes for the candidates random, since mail is (allegedly) randomized before arrival at polling centers. He/she highlighted plots for various states that show an apparent random addition of new votes early on for each candidate. APhilosophae further claimed that ballots mailed in by rural voters would be expected to be for Trump and that late-counted ballots would be expected to be from rural-living voters. He/she argued that any evidence in the data to the contrary is evidence of fraud. That is, if mail-in ballots time stamped after November 4th shifted the election in Biden’s favor, this must be evidence of fraud.

We do not need to analyze the data to counter this unfounded assertion. Democrats embrace voting by mail more earnestly than Republicans (Graham 2020) and are known to vote later (Kilgore 2020). Democrats outpace Republicans in returning mail ballots by about two-fold (Scanlan 2020). For example, in 2018, Democrats won close U.S. House races in California, with late-counted mail-in ballots obliterating GOP leads. This phenomenon is known as “Blue Shift” and is the new norm (Scanlan 2020). About California’s 2018 election, then-Speaker Paul Ryan said in exasperation: “California just defies logic to me. We were only down 26 seats the night of the election and three weeks later, we lost basically every contested California race” (Graham 2020).

APhilosophae’s assertions about “randomness” and subsequent “anomalous evenness” rely on a misunderstanding of cumulative data. Moreover, given Biden’s lead in the popular vote and that Republicans were less likely to vote by mail, most late-counted votes would be expected to veer Biden-ward based on trends in urban voters.

To conclude, I didn’t see any evidence that late-counted votes were among possible outliers in the leaked “Edison” data. I did see a pattern, readily apparent for California and Texas, where early-counted votes were enriched for those with big percent differences in vote counts for the candidates. I also saw some asynchronous time stamps in the “Edison” file that may have made APhilosophae’s variables for new counts added for each candidate spurious, if he/she wrote a script to create the variables without looking closely. I did not see any indication of fraud against Trump in the data set.

Biden won the election. It appears that APhilosophae, Bret Weinstein, and the anonymous coders have promoted conspiracy theories.


The text file I worked with is downloadable. It’s a cleaned version of the NYT/”Edison” file a user left in APhilosophae’s tweet thread (with duplicates and empty fields removed). It has the columns I created for Cook’s distance. It still has the 52 time-stamped observations with fewer vote counts than the time before.


Cook, R. Dennis, and Sanford Weisberg. 1982. Residuals and Influence in Regression. New York: Chapman and Hall.

Graham, David A. 2020. “The ‘blue shift’ will decide the election: Something fundamental has changed about the ways Americans vote.” The Atlantic, August.

Kilgore, Ed. 2020. “Why do the last votes counted skew Democratic?” Intelligencer, August.

Scanlan, Quinn. 2020. “How battleground states process mail ballots — and why it may mean delayed results.” ABC News, October.

Wood, Graeme. 2020. “The next decade could be even worse: A historian believes he has discovered iron laws that predict the rise and fall of societies.” The Atlantic.


Adams, C.D. A Preliminary Report on Alcohol-Associated DNA Methylation Changes and Suicidal Behavior: Evidence Using Mendelian Randomization. Illness, Crisis and Loss (2021).
Suicide is a major public health concern. In 2015, it was the 10th leading cause of death in the US. The number of suicides increased by 30% in the US from 1999 to 2016, and a greater uptick in suicides is predicted to occur as a result of the COVID-19 crisis, for which the primary public-health strategy is physical distancing and during which alcohol sales have soared. Thus, current strategies for identifying at-risk individuals and preventing suicides, such as relying on self-reported suicidal ideation, are insufficient, especially under conditions of physical distancing, which exacerbate isolation, loneliness, economic stress, and possibly alcohol consumption. New strategies are urgent now and into the future. To that aim, here, a two-sample Mendelian randomization (an instrumental variables technique using public genome-wide association study data as data sources) was performed to determine whether alcohol-associated changes in DNA methylation mediate risk for suicidal behavior. The results suggest that higher alcohol-associated DNA methylation levels at cg18120259 confer a weak causal effect. Replication and triangulation of the results, both experimentally and with designs other than Mendelian randomization, are needed. If the findings replicate, the information might be utilized to raise awareness about the biological links between alcohol and suicide and possibly explored as a biomarker of risk, perhaps especially for early detection of those who may not self-report suicidal intent.
Sammallahti, S., et al. Maternal anxiety during pregnancy and newborn epigenome-wide DNA methylation. Molecular Psychiatry 26, 6, 1832-1845 (2021). Publisher's Version.
Maternal anxiety during pregnancy is associated with adverse foetal, neonatal, and child outcomes, but biological mechanisms remain unclear. Altered foetal DNA methylation (DNAm) has been proposed as a potential underlying mechanism. In the current study, we performed a meta-analysis to examine the associations between maternal anxiety, measured prospectively during pregnancy, and genome-wide DNAm from umbilical cord blood. Sixteen non-overlapping cohorts from 12 independent longitudinal studies of the Pregnancy And Childhood Epigenetics Consortium participated, resulting in a combined dataset of 7243 mother-child dyads. We examined prenatal anxiety in relation to genome-wide DNAm and differentially methylated regions. We observed no association between the general symptoms of anxiety during pregnancy or pregnancy-related anxiety and DNAm at any of the CpG sites, after multiple-testing correction. Furthermore, we identified no differentially methylated regions associated with maternal anxiety. At the cohort level, of the 21 associations observed in individual cohorts, none replicated consistently in the other cohorts. In conclusion, contrary to some previous studies proposing cord blood DNAm as a promising potential mechanism explaining the link between maternal anxiety during pregnancy and adverse outcomes in offspring, we found no consistent evidence for any robust associations between maternal anxiety and DNAm in cord blood. Larger studies and analysis of DNAm in other tissues may be needed to establish subtle or subgroup-specific associations between maternal anxiety and the foetal epigenome.
Adams, C.D. & Boutwell, B.B. Using multiple Mendelian randomization approaches and genetic correlations to understand obesity, urate, and gout. Scientific Reports 11, 1, 1-11 (2021). Publisher's Version.
Observational studies suggest relationships between obesity, urate, and gout but are possibly confounded. We assessed whether genetically determined obesity, higher urate (and related traits), and gout were causal using multiple Mendelian randomization (MR) approaches and linkage disequilibrium score regression for genetic correlations (rg). For data, we used genome-wide association study summary statistics available through MR-Base. We observed that obesity increased urate (beta = 0.127; 95% CI = 0.098, 0.157; P-value = 1.2E−17; rg = 0.25 [P-value = 0.001]) and triglycerides (beta = 0.082; 95% CI = 0.065, 0.099; P-value = 1.2E−21; rg = 0.23 [P-value = 8.8E−12]) and decreased high-density lipoprotein cholesterol (HDL) (beta = − 0.083; 95% CI = − 0.101, − 0.065; P-value = 2.5E−19; rg = − 0.28; [P-value = 5.2E−24]). Higher triglycerides increased urate (beta = 0.198; 95% CI = 0.146, 0.251; P-value = 8.9E−14; rg = 0.29 [P-value = 0.001]) and higher HDL decreased urate (beta = − 0.109; 95% CI = − 0.148, − 0.071; P-value = 2.7E− 08; rg = − 0.21 [P-value = 9.8E−05]). Higher urate (OR = 1.030; 95% CI = 1.028, 1.032; P-value = 1.1E−130; rg = 0.89 [P-value = 1.7E−55]) and obesity caused gout (OR = 1.003; 95% CI = 1.001, 1.004; P-value = 1.3E−04; rg = 0.23 [P-value = 2.7E−05]). A mediation analysis of obesity on gout, with urate as a mediator, revealed that all of the effect of obesity on gout occurred through urate. The effect of obesity on low-density lipoprotein cholesterol (LDL) was null (beta = −0.011; 95% CI = −0.030, 0.008; P-value = 2.6E−01; rg = 0.03 [P-value = 0.369]). A multivariable MR of obesity, HDL, and triglycerides on urate showed obesity influenced urate when accounting for HDL and triglycerides. Obesity's impact on urate was exacerbated by its lowering of HDL.
Adams, C.D. Geneticists, let's talk about forensic genetics at the US border. Genes to Genomes (2020). Publisher's Version.

The Trump administration has proposed legislation that would make it legal to forcibly collect DNA from hundreds of thousands of migrants held in detention centers at the US-Mexico border [1,2]. This type of mass genetic surveillance is unprecedented. The closest comparison we have for it is the routine screening of newborns for genetic disorders. Most don’t know that DNA samples are obtained from nearly everyone born in the US. Blood is collected at birth, sent off to state laboratories, and then subsequently screened for a host of genetic disorders, many of which neither parents nor most physicians know exist (e.g., very long-chain acyl-CoA dehydrogenase deficiency).

However, unlike with newborn screening, whereby sick babies may directly benefit from the knowledge gained from their DNA, the vast majority of migrants will receive no benefit. Foreseeably, there will be considerable harm. Migrants’ genetic information will be fed into the FBI’s Combined DNA Index System (CODIS) database, which is used forensically in criminal investigations to link serial violent crimes with known offenders. This new influx of DNA samples will saturate CODIS with DNA data from Latino populations and is a stark break from the US’s current policy of collecting DNA from those who are arrested for a crime. 

Given that the proposed policy casts migrants as would-be criminals, the genetics community has a duty to discuss how the US government plans to use our field's staple technology. By "genetics community" I mean researchers and practitioners with expertise in various aspects of genetics, but also those with expertise in human behavior, epidemiology, bioethics, law, and more. The burden falls on us to offer our knowledge and insights into the ramifications of the Justice Department's proposed policy, before legislation is enacted that stomps out avenues for addressing serious concerns. Because I'm calling for discussion, I'll start. I am a public-health geneticist (my area of genetics is interdisciplinary and includes ethics) and an epidemiologist, and I can speak here to some harms involved in collecting genetic data on hundreds of thousands of people. But I am but one person.

Forensic geneticists can better speak to the accuracy of the genetics behind the screening that is planned at the border—that is, they can tell us whether or not the genetic markers chosen for identifying suspected criminals are appropriate and will yield accurate results. But that kind of accuracy is not the only consideration. Human error and interpretation are part of all mass screening programs and contribute to false positives (calling a positive test result “real” when it isn’t) and false negatives (calling a test result “normal” when it isn’t), even when a scientific technique is flawless.

Come back with me to 1995 Los Angeles for an example of the high stakes and pitfalls involved in using genetics forensically: remember the OJ Simpson trial (People of the State of California v. Orenthal James Simpson) [3,4]. Two genetics experts were consulted, one for the defense [5] and one for the prosecution [6]. Both used their expertise to evaluate what genetics revealed about OJ's role in the murder of his ex-wife. They came, however, to different conclusions about what could be gleaned from the DNA evidence.

Fast forward to 2007—to Perugia, Italy. Amanda Knox, a 20-year-old American college student living abroad in Italy, was accused of stabbing her UK housemate to death. A speck of DNA evidence on a knife handle from a knife she used for cooking was used to convict her. But that wasn’t the end of the story. Her notorious case turned into a revolving door of conviction and exoneration. She was convicted of the same murder twice before finally being exonerated years later in 2015. The same DNA evidence that was admitted as evidence initially was later deemed invalid by experts [7].

Most Americans find DNA evidence strongly persuasive. A Gallup poll in 2015 showed that 85% of Americans consider DNA evidence to be very or completely convincing [8]. This means that once someone grasps a story linking a person to a crime with DNA, it is hard to consider that the accused may still be innocent. 

Take the case of Lukis Anderson, a homeless man of African-American ancestry who admits he likes to drink alcohol, a lot, and sometimes passes out. Anderson was charged with the murder of Raveesh Kumra, because his DNA was on Kumra's fingernails. Anderson told his public defender that he didn't remember the crime but conceded it was possible he had committed it and simply couldn't remember [9]. That is how persuasive DNA evidence is.

But the crime was a heinously violent act coordinated by a group of men who broke into Kumra’s home. They blindfolded him, stuffed his mouth with moustache-print duct tape, hit his companion in the mouth, and tied her up next to him. Then they plundered the house and left Kumra to suffocate. Despite Anderson not knowing Kumra and not remembering the event, his DNA was under Kumra’s fingernails, leading investigators to hypothesize that Kumra struggled as Anderson tied him up [9].

Interestingly, other people’s DNA can be found under the fingernails of 1 in 5 people [10]. We can have DNA on us from those we have never met, due to a phenomenon known as “secondary DNA transfer” (e.g., touching a cup someone else touched previously). Secondary DNA transfer may be what happened in Anderson’s case and is an example of a false positive (Anderson’s DNA on Kumra’s nails made Anderson appear guilty, when he was innocent). He could have faced the death penalty had he not had a solid alibi (medical records showing him inebriated and nearly unconscious at a hospital the night of the murder) and a persistent public defender [9].   

Let’s return to newborn screening, another setting where DNA samples require interpretation. Each state in the US has legislation that permits blood to be taken from the heels of newborns born in hospitals. For obvious reasons, infants can’t consent to this procedure. Their blood is used to screen for the presence of certain genetic conditions, some of the more familiar being cystic fibrosis and sickle cell disease. This is a form of large-scale genetic surveillance, which the federal and state governments sanction. Why? Collecting DNA may directly benefit infants suffering from the screened-for genetic conditions, if these conditions are identified and treated before the onset of symptoms. Well-oiled health systems that include feedback from parents and other public stakeholders (e.g., medical experts, lawmakers, and researchers) make this possible. Without newborn genetic screening, many affected infants would suffer irreversible damage and possibly early death. Hence, the US government and each of the state governments have decided to enact these policies because the benefit to affected infants seemingly justifies the lack of consent and the harms of screening nearly everyone born in the US. 

About the harms: By screening the entire population of hospital-born babies, we generate false positives; in this case, laboratory results that incorrectly indicate that a baby has a disease.

Imagine being a parent who gets a phone call saying your baby’s newborn screen is abnormal. You are told she could die. You rush her to the hospital for more tests and you wait, distressed and in an unbearable panic. You can’t eat or think straight. You obsess about high school biology, remembering that genetics are passed from generation to generation. You wonder if you gave your baby the gene. 

Months later, you learn that your daughter’s original genetic sample had been mislabeled. A government worker tells you the hospital where your daughter was born was understaffed. A lot of babies were born that day. One of the overworked nurses had mislabeled the blood sample. This kind of error happens, as do other mishaps: samples can be left on car dashboards and degrade, assays can fail and give misleading information, and sometimes those who interpret results mess up. Other times, having a certain genetic profile never results in the disease it matches with, even when the initial or predictive test results are accurate. 

False positives are inevitable harms of collecting DNA on a large swath of the population. When the testing is done on newborns, healthy children and their families shoulder the burden of this nonconsensual use of DNA. As a society, we have decided that harm to healthy infants and their families is less important than the benefit to those afflicted with a devastating genetic disease. This seems an understandable tradeoff.  

For detained migrants, however, taking their DNA cannot be justified by any such benevolence. Migrants would suffer all the harms inherent to screening programs without any of the benefits. Some of these harms include wrongful conviction, permanently separating families that are biologically related, and compromising migrants' privacy. Cybersecurity cannot be guaranteed. Enriching CODIS with DNA from migrants places them at disproportionate risk for cybersecurity breaches, possibly leading to a denial of life insurance or employment. Moreover, after harsh treatment at the border, migrants may understandably later recoil from participating in genetic research that could, in fact, directly benefit them, such as efforts to understand breast cancer in Latinas of different ancestries. Stealing DNA at the border could increase migrants' distrust of science and research at a time when precision medicine has the promise to mitigate disease. It's unconscionable to let this happen.

Eventually millions of people will be affected. Last year alone, approximately 743,000 people fell into the category implicated by the proposed legislation. The new rule would remove 28 CFR 28.12(b)(4), the regulation that had previously exempted detained migrants from having their DNA routinely taken [2]. We cannot afford to let the proposed legislation slip into reality without public discussion about the policy implications. Destructive episodes in the history of genetics in the 20th century, such as experiments on prisoners and forced sterilizations of vilified populations [11], make it crucial to talk about the social implications of technologies that target people of a particular ethnic heritage.

The proposed legislation treats all detained migrants as if they are criminals and, importantly, as if they will be criminals. Here is a quote from the Federal Register about this: "[P]rompt DNA-sample collection could be essential to the detection and solution of crimes [migrants] may have committed or may commit in the United States" [2]. The revealing words here are may commit. This is reminiscent of the dystopian film Minority Report, in which citizens are arrested prior to committing a crime, when a "precog" (someone believed to have trustworthy premonitions) gets a flash of information suggesting a crime may be committed soon [12]. Thus, the proposed legislation is not only about catching individuals who may have already committed crimes; it casts undocumented migrants as likely to commit crimes in the future.

If the genetics community doesn't speak out about this now, our silence is a passive endorsement. Imagine what future historians of science will say. Will they not link the use of genetics to "build the wall" [13] with the other misuses of genetics in the 20th century? As the saying from Hillel goes, "If I am not for myself, who will be for me? But if I am only for myself, who am I? If not now, when?" [14]. The clock is ticking.

Aristotle is rumored to have declared, "Poverty is the parent of revolution and crime". The longer a child remains in poorer circumstances, the higher the risk for violent criminality as a young adult, according to a recent and large population-based study published in Lancet Public Health [15]. The FBI could be justified in claiming that crime happens more often in poorer neighborhoods, since the connection between violent crime and economic disadvantage is well-documented [16]. But imagine the outcry if the DNA obtained from newborns living in relatively poor zip codes were given to the FBI for forensic purposes. These innocent newborns, by virtue of their zip codes, are predicted to have a greater chance of committing violent crimes later in their lives than those born into families living in wealthier zip codes. Just as the argument from the US Justice Department justifies collecting DNA from migrants because they might commit crimes in the future, this same argument could someday be leveled against the poor who happen to live on this side of the US fence. The road we are headed down by staying silent is dangerous. A nationwide shift in how migrants are viewed is necessary. Migrants must be treated with dignity and receive the protections expected by any person who is not believed to have committed a violent crime.

Along with the discussion of the dangers inherent in casting groups as would-be criminals, it would be helpful to discuss ethical uses of forensic genetics and how these might be leveraged to do good. Examples of ethical uses of DNA evidence might include those similar to Mary-Claire King's successes in reuniting separated Argentinian families [17] and the use of genetics to identify victims of human rights violations [18]. It is conceivable that, in a similar way, some migrants could be reunited with children torn from them at the border. But the stated use of the pilot DNA collection program upon which the proposed legislation is partly based is to detect "fraudulent family units" — migrants believed to be posing as families to exploit the system [1]. Thus, I suspect the US government will not use DNA taken from migrants in a way that provides the direct benefit of reuniting families. But it is worth exploring how this could be done. I am not, by myself, knowledgeable enough to help in this area, other than to signal its importance.

From my vantage point as a public-health geneticist, I have offered a few examples of the harms involved in mass genetic screening, including errors and racial profiling. If DNA is going to be taken forcibly, striving to ensure there is some benefit to our migrants is the least we can do given the power we hold and the ethical responsibility we have to other human beings. Because of our collective knowledge of the science, history, and ethical dimensions of genetics, we — maybe even more than government officials — are responsible for the ways in which our technology is used. Moreover, migrants need to be part of the discussion to find out what would be helpful to them and how they hope their DNA could be used, should the legislation be enacted. Those with skills in starting and maintaining community dialogue are needed here.

In addition to our technical expertise and knowledge of history, the genetics community includes people of varying political persuasions, representing a range of opinions regarding how to handle migration. We should talk. But whatever your politics, using genetics to racially police society is both unnecessary and unethical. 

I hope we can pool our resources to talk about the legislation and act to protect the humanity and futures of those fighting to be in the United States.

Adams, C.D. Circulating sphingomyelins on estrogen receptor-positive and estrogen receptor-negative breast cancer-specific survival. Breast Cancer Management (2020). Publisher's Version.
Aim: This study aims to determine whether a causal relationship exists between circulating sphingomyelins and breast cancer-specific survival, since, if one does, sphingomyelins could be studied as a therapeutic target in the management of breast cancer. Patients/materials & methods: Mendelian randomization is used here to investigate whether higher levels of circulating sphingomyelins impact breast cancer-specific survival for estrogen receptor-negative (ER–) and estrogen receptor-positive (ER+) patients. Results: The results suggest a null effect of sphingomyelins for ER– breast cancer-specific survival and a protective effect for ER+ breast cancer-specific survival. Sensitivity analyses implicate low-density lipoprotein cholesterol as a potential confounder. Conclusion: Future studies should replicate and triangulate the present findings with other methods and tease out the roles of sphingomyelins and low-density lipoprotein cholesterol on breast cancer-specific survival.
Adams, C.D. Circulating Glutamine and Alzheimer’s Disease: A Mendelian Randomization Study. Clinical Interventions in Aging 15, 185–193 (2020). Publisher's Version.
Background: Alzheimer’s disease is a devastating neurodegenerative disorder. Its worldwide prevalence is over 24 million and is expected to double by 2040. Finding ways to prevent its cognitive decline is urgent.

Methods: A two-sample Mendelian randomization study was performed instrumenting glutamine, which is abundant in blood, capable of crossing the blood-brain barrier, and involved in a metabolic cycle with glutamate in the brain.

Results: The results reveal a protective effect of circulating glutamine against Alzheimer’s disease (inverse-variance weighted method, odds ratio per 1-standard deviation increase in circulating glutamine = 0.83; 95% CI 0.71, 0.97; P = 0.02).

Conclusion: These findings lend credence to the emerging story supporting the modifiability of glutamine/glutamate metabolism for the prevention of cognitive decline. More circulating glutamine might mean that more substrate is available during times of stress, acting as a neuroprotectant. Modifications to exogenous glutamine may be worth exploring in future efforts to prevent and/or treat Alzheimer’s disease.

Adams, C.D. A multivariable Mendelian randomization to appraise the pleiotropy between intelligence, education, and bipolar disorder in relation to schizophrenia. Scientific Reports 10, 1, 6018 (2020). Publisher's Version.
Adams, C.D. A Mendelian randomization study of circulating glutamine and red blood cell traits. Pediatric Blood & Cancer 67, 9, e28333 (2020). Publisher's Version
Adams, C.D. & Boutwell, B.B. A Mendelian randomization study of telomere length and blood-cell traits. Scientific Reports 10, 1, 12223 (2020). Publisher's Version.
Whether telomere attrition reducing proliferative reserve in blood-cell progenitors is causal has important public-health implications. Mendelian randomization (MR) is an analytic technique using germline genetic variants as instrumental variables. If certain assumptions are met, estimates from MR should be free from most environmental sources of confounding and reverse causation. Here, two-sample MR is performed to test whether longer telomeres cause changes to hematological traits. Summary statistics for genetic variants strongly associated with telomere length were extracted from a genome-wide association (GWA) study for telomere length in individuals of European ancestry (n = 9190) and from GWA studies of blood-cell traits, also in those of European ancestry (n ~ 173,000 participants). A standard deviation increase in genetically influenced telomere length increased red blood cell and white blood cell counts, decreased mean corpuscular hemoglobin and mean cell volume, and had no observable impact on mean corpuscular hemoglobin concentration, red cell distribution width, hematocrit, or hemoglobin. Sensitivity tests for pleiotropic distortion were mostly inconsistent with glaring violations of the MR assumptions. Similar to germline mutations in telomere biology genes leading to bone-marrow failure, these data provide evidence that genetically influenced common variation in telomere length impacts hematologic traits in the population.
Adams, C.D. & Boutwell, B.B. Can increasing years of schooling reduce type 2 diabetes (T2D)?: Evidence from a Mendelian randomization of T2D and 10 of its risk factors. Scientific Reports 10, 1, 12908 (2020). Publisher's Version.
A focus in recent decades has involved examining the potential causal impact of educational attainment (schooling years) on a variety of disease and life-expectancy outcomes. Numerous studies have broadly revealed a link suggesting that as years of formal schooling increase so too does health and wellbeing; however, it is unclear whether the associations are causal. Here we use Mendelian randomization, an instrumental variables technique, with a two-sample design, to probe whether more years of schooling are causally linked to type 2 diabetes (T2D) and 10 of its attendant risk factors. The results revealed a protective effect of more schooling years against T2D (odds ratio = 0.39; 95% confidence interval: 0.26, 0.58; P = 3.89 × 10–06), which in turn might be partly mediated by more years of schooling being protective against the following: having a father with T2D, being overweight, having higher blood pressure and higher levels of circulating triglycerides, and having lower levels of HDL cholesterol. More schooling years had no effect on risk for gestational diabetes or polycystic ovarian syndrome and was associated with a decreased likelihood of moderate physical activity. These findings imply that strategies to retain adults in higher education may help reduce the risk for a major source of metabolic morbidity and mortality.
Adams, C.D. & Boutwell, B.B. A research note on Mendelian randomization and causal inference in criminology: promises and considerations. Journal of Experimental Criminology 18, 171–182 (2020). Publisher's Version.
Objectives: Here, we provide a brief overview of a technique that may hold promise for scholars working on key criminological and criminal justice topics.

Methods: We provide an abbreviated overview of Mendelian randomization (MR), a newer variant of instrumental-variables analysis, facilitated by expanding genomic technology worldwide. Our goal is to offer readers unacquainted with the topic a quick checklist of key assumptions, considerations, shortcomings, and practical applications of the technique.

Results: The causal inference capabilities of the design seem poised to continue pushing modern crime science forward, assuming that careful attention is paid to key assumptions of the technique.

Conclusions: Researchers interested in causality as it relates to antisocial behaviors may benefit from the addition of MR to the toolkit alongside other data analysis tools. This strategy also offers an avenue for cross-collaboration with scientists working in other fields, thus expanding the breadth of expertise contributing to an important societal subject in crime.
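For readers new to MR, the inverse-variance weighted (IVW) estimator that underlies many of the two-sample analyses cited on this page can be sketched in a few lines. This is a minimal illustration only, not the pipeline used in any of these papers; the function name and the toy per-SNP summary statistics are hypothetical.

```python
def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effects IVW estimate of a causal effect from two-sample
    MR summary statistics: per-SNP effects of the instruments on the
    exposure (beta_exp) and outcome (beta_out), with outcome standard
    errors (se_out)."""
    weights = [1.0 / se**2 for se in se_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    den = sum(w * bx**2 for w, bx in zip(weights, beta_exp))
    beta_ivw = num / den            # weighted average of per-SNP ratio estimates
    se_ivw = (1.0 / den) ** 0.5     # fixed-effects standard error
    return beta_ivw, se_ivw

# Toy example: two instruments whose ratio estimates both equal 0.5.
beta, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

In practice, the MR assumptions discussed above (relevance, independence, and no horizontal pleiotropy) matter far more than the arithmetic; sensitivity estimators such as MR-Egger and weighted-median methods exist precisely because the third assumption is untestable.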

Adams, C.D. & Neuhausen, S.L. Evaluating causal associations between chronotype and fatty acids and between fatty acids and type 2 diabetes: A Mendelian randomization study. Nutrition, Metabolism & Cardiovascular Diseases 29, 11, 1176-1184 (2019). Publisher's Version.

Background and aims: Preference for activity in the morning or evening (chronotype) may impact type 2 diabetes (T2D) risk factors. Our objective was to use Mendelian randomization (MR) to evaluate whether there are causal links between chronotype and one potential T2D risk factor, total fatty acids (TOTFA), and between TOTFA and T2D.

Methods and results: We estimated the causal effect of: 1) morning chronotype on TOTFA; and 2) higher TOTFA on T2D. We found that: a) morning compared to evening chronotype was associated with lower TOTFA levels (inverse-variance weighted (IVW) estimate -0.21; 95% CI -0.38, -0.03; raw P = 0.02; FDR-corrected P = 0.04) and b) elevated TOTFA levels were protective against T2D (IVW estimate -0.23; 95% CI -0.41, -0.05; raw P = 0.01; FDR-corrected P = 0.03). Based on this finding, we further hypothesized that healthy fats would show a similar pattern and performed MR of a) morning chronotype on omega-3 (Omega-3), monounsaturated (MUFA), and polyunsaturated (PUFA) fatty acids; and b) MR of each of these fat types on T2D. We observed the same mediating-type pattern for chronotype, MUFA, and T2D as we had for chronotype, TOTFA, and T2D, and morning chronotype was associated with lower Omega-3.

Conclusion: Our findings provide suggestive, new information about relationships among chronotype, TOTFA, and T2D and about chronotype as a factor influencing Omega-3, MUFA, and TOTFA levels. In addition, we validated previous knowledge about MUFA and T2D. Morning chronotypes may predispose towards lower levels of TOTFA and some healthy fats, whereas higher levels of TOTFA and MUFA may protect against T2D.

Adams, C.D. Null effect of circulating sphingomyelins on risk for breast cancer: A Mendelian randomization report using Breast Cancer Association Consortium (BCAC) data. F1000Research 8, 2119 (2019). Publisher's Version.
Background: Changes in cellular metabolism are a hallmark of cancer and are linked with sphingolipid synthesis. Due to immense interest in how sphingolipids influence chemoresistance, more is known about the impact of sphingolipids during cancer treatment and progression than about the potential role of sphingolipids in the induction of tumors in humans.
Methods: Because estrogen triggers sphingolipid signaling cascades, the causal role of circulating levels of sphingomyelin (a type of sphingolipid) on breast cancer was investigated with a well-powered Mendelian randomization design.
Results: The results reveal a null effect (OR = 0.94; 95% CI = 0.85, 1.05; P = 0.30).
Conclusion: Despite the role sphingomyelins play during chemoresistance and cancer progression, circulating sphingomyelins do not appear to initiate or protect from breast cancer. This is the first causal report in humans that the effect of sphingomyelins on breast cancer initiation is null. Future investigations of risk in other cancer types are needed to further explore the potential role of sphingolipid biology in cancer etiology.
Adams, C.D., et al. Circulating Metabolic Biomarkers of Screen-Detected Prostate Cancer in the ProtecT Study. Cancer Epidemiology, Biomarkers & Prevention 28, 1, 208-216 (2019). Publisher's Version.

Background: Whether associations between circulating metabolites and prostate cancer are causal is unknown. We report on the largest study of metabolites and prostate cancer (2,291 cases and 2,661 controls) and appraise causality for a subset of the prostate cancer-metabolite associations using two-sample Mendelian randomization (MR).

Methods: The case-control portion of the study was conducted in nine UK centers with men ages 50-69 years who underwent prostate-specific antigen screening for prostate cancer within the Prostate Testing for Cancer and Treatment (ProtecT) trial. Two data sources were used to appraise causality: a genome-wide association study (GWAS) of metabolites in 24,925 participants and a GWAS of prostate cancer in 44,825 cases and 27,904 controls within the Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium.

Results: Thirty-five metabolites were strongly associated with prostate cancer (P < 0.0014, multiple-testing threshold). These fell into four classes: (i) lipids and lipoprotein subclass characteristics (total cholesterol and ratios, cholesterol esters and ratios, free cholesterol and ratios, phospholipids and ratios, and triglyceride ratios); (ii) fatty acids and ratios; (iii) amino acids; and (iv) fluid balance. Fourteen top metabolites were proxied by genetic variables, but MR indicated these were not causal.

Conclusions: We identified 35 circulating metabolites associated with prostate cancer presence, but found no evidence of causality for those 14 testable with MR. Thus, the 14 MR-tested metabolites are unlikely to be mechanistically important in prostate cancer risk.

Impact: The metabolome provides a promising set of biomarkers that may aid prostate cancer classification.

Beynon, R.A., et al. Investigating the effects of lycopene and green tea on the metabolome of men at risk of prostate cancer: The ProDiet randomised controlled trial. International Journal of Cancer 144, 8, 1918–1928 (2019). Publisher's Version.
Lycopene and green tea consumption have been observationally associated with reduced prostate cancer risk, but the underlying mechanisms have not been fully elucidated. We investigated the effect of factorial randomisation to a 6-month lycopene and green tea dietary advice or supplementation intervention on 159 serum metabolite measures in 128 men with raised PSA levels (but prostate cancer-free), analysed by intention-to-treat. The causal effects of metabolites modified by the intervention on prostate cancer risk were then assessed by Mendelian randomisation, using summary statistics from 44,825 prostate cancer cases and 27,904 controls. The systemic effects of lycopene and green tea supplementation on serum metabolic profile were comparable to the effects of the respective dietary advice interventions (R2 = 0.65 and 0.76 for lycopene and green tea, respectively). Metabolites which were altered in response to lycopene supplementation were acetate [β (standard deviation difference vs. placebo): 0.69; 95% CI = 0.24, 1.15; p = 0.003], valine (β: −0.62; −1.03, −0.02; p = 0.004), pyruvate (β: −0.56; −0.95, −0.16; p = 0.006) and docosahexaenoic acid (β: −0.50; −0.85, −0.14; p = 0.006). Valine and diacylglycerol were lower in the lycopene dietary advice group (β: −0.65; −1.04, −0.26; p = 0.001 and β: −0.59; −1.01, −0.18; p = 0.006). A genetically instrumented SD increase in pyruvate increased the odds of prostate cancer by 1.29 (1.03, 1.62; p = 0.027). An intervention to increase lycopene intake altered the serum metabolome of men at risk of prostate cancer. Lycopene lowered levels of pyruvate, which our Mendelian randomisation analysis suggests may be causally related to reduced prostate cancer risk.
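For intuition, the reported pyruvate odds ratio of 1.29 (95% CI 1.03, 1.62; p = 0.027) can be checked by back-calculating the z-statistic and two-sided p-value on the log-odds scale. A small sketch, assuming the interval is a symmetric Wald CI on the log scale (the usual construction for genetically instrumented odds ratios):

```python
from math import erf, log, sqrt

def z_and_p_from_or_ci(or_point, ci_low, ci_high):
    """Recover the z-statistic and two-sided p-value for an odds ratio
    from its 95% Wald confidence interval (symmetric on the log scale)."""
    beta = log(or_point)
    se = (log(ci_high) - log(ci_low)) / (2 * 1.959964)
    z = beta / se
    # two-sided p from the standard normal tail
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

z, p = z_and_p_from_or_ci(1.29, 1.03, 1.62)
```

Running this recovers a p-value close to the reported 0.027, a quick consistency check on summary results of this form.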
Adams, C.D. & Neuhausen, S.L. Bi-directional Mendelian randomization of epithelial ovarian cancer and schizophrenia and uni-directional Mendelian randomization of schizophrenia on circulating glycerophosphocholine metabolites. Molecular Genetics and Metabolism Reports 6, 21, 100539 (2019). Publisher's VersionAbstract
Most women with epithelial ovarian cancer (EOC) present with late-stage disease. As a result, globally, EOC is responsible for >150,000 deaths a year. Thus, a better understanding of risk factors for developing EOC is crucial for earlier screening and detection to improve survival. To that effort, there have been suggestions that there is an association of schizophrenia and cancer, possibly because metabolic changes are a hallmark of both cancer and schizophrenia (SZ). Perturbed choline metabolism has been documented in both diseases. Our objective was to use Mendelian randomization to evaluate whether SZ increased risk for developing EOC or the converse, and, whether SZ impacted 1- or 2-glycerophosphocholine (1- or 2-GPC) metabolites. We found that SZ conferred a weak but increased risk for EOC, but not the reverse (no evidence that EOC caused SZ). SZ was also causally associated with lower levels of two 1- or 2-GPC species and with suggestively lower levels in an additional three 1- or 2-GPCs. We postulate that perturbed choline metabolism in SZ may mimic or contribute to a "cholinic" phenotype, as observed in EOC cells.
Adams, C.D. A brief tour of epidemiologic epigenetics and mental health. Current Opinion in Psychology 27, 36-40 (2019). Publisher's VersionAbstract
The epidemiologic study of DNA methylation (DNAm) and mental health is a burgeoning area, but confounding and reverse causation remain important concerns. Whether non-brain tissues are appropriate for investigating brain phenotypes depends on the hypothesis and on whether the goal is causal inference or biomarker identification. Look-ups of the correspondence between DNAm in blood and brain, and use of Mendelian randomization (MR), can follow up, to some degree, on the causal nature of some findings. Social scientists, health methodologists (epidemiologists), and basic scientists (thinkers who view epigenetics and mental health from different perspectives) can come together in the design and framing of findings to avoid pitfalls and innovate beyond what each could do alone.