## Who Invented IV Regression?

An Interesting Footnote in the History of IV Regression

As discussed in Stock and Watson, *Introduction to Econometrics* (2003, Ch. 10), the first published use of IV regression to estimate the coefficient on an endogenous variable (that is, to solve the "identification problem" in econometrics) appeared in Appendix B of Philip G. Wright's book, *The Tariff on Animal and Vegetable Oils*. There, the author showed, via two derivations (one limited-information, or single-equation, and the other full-information, or system-based), that if there is an observed variable that shifts demand but not supply, this variable can be used to estimate the slope of the supply curve. Because the method was applied to data in percentage changes, the result was an estimate of the elasticity of supply. The first estimator, which the appendix calls the "method of external factors," is in fact the instrumental variables estimator with a single instrument. The second derivation yields the indirect least squares estimator, obtained by first solving for the reduced form when there is one variable that shifts supply but not demand and another that shifts demand but not supply.
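To make the "method of external factors" concrete, here is a minimal sketch of the single-instrument IV estimator on simulated data. It is not Wright's computation and uses none of his data: the linear supply and demand equations, the parameter values, and the helper names (`simulate_market`, `iv_slope`, `ols_slope`) are all hypothetical choices for illustration. A demand shifter `z` that does not enter the supply equation serves as the instrument, so the supply elasticity is estimated as cov(z, q) / cov(z, p). For comparison, OLS of quantity on price is also computed; with these particular parameter values its probability limit is zero, not the true elasticity.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_market(n, beta, alpha, gamma, seed=0):
    """Simulate equilibrium price and quantity changes (hypothetical model).

    Supply:  q = beta * p + u_s             (beta = supply elasticity)
    Demand:  q = -alpha * p + gamma * z + u_d
    z shifts demand but not supply: the "external factor" / instrument.
    """
    rng = random.Random(seed)
    z, p, q = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        u_s = rng.gauss(0, 1)
        u_d = rng.gauss(0, 1)
        # Equate supply and demand, then solve for the equilibrium price.
        pi = (gamma * zi + u_d - u_s) / (alpha + beta)
        qi = beta * pi + u_s
        z.append(zi)
        p.append(pi)
        q.append(qi)
    return z, p, q


def iv_slope(z, x, y):
    """Single-instrument IV estimator of the slope of y on x: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def ols_slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


BETA = 0.5  # hypothetical true supply elasticity
z, p, q = simulate_market(n=20000, beta=BETA, alpha=1.0, gamma=1.0, seed=42)
beta_iv = iv_slope(z, p, q)
beta_ols = ols_slope(p, q)

print(f"true supply elasticity: {BETA}")
print(f"IV estimate:  {beta_iv:.3f}")
print(f"OLS estimate: {beta_ols:.3f}")
```

The IV estimate should land close to 0.5, while OLS is pulled toward zero because the equilibrium price is correlated with the supply shock `u_s`. That endogeneity is exactly what the external factor is used to get around.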

The book is obscure and has long been out of print, so you can view a copy of Appendix B here: (PDF).

There is ambiguity about the authorship of this appendix. Goldberger (1972) and Crow (1994) attribute it to Philip Wright's son, Sewall, who was already on his way to becoming one of the most important genetic statisticians of the twentieth century. It seems, however, that there is no primary evidence for this attribution: Philip died in 1934, and nobody actually asked Sewall. In their textbook, Stock and Watson are vague on the matter of attribution. To get to the bottom of this, Francesco Trebbi (a graduate student in the Harvard Economics Department) and I did some historical digging and conducted a stylometric analysis, that is, a statistical analysis of the writing style in Appendix B compared with that of known works by Philip and Sewall Wright.

IV regression had been used before Appendix B. Sewall Wright used it in a 1925 analysis of corn and hog cycles, work he had undertaken during WWI but was slow to get into print. In his model, however, all the regressors were exogenous, which he realized and indeed emphasized (in different words, of course), so OLS would have been a satisfactory method of estimation and IV was not needed. It is not clear why Sewall Wright used IV in his 1925 paper, but one possibility is that it was computationally easier than inverting the 4×4 matrices that OLS would have required: because he set correlations that were nearly zero to exactly zero, the IV estimating equations had fewer terms. It is quite possible that IV estimation had been used even earlier as a computational device, or as a method of moments estimator, in circumstances in which it was unnecessary because all the regressors were exogenous.

The importance of Appendix B is that it showed that IV regression could be used to estimate the coefficients on an *endogenous* regressor, which is the reason that IV regression is at the heart of much of modern econometrics. This was a real breakthrough, without which IV regression would have been little more than a footnote in the history of statistical computing.

Our analysis strongly points towards Philip Wright being the author of Appendix B. We also think that there is strong circumstantial and historical evidence that he thought of the idea of IV regression himself, although without additional primary sources we cannot be as sure of this. To find out more, download the paper here: (PDF).

## Who Was Philip G. Wright?

*Philip Green Wright, date unknown*.

Philip Wright seems to have been an interesting person with eclectic interests: take a look at Philip Wright’s CV (which is an updated version of one found in the personnel records of the Brookings Institution).

## Details of the Stylometric Analysis

The statistical methods are described briefly and nontechnically in the paper; for more details, see the Note on Linear Classifiers.

The details of data construction and coding are summarized in a research memo by Francesco Trebbi.

The stylometric analysis was conducted in Stata, using the dataset and .do files contained in a .zip file (which also contains the Note on Linear Classifiers).