Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditions simultaneously. Moreover, even satisfying all three conditions approximately requires that the data lie in an approximate version of one of the constrained special cases identified by our theorem. These results suggest some of the ways in which key notions of fairness are incompatible with each other, and hence provide a framework for thinking about the trade-offs between them.
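The incompatibility can be made concrete with a toy numerical sketch (all group sizes and base rates below are hypothetical). Scoring every member of a group at that group's base rate satisfies calibration within groups by construction, yet the average score assigned to true negatives then differs across groups whenever the base rates do:

```python
# Toy sketch of the fairness tradeoff: a score that is perfectly
# calibrated within each group cannot equalize the average score given
# to true negatives across groups once base rates differ.
# All numbers are hypothetical.

def average(xs):
    return sum(xs) / len(xs)

def make_group(n_pos, n_neg):
    """Each person is (true_outcome, assigned_score); everyone receives
    the group's base rate as their score."""
    base_rate = n_pos / (n_pos + n_neg)
    return [(1, base_rate)] * n_pos + [(0, base_rate)] * n_neg

# Two groups with unequal base rates: 0.3 vs 0.6.
group_a = make_group(300, 700)
group_b = make_group(600, 400)

def calibration_error(people):
    """Largest gap, over score bins, between the bin's score and the
    fraction of positives in that bin (0 means perfectly calibrated)."""
    bins = {}
    for y, s in people:
        bins.setdefault(s, []).append(y)
    return max(abs(average(ys) - s) for s, ys in bins.items())

def mean_score_among_negatives(people):
    return average([s for y, s in people if y == 0])

print(calibration_error(group_a), calibration_error(group_b))  # 0.0 0.0
print(mean_score_among_negatives(group_a))                     # ~0.3
print(mean_score_among_negatives(group_b))                     # ~0.6
```

Balance for the negative class would require the last two numbers to match; here they inherit the unequal base rates, which is why all three conditions can coexist only in degenerate cases such as equal base rates or perfect prediction.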
We examine how machine learning can be used to improve and understand human decision-making. In particular, we focus on a decision that has important policy consequences. Millions of times each year, judges must decide where defendants will await trial—at home or in jail. By law, this decision hinges on the judge’s prediction of what the defendant would do if released. This is a promising machine learning application because it is a concrete prediction task for which there is a large volume of data available. Yet comparing the algorithm to the judge proves complicated. First, the data are themselves generated by prior judge decisions: we only observe crime outcomes for released defendants, not for those the judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the single variable the algorithm focuses on; for instance, judges may care about racial inequities or about specific crimes (such as violent crimes) rather than just overall crime risk. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: a policy simulation shows crime can be reduced by up to 24.8% with no change in jailing rates, or jail populations can be reduced by 42.0% with no increase in crime rates. Moreover, we see reductions in all categories of crime, including violent ones. Importantly, such gains can be had while also significantly reducing the percentage of African-Americans and Hispanics in jail. We find similar results in a national dataset as well. In addition, by focusing the algorithm on predicting judges’ decisions, rather than defendant behavior, we gain some insight into decision-making: a key problem appears to be that judges respond to ‘noise’ as if it were signal.
These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals.
Algorithms are increasingly being used to make recommendations about matters of taste, expanding their scope into domains that are primarily subjective. This raises two important questions. How accurately can algorithms predict subjective preferences, compared to human recommenders? And how much do people trust them? Recommender systems face several disadvantages: They have no preferences of their own and they do not model their recommendations after the way people make recommendations. In a series of experiments, however, we find that recommender systems outperform human recommenders, even in a domain where people have a lot of experience and well-developed tastes: Predicting what people will find funny. Moreover, these recommender systems outperform friends, family members, and significant others. But people do not trust these recommender systems. They do not use them to make recommendations for others, and they prefer to receive recommendations from other people instead. We find that this lack of trust partly stems from the fact that machine recommendations seem harder to understand than human recommendations. But, simple explanations of recommender systems can alleviate this distrust.
This paper develops a model of health insurance that incorporates behavioral biases. In the traditional model, people who are insured overuse low value medical care because of moral hazard. There is ample evidence, though, of a different inefficiency: people underuse high value medical care because they make mistakes. Such “behavioral hazard” changes the fundamental tradeoff between insurance and incentives. With only moral hazard, raising copays increases the efficiency of demand by ameliorating overuse. With the addition of behavioral hazard, raising copays may reduce efficiency by exaggerating underuse. This means that estimating the demand response is no longer enough for setting optimal copays; the health response needs to be considered as well. This provides a theoretical foundation for value-based insurance design: for some high value treatments, for example, copays should be zero (or even negative). Empirically, this reinterpretation of demand proves important, since high value care is often as elastic as low value care. For example, calibration using data from a field experiment suggests that omitting behavioral hazard leads to welfare estimates that can be both wrong in sign and off by an order of magnitude. Optimally designed insurance can thus increase health care efficiency as well as provide financial protection, suggesting the potential for market failure when private insurers are not fully incentivized to counteract behavioral biases.
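The changed tradeoff can be sketched with a back-of-the-envelope calculation (all numbers below are hypothetical and chosen only to make the sign flip visible): consumers use care when their perceived benefit exceeds the copay, but welfare depends on true benefit net of social cost.

```python
# Toy welfare calculation in the spirit of the moral vs. behavioral
# hazard tradeoff. Cost, copay levels, benefit range, and the size of
# the perception bias are all hypothetical.

COST = 20  # social cost of the treatment

def surplus(copay, bias):
    """Total welfare when each consumer with true benefit v (1..40)
    uses care iff perceived benefit (v - bias) exceeds the copay.
    Welfare counts true benefit net of social cost."""
    return sum(v - COST for v in range(1, 41) if v - bias > copay)

# Moral hazard only (bias = 0): raising the copay from 0 to 10 prunes
# low-value use, so welfare rises.
print(surplus(10, 0) - surplus(0, 0))    # 145

# With behavioral hazard (consumers underperceive benefits by 15), the
# same copay increase prunes high-value use, so welfare falls.
print(surplus(10, 15) - surplus(0, 15))  # -5
```

The demand response to the copay is similar in both cases; only the health (true-benefit) response reveals that the welfare effect flips sign, which is the sense in which demand elasticities alone cannot pin down optimal copays.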
Mullainathan, S., De Mol, C., Gautier, E., Giannone, D., Reichlin, L., van Dijk, H., & Wooldridge, J. (2017). Big Data in Economics: Evolution or Revolution? In L. Matyas, R. Blundell, E. Cantillon, B. Chizzolini, M. Ivaldi, W. Leininger, R. Marimon, et al. (Eds.), Economics without Borders: Economic Research for European Policy Challenges (pp. 612–632). Cambridge, UK: Cambridge University Press.
The Big Data Era creates a lot of exciting opportunities for new developments in economics and econometrics. At the same time, however, the analysis of large datasets poses difficult methodological problems that should be addressed appropriately and are the subject of the present chapter.
Machine learning tools are beginning to be deployed en masse in health care. While the statistical underpinnings of these techniques have been questioned with regard to causality and stability, we highlight a different concern here, relating to measurement issues. Unlike in other applications of machine learning, a characteristic feature of health data is that neither y nor x is measured perfectly. Far from a minor nuance, this can undermine the power of machine learning algorithms to drive change in the health care system, and can indeed cause them to reproduce and even magnify existing errors in human judgment.
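The label-measurement problem can be illustrated with a toy simulation (the setup is hypothetical, not from the paper): when training labels come from human judgments that systematically miss cases in one subgroup, any model fit to those labels will faithfully reproduce the miss.

```python
# Toy illustration: recorded diagnoses understate true illness in one
# subgroup, so the "ground truth" the algorithm sees is already biased.
# All numbers are hypothetical.

# Each patient: (subgroup, truly_sick, recorded_diagnosis).
# True prevalence is identical (50%) in both subgroups, but clinicians
# record only half of the true cases in subgroup 1.
patients = []
for subgroup in (0, 1):
    for i in range(100):
        truly_sick = i < 50  # 50% prevalence in both subgroups
        missed = subgroup == 1 and truly_sick and i % 2 == 0
        recorded = truly_sick and not missed
        patients.append((subgroup, truly_sick, recorded))

def rate(values):
    return sum(values) / len(values)

for g in (0, 1):
    true_rate = rate([sick for sg, sick, rec in patients if sg == g])
    label_rate = rate([rec for sg, sick, rec in patients if sg == g])
    # A model that predicts the recorded label well will "learn" that
    # subgroup 1 is half as sick as subgroup 0, though true rates match.
    print(g, true_rate, label_rate)  # 0 0.5 0.5  /  1 0.5 0.25
```

No amount of predictive accuracy on the recorded label fixes this: the better the model fits y as measured, the more exactly it inherits the human error embedded in y.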
Machines are increasingly doing "intelligent" things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.
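The distinction between prediction (y-hat) and parameter estimation (beta-hat) can be made concrete with a toy example (hypothetical data): when two regressors are nearly collinear, wildly different coefficient vectors predict almost equally well, so a method tuned purely for prediction has no reason to recover the "true" parameters.

```python
# Toy contrast between y-hat and beta-hat with hypothetical data.
# Truth: y = 1*x1 + 1*x2, where x2 is nearly identical to x1.

x1 = [i / 10 for i in range(50)]
x2 = [v + 0.001 * ((i % 2) * 2 - 1) for i, v in enumerate(x1)]  # x2 ~ x1
y = [a + b for a, b in zip(x1, x2)]

def mse(b1, b2):
    """Mean squared prediction error of the model y-hat = b1*x1 + b2*x2."""
    preds = [b1 * a + b2 * b for a, b in zip(x1, x2)]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

print(mse(1.0, 1.0))  # the true coefficients: zero error
print(mse(2.0, 0.0))  # a very different coefficient vector: near-zero error
print(mse(0.0, 2.0))  # another such vector: near-zero error
```

All three coefficient vectors are essentially equivalent for a prediction task, but only one of them is the structural parameter an econometrician would want; this is why good y-hat performance says little about beta-hat, and why regularized learners that shrink or drop coefficients can still predict superbly.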
There is growing interest in understanding the psychology of the poor—biases that may affect decision-making are of particular interest. The sheer diversity of potential biases—hyperbolic discounting, probabilistic errors, and judgmental errors, to name just a few—poses a key challenge. These psychological biases cannot easily be put into a common unit such as money spent. However, two insights from psychology make this problem more tractable.
First, a large body of work points toward a two-system model of the brain. System 1 thinks fast: it is intuitive, automatic, and effortless, and as a result, prone to biases and errors. System 2 is slow, effortful, deliberate, and costly, but typically produces less biased and more accurate results. Second, when mentally taxed, people are less likely to engage their System 2 processes. Put simply, one might think of having a (mental) reserve or capacity for the kind of effortful thought required to use System 2. When burdened, there is less of this resource available for use in other judgments and decisions. Though there is no commonly accepted name for this capacity, we will refer to it in this article as “bandwidth” (Mullainathan and Shafir 2013). This two-system model has direct relevance to many of the heuristics and biases familiar to economists. Kahneman and Frederick (2002) and more recently Kahneman (2011) provide reviews. Fudenberg and Levine (2006) develop a model with two systems in the context of time discounting.
Psychologists often study this underlying resource by imposing “cognitive load” to tax bandwidth and measure the impact on judgments and decisions. The many ways to induce load produce similar results on various bandwidth measures and consequences from reduced System 2 thinking. This insight is particularly useful because it implies that bandwidth is both malleable and measurable. It also suggests a unified approach of studying the psychology of poverty. We can understand factors in the lives of the poor, such as malnutrition, alcohol consumption, or sleep deprivation, by how they affect bandwidth. And we can understand important decisions made by the poor, such as technology adoption or savings, through the lens of how they are affected by bandwidth. Clearly, bandwidth is not the only important aspect of the psychological lives of the poor; no single metric can take on this role. However, it provides a way to at least partly understand a great many of the thought processes that drive decision-making by the poor.
An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors.
To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging even for the best players in the world.
We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are significantly more powerful than features based on skill or time.
In this provocative book based on cutting-edge research, Sendhil Mullainathan and Eldar Shafir show that scarcity creates a distinct psychology for everyone struggling to manage with less than they need. Busy people fail to manage their time efficiently for the same reasons the poor and those maxed out on credit cards fail to manage their money. The dynamics of scarcity reveal why dieters find it hard to resist temptation, why students and busy executives mismanage their time, and why the same sugarcane farmers are smarter after harvest than before.
Once we start thinking in terms of scarcity, the problems of modern life come into sharper focus, and Scarcity reveals not only how it leads us astray but also how individuals and organizations can better manage scarcity for greater satisfaction and success.
Imagine sitting in an office located near the railroad tracks. Trains rattle by several times an hour. As you try to concentrate, the rumble of every train pulls you away from what you are doing. You need time to refocus, to collect your thoughts. Worse, just when you have settled back in, another train hurtles by. This description mirrors the conditions of a school in New Haven located next to a noisy railroad line. In the early 1970s two researchers decided to measure the impact of this noise on students. They noted that only one side of the school faced the tracks, so the students in classrooms on that side were particularly exposed to the noise but were otherwise similar to their fellow students.
Successful development programs rely on people to behave and choose in certain ways, and behavioral economics helps us understand why people behave and choose as they do. This paper sketches how to design development programs and policies in ways that are cognizant of and informed by the insights behavioral economics provides into human behavior. It distills the key insights of behavioral economics into a parsimonious framework about the constraints under which people make decisions. It then shows how this framework leads to a set of design principles that can be employed to design programs in areas including health, education, productivity, agriculture, finance, and the delivery of public services. Finally, it offers some reflections on the ways in which these insights and design principles can be incorporated into existing and planned programs to improve their reach and effectiveness.
Targeting assistance to the poor is a central problem in development. We study the problem of designing a proxy means test when the implementing agent is corruptible. Conditioning on more poverty indicators may worsen targeting in this environment because of a novel tradeoff between statistical accuracy and enforceability. We then test necessary conditions for this tradeoff using data on Below Poverty Line card allocation in India. Less eligible households pay larger bribes and are less likely to obtain cards, but widespread rule violations yield a de facto allocation much less progressive than the de jure one. Enforceability appears to matter.
Banerjee, A. V., Hanna, R., & Mullainathan, S. (2013). Corruption. In R. Gibbons & J. Roberts (Eds.), The Handbook of Organizational Economics (pp. 1109–1147). Princeton, NJ: Princeton University Press.
In this paper, we provide a new framework for analyzing corruption in public bureaucracies. The standard way to model corruption is as an example of moral hazard, which then leads to a focus on better monitoring and stricter penalties, with the eradication of corruption as the final goal. We propose an alternative approach which emphasizes why corruption arises in the first place. Corruption is modeled as a consequence of the interaction between the underlying task being performed by the bureaucrat, the bureaucrat's private incentives, and what the principal can observe and control. This allows us to study not just corruption but also other distortions that arise simultaneously with it, such as red tape, and ultimately the quality and efficiency of the public services provided, and how these outcomes vary depending on the specific features of the task. We then review the growing empirical literature on corruption through this perspective and provide guidance for future empirical research.
We provide evidence that individuals optimize imperfectly when making annuity decisions, and this result is not driven by loss aversion. Life annuities are more attractive when presented in a consumption frame than in an investment frame. Highlighting the purchase price in the consumption frame does not alter this result. The level of habitual spending has little interaction with preferences for annuities in the consumption frame. In an investment frame, consumers prefer annuities with principal guarantees; this result is similar for guarantee amounts below, at, and above the purchase price. We discuss implications for the retirement services industry and its regulators.
The poor often behave in less capable ways, which can further perpetuate poverty. We hypothesize that poverty directly impedes cognitive function and present two studies that test this hypothesis. First, we experimentally induced thoughts about finances and found that this reduces cognitive performance among poor but not among well-off participants. Second, we examined the cognitive function of farmers over the planting cycle. We found that the same farmer shows diminished cognitive performance before harvest, when poor, as compared with after harvest, when rich. This cannot be explained by differences in time available, nutrition, or work effort. Nor can it be explained by stress: although farmers do show more stress before harvest, that does not account for the diminished cognitive performance. Instead, it appears that poverty itself reduces cognitive capacity. We suggest that this is because poverty-related concerns consume mental resources, leaving less for other tasks. These data provide a previously unexamined perspective and help explain a spectrum of behaviors among the poor. We discuss some implications for poverty policy.
Research in behavioral public finance has blossomed in recent years, producing diverse empirical and theoretical insights. This article develops a single framework with which to understand these advances. Rather than drawing out the consequences of specific psychological assumptions, the framework takes a reduced-form approach to behavioral modeling. It emphasizes the difference between decision and experienced utility that underlies most behavioral models. We use this framework to examine the behavioral implications for canonical public finance problems involving the provision of social insurance, commodity taxation, and correcting externalities. We show how deeper principles undergird much work in this area and that many insights are not specific to a single psychological assumption.