Ambiguity Versus Risk in Sequential Decision-Making: Incomplete Information, Causal Inference, and Reinforcement Learning

Blog Series:  PUBLIC IMPACT ANALYTICS SCIENCE (PIAS)

Ambiguity

Sequential decision-making requires one to know, or at least anticipate with confidence, the future consequences of the decisions that will be made today.

The vast majority of analytical tools for sequential decision-making assume that such consequences can be quantified (e.g., learned from data or experience) via exact probabilities. Real-world decision-making, however, often does not satisfy this assumption. Consider, for example, what happened during COVID-19 in March 2020. Federal and state authorities needed to decide whether or not to close schools and impose lock-down policies, followed by potential reopening decisions. They knew that any decision regarding closures and lock-down policies has consequences for the future spread of the virus. But no one could exactly quantify the impact of such decisions, mainly because COVID-19 was, and still is, a new pandemic, not something we had experienced before.

In particular, we did not have data or past experience to quantify the impact of decisions as exact probabilities. Is the consequence of closing schools and businesses in the U.S. a 20% reduction in the number of cases? A 65% reduction? Or is it only 1%? What is the impact of closing schools and businesses on the economy? Answering these questions is still hard in 2021, even though we now have much more data and knowledge than back in March 2020.

In March of 2020, I advised the government of Bahrain on the policies that they should quickly follow (see a Q&A with me here), and also studied the health and economic impact of various lock-down policies across the states in the U.S. (see the study here).  But how should one come up with good decisions, when it is hard to quantify the consequences of decisions in exact probability terms?

You might be thinking that COVID-19 was an exceptional situation. Let me argue (through some examples) that even in problems for which we have large data sets, we often face the same difficulty. For example, in medical decision-making a variety of factors might prevent one from specifying exact probabilities as consequences of the decisions made. In a joint study with the Mayo Clinic, we helped physicians make better decisions for patients who undergo transplantation and face risks of new-onset diabetes after transplantation (NODAT). Organ-transplanted patients typically receive high amounts of immunosuppressive drugs (e.g., tacrolimus) as a mechanism to reduce their risk of organ rejection. The diabetogenic effect of these drugs, however, increases the risk of NODAT, and hence of becoming insulin dependent. Balancing the risk of organ rejection against that of NODAT is not an easy task. This is partially because we cannot specify exact probabilities and say that, for a patient with given covariates (e.g., age, medical history, race, body mass index, blood glucose level, etc.), using a high dose of both tacrolimus and insulin will improve the patient's quality of life (e.g., measured in quality-adjusted life years) from X% to Y%, where X and Y are exact numbers.

To begin with, even if you have a large data set, you might find that no patient with similar covariates has received a high dose of both tacrolimus and insulin, so you have no basis for estimating such probabilities. This often happens when you are using a high-dimensional data set in which the number of covariates for each patient is large (relative to the number of observations).

Even if your data contain many similar patients who have received such doses, estimating exact probabilities requires strong assumptions. For example, you have to specify a causal model and assume that your model is perfectly correct. Your model might rest on several questionable assumptions (e.g., that there is no unobserved confounding in your data, among other assumptions your causal framework may need). But how do you know that your model and its underlying assumptions are correct? Even performing sensitivity analyses often requires extra assumptions and typically cannot establish that an exact probability estimate is perfectly correct. So why insist on exact probabilities?

Reinforcement Learning

This highlights that we may need to change our way of thinking. What if, instead of assuming that we have exact probabilities on hand, we acknowledge that we often need to make decisions absent access to them? This brings us to decision-making under ambiguity.

Before I discuss what this entails, what ambiguity means, and how it differs from assuming exact probabilities, let me also clarify the role of recent advancements in Artificial Intelligence (AI) and Machine Learning (ML). You might be asking yourself “don’t advancements in AI/ML allow decision-making without knowing exact probabilities?” The answer is both yes and no.

Specifically, a branch of AI/ML that is very useful in sequential decision-making is Reinforcement Learning (RL). In RL, the decision-making agent does not know the consequences of decisions in exact probabilistic terms, but it can interact with the environment and learn them through experience (see Figure 1). This is how, for example, DeepMind researchers in 2016 were able to beat the best human player in the game of Go [1] (see their Nature paper here), a game that is much more difficult to master than, say, chess. Interaction with the environment allows self-play and learning from scratch. In the context of RL, self-play was first used in Arthur Samuel's checkers program, which was demonstrated on TV in 1956.

To do this in RL, however, you need to define a set of rules that determine the way the AI agent interacts with the environment. That is, you need to define an exact environment for the RL agent to interact with. This allows the agent to "explore," i.e., make new decisions and learn about their consequences through learning-by-doing.
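To make this interact-and-explore loop concrete, here is a minimal sketch of a tabular Q-learning agent in a toy, fully made-up environment (a five-state corridor with a reward at the far end; the states, rewards, and hyperparameters are purely illustrative, not drawn from any of the applications discussed here):

```python
import random

# Toy 1-D corridor: states 0..4, start at state 0, reward only at state 4.
# All of these "rules" are invented for illustration.
N_STATES, ACTIONS = 5, [-1, +1]  # actions: move left / move right

def step(state, action):
    """The 'environment': the exact rules the agent interacts with."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # "Explore" with probability epsilon; otherwise exploit current estimates.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Learning-by-doing: update the estimated value of taking a in s.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right in every non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
```

The key piece is the `step` function: it encodes the exact rules of the environment. When no such function can be written down, as in the pandemic and transplantation examples, this learning-by-doing recipe breaks down.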

In games such as Go, this is relatively easy: you can simulate the game, because you know all of its rules. Similarly, if you are trying to teach a robot how to find its path, you can let the robot interact with the environment, make decisions (e.g., turn right or left), and learn from experience. However, in many other applications, such as the medical decision-making and COVID-19 examples above, it is impossible to "interact" and "explore" (imagine "exploring" by imposing different lock-down policies just to learn their effect on the spread of COVID-19). Not only can you not "interact" and "explore" in such applications, you also lack a simulation environment in which your decision-making agent could learn by doing. Building a simulation environment requires defining the consequences of actions in exact probabilistic terms, which takes us back to our original problem. While some advances have been made on these issues, one promising direction is to move beyond decision-making under risk and instead understand decision-making under ambiguity. So what is ambiguity, and how does it differ from risk?

Ambiguity (a.k.a. Knightian uncertainty) is the lack of knowledge about the true probability model. Risk, on the other hand, refers to the probabilistic consequences of decisions under a known probability model. The distinction between the two is incredibly important. As Charles Manski describes in his book "Identification for Prediction and Decision," Keynes (1921) and Knight (1921) were among the first to use the term uncertainty to describe decision problems with completely unknown probability distributions; in modern research, ambiguity refers to situations where there is a lack of knowledge about the true probability model.

Kenneth Arrow, one of the most well-known economists of our era, writes in [2] (p. 418): "There are two types of uncertainty: one as to the hypothesis, which is expressed by saying that the hypothesis is known to belong to a certain class or model, and one as to the future events or observations given the hypothesis, which is expressed by a probability distribution." Dealing with ambiguity requires us to think along the lines Arrow suggests: decision-makers are often faced with Knightian uncertainty regarding the true model, while under each potential model they might have access to a probability distribution over the consequences of each action/decision.

What can we do when we need to make decisions but face ambiguity as opposed to risk? In settings where decision-making is not dynamic (e.g., is static or repeated), there are good ways to do this (see, e.g., our paper [3] here). But what about decision-making in sequential settings? In 2018, I published a new framework termed "Ambiguous Partially Observable Markov Decision Process" (APOMDP), which extends one of the most general dynamic decision-making tools available to us, Partially Observable Markov Decision Processes (POMDPs) [4]. POMDPs, in turn, extend Markov Decision Processes (MDPs) by allowing for "incomplete information," also known as "partial observability" or "latent variables."
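To see the difference between risk and ambiguity in the simplest possible (static) setting, here is a sketch of the max-min logic often used under ambiguity. The "cloud" of candidate models and every payoff number below are invented for illustration; this is not the method of [3] or [4], just the basic robust-decision idea:

```python
# A "cloud" of candidate probability models for the outcome of each action.
# Each model maps action -> probability that the action "succeeds".
# All numbers are illustrative only.
cloud = [
    {"lockdown": 0.8, "no_lockdown": 0.3},   # pessimistic-spread model
    {"lockdown": 0.6, "no_lockdown": 0.5},   # moderate model
    {"lockdown": 0.5, "no_lockdown": 0.6},   # optimistic-spread model
]
payoff = {"success": 10.0, "failure": -20.0}  # illustrative utilities

def expected_payoff(model, action):
    """Under risk: one known model, so expected payoff is well defined."""
    p = model[action]
    return p * payoff["success"] + (1 - p) * payoff["failure"]

# Under ambiguity we do not know which model in the cloud is true.
# One robust rule (max-min) picks the action whose worst case is best.
actions = ["lockdown", "no_lockdown"]
worst_case = {a: min(expected_payoff(m, a) for m in cloud) for a in actions}
robust_action = max(actions, key=lambda a: worst_case[a])
```

Under any single model the best action may differ; the max-min rule instead hedges against the whole cloud, which is the spirit of decision-making under ambiguity.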

Most realistic decision-making problems involve variables that are only partially observable to the decision-maker. In the medical decision-making example above, blood glucose level plays an important role in what decision physicians should make. But this measurement is obtained via tests such as Fasting Plasma Glucose (FPG) and HbA1C, and these tests are subject to false-positive and false-negative errors. Thus, for a given patient, we might not know the exact value of the variable of interest (blood glucose), often called the "state" variable, at a given point in time. This holds even if the patient has recent FPG and HbA1C test results: such results give us partial knowledge about the blood glucose level, but we cannot assume this variable is fully observable. For instance, if FPG < 126 mg/dL and HbA1C < 6.5%, we might think the patient is in a diabetes-free state, but this inference might be wrong because of the false-positive and false-negative errors of these tests. Thus, the best we have in hand is partial knowledge about the patient's diabetes condition.
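As a small illustration of why the state remains only partially observed, here is a Bayesian belief update for a binary diabetes state given one noisy test result. The sensitivity and specificity values are hypothetical placeholders, not clinical figures:

```python
# Belief update for a partially observable binary state (diabetic or not),
# given a noisy test. Error rates below are assumed, not clinical values.
sensitivity = 0.90   # P(test positive | diabetic)      -- hypothetical
specificity = 0.85   # P(test negative | not diabetic)  -- hypothetical

def update_belief(prior_diabetic, test_positive):
    """Return the posterior P(diabetic) after observing one test result."""
    if test_positive:
        likelihood_d = sensitivity
        likelihood_nd = 1 - specificity   # false-positive rate
    else:
        likelihood_d = 1 - sensitivity    # false-negative rate
        likelihood_nd = specificity
    numerator = likelihood_d * prior_diabetic
    evidence = numerator + likelihood_nd * (1 - prior_diabetic)
    return numerator / evidence

belief = 0.20                                        # prior belief P(diabetic)
belief = update_belief(belief, test_positive=False)  # e.g., FPG < 126 mg/dL
# A negative test lowers the belief but does not drive it to zero:
# the state is never fully revealed, only a belief over it is maintained.
```

This belief over states, rather than the state itself, is exactly what a POMDP tracks and optimizes over.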

In addition to this type of partial knowledge, which might be caused by missing data or by a lack of exact measurements, APOMDPs allow for ambiguity as opposed to risk; traditional MDPs and POMDPs, on the other hand, were designed to work with risk. APOMDPs enable decision-making under ambiguity by considering a "cloud" of probability models; under each model in this cloud, the decision-maker faces specific risks. In [5], APOMDPs are shown to be extremely useful for causal inference in time-varying settings (e.g., in public policy or medical decision-making, where one wants to use a longitudinal observational data set to study the impact of counterfactual guidelines and/or find the best one), and it is shown how RL algorithms can make this happen despite the underlying ambiguity.

In closing, I should also mention that there have been many other suggestions on how to deal with ambiguity. Lotfi Zadeh, for instance, invented Fuzzy Theory, which, instead of measuring the consequences of actions/decisions in probabilistic terms, requires one to specify them as fuzzy numbers. Researchers (mainly outside the probability and statistics community) have made a lot of progress on decision-making using Fuzzy Theory. In a paper in 2005 [6], I discussed how Fuzzy Theory can be used in both multi-criteria and group decision-making problems. In that paper, I used the problem of hiring a new faculty member, in which (a) there is a committee of decision-makers (as opposed to a single decision-maker), each of whom might assess potential candidates differently, and (b) there is a variety of criteria on which to compare the candidates (research and publication, teaching, fit, etc.). Fuzzy Theory allows the consequences of decisions to be considered in non-probabilistic terms; in that sense, it avoids assigning exact probabilities to decisions' consequences. But it does assume that one can assign exact fuzzy numbers, and assigning exact fuzzy numbers is not easy either. Ambiguity, on the other hand, retains a probabilistic view of the world. Instead of assuming our probabilistic models are perfect (e.g., that a single probability distribution defines the consequences of our actions/decisions), it allows for a cloud of probabilistic models. This gives the decision-maker the ability to optimize actions under each model, and then take actions that work reasonably well if one of these models turns out to be the correct one.
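For a flavor of how fuzzy numbers replace probabilities, here is a tiny sketch using triangular fuzzy numbers and centroid defuzzification. The candidates and scores are invented, and this is a much-simplified stand-in for the fuzzy TOPSIS procedure of [6]:

```python
# A triangular fuzzy number (l, m, u) encodes "about m, between l and u".
def centroid(tfn):
    """Defuzzify a triangular fuzzy number via its centroid, (l + m + u) / 3."""
    l, m, u = tfn
    return (l + m + u) / 3

# Two hypothetical faculty candidates scored on "research" by a committee;
# fuzzy numbers capture imprecise judgments like "roughly 7 out of 10".
candidate_a = (6.0, 7.0, 9.0)   # "around 7, possibly as high as 9"
candidate_b = (5.0, 8.0, 8.5)   # "around 8, but could be as low as 5"

preferred = max([candidate_a, candidate_b], key=centroid)
```

Note that the scores here are exact fuzzy numbers: the method still asks the committee to commit to precise values of l, m, and u, which is the difficulty mentioned above.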

 

References

[1] Silver, D., Schrittwieser, J., Simonyan, K. et al. Mastering the Game of Go without Human Knowledge. Nature, 2017, 550, 354–359.

[2] Arrow, K.J. Alternative Approaches to the Theory of Choice in Risk-Taking Situations. Econometrica, 1951, 19 (4), 404–437.

[3] Saghafian, S. and B.T. Tomlin. The Newsvendor under Demand Ambiguity: Combining Data with Moment and Tail Information. Operations Research, 2016, 64 (1), 167-185.

[4] Saghafian, S. Ambiguous Partially Observable Markov Decision Processes: Structural Results and Applications, Journal of Economic Theory, 2018, 178, 1-35.

[5] Saghafian, S. Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach. Working Paper, Harvard University.

[6] Saghafian, S. and S.R. Hejazi. Multi-criteria Group Decision Making Using A Modified Fuzzy TOPSIS Procedure.  IEEE Proceedings, Computational Intelligence for Modeling, Control, and Automation, 2005, 2, 215-221.