Successful Public Impact Analytics Science in the Era of Information and Algorithmic Transparency Movements

Blog Series: PUBLIC IMPACT ANALYTICS SCIENCE (PIAS)

In this post, I argue for a new minimum transparency criterion for developing successful analytics science methods and algorithms that are aimed at having a positive public impact: making sure that they do not violate the “primum non nocere” principle, a medical term meaning “first, do no harm.” This is an important criterion, one that is often ignored in the analytics science world but is becoming increasingly vital, especially considering that we live in an era in which information and algorithmic transparency are being recognized as “basic human rights” by various legal experts, scholars, and intellectuals.

Let us first understand what transparency means, and what the recent information transparency and algorithmic transparency movements have meant for advancements in what I call “public impact analytics science.”

In a branch of philosophy known as epistemology, transparency has a specific definition. An epistemic state E is said to be “weakly transparent” to a subject S if, and only if, when S is in state E, S can know that S is in state E. Similarly, an epistemic state E is said to be “strongly transparent” to a subject S if, and only if, when S is in state E, S can know that S is in state E, and in addition, when S is not in state E, S can know S is not in state E.
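For readers who prefer symbols, these two definitions can be rendered in standard epistemic-logic notation. This is a sketch using an assumed knowledge operator K_S (read “S knows that”) and a predicate In(S, E) (read “S is in state E”); neither symbol comes from a particular text:

```latex
% Weak transparency: being in E suffices for knowing one is in E.
E \text{ is weakly transparent to } S
  \iff \bigl(\mathrm{In}(S,E) \rightarrow K_S\,\mathrm{In}(S,E)\bigr).

% Strong transparency adds the negative direction:
% not being in E suffices for knowing one is not in E.
E \text{ is strongly transparent to } S
  \iff \bigl(\mathrm{In}(S,E) \rightarrow K_S\,\mathrm{In}(S,E)\bigr)
       \wedge
       \bigl(\neg\,\mathrm{In}(S,E) \rightarrow K_S\,\neg\,\mathrm{In}(S,E)\bigr).
```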

The advancements in analytics science, along with the digital revolution of our century, have made various states of society transparent, either weakly or strongly, to many of its actors. The widespread availability of data has encouraged analytics scientists to develop new methods and algorithms for transforming data into useful information: actionable insights that, if implemented, can offer public impact. The availability of these new methods and algorithms has, in turn, encouraged various actors in society not only to facilitate the process of mass data collection, but also to make the collected data available to analytics scientists. This has enabled such scientists to further contribute to the information transparency movement we have observed during the last couple of decades.

But what has the public impact of this information transparency movement been? Has it resulted in a tangible positive overall impact? Or has it fallen short of its promise?

Let us consider, for example, what has happened in one of the largest sectors of the U.S. economy, with overall spending roughly equal to one-fifth of GDP: the healthcare sector. For years, consumers seeking a healthcare provider had very limited information about their choices and the quality of the providers they could choose. This lack of information was blamed on peculiarities of the healthcare sector and regarded by policymakers and others as another example of healthcare exceptionalism [1]. In the past two decades, however, we have seen various initiatives on measuring and publicly reporting clinical outcomes, a practice that is often referred to as “public reporting” [1, 2]. Indeed, both government and private organizations have put serious effort into increasing quality transparency in the healthcare sector as a mechanism to empower consumers to make better decisions.

For example, in 2005 the Centers for Medicare & Medicaid Services (CMS) launched the website Hospital Compare with provider information about the quality of care at more than 4,000 Medicare-certified hospitals. Other examples include CalHospitalCompare.org, the ProPublica Surgeon Scorecard, and the Compare Hospitals site by the Leapfrog Group, all of which offer hospital outcome data, as well as websites by Healthgrades, Consumer Reports, Yelp, and U.S. News & World Report, which offer hospital ratings and rankings. Similar efforts have been made in many other countries, and the attention to information transparency has significantly increased. In the United Kingdom in 2011, for example, Prime Minister David Cameron pledged that the National Health Service would make performance data publicly available and announced that “information is power, and by sharing it, we can deliver modern, personalized, and sustainable public services” [3].

Despite the convincing rationale behind public reporting and similar efforts to increase transparency in the healthcare sector, various studies show a paucity of meaningful impact (see, e.g., [1] and the references therein). Why have these efforts fallen short of their promise?

Analytics science research has been able to provide important answers. In articles published in Operations Research [2] and by the National Academy of Medicine [1], we discuss these answers as well as new recommendations for policymakers, using a combination of data-driven analyses, simulations, and counterfactual predictions derived from game-theoretic models. Summarizing our findings in [3] and [4], I argue that, unfortunately, policymakers have been using incorrect treatments in attempting to cure the ailing healthcare system. Incorrect treatments can, and often do, make the illness more severe. Thus, what has happened in the healthcare sector certainly contradicts the “primum non nocere” principle.

But is healthcare the only sector in which the information transparency movement has violated the “primum non nocere” principle? Let us, for example, recall what happened with Facebook’s data collection efforts in 2016, for which Mark Zuckerberg was widely criticized in 2018. In April 2018, Zuckerberg apologized to Congress for a scandal in which an independent researcher collected and analyzed users’ information and then sold it to Cambridge Analytica. Cambridge Analytica was involved in political data analysis, and this data was eventually used to target fake ads (created by Russian intelligence) at users as a mechanism to disrupt the 2016 U.S. election [5].

These examples remind us that, while the digital revolution has made the use of analytics science techniques that transform large amounts of data into actionable information and practical decisions more important than ever, the overall trend has been less than desired. In fact, while early bestsellers like The Naked Corporation (2003) argued that transparency would soon alter all aspects of the economy and markets, we are facing paradoxical outcomes such as what the philosopher Shannon Vallor calls the “Technological Transparency Paradox” [6]. Summarizing this paradox, the Stanford Encyclopedia of Philosophy notes that “those in favor of developing technologies to promote radically transparent societies, do so under the premise that this openness will increase accountability and democratic ideals. But the paradox is that this cult of transparency often achieves just the opposite with large unaccountable organizations that are not democratically chosen holding information that can be used to weaken democratic societies. This is due to the asymmetrical relationship between the user and the companies with whom she shares all the data of her life. The user is, indeed, radically open and transparent to the company, but the algorithms used to mine the data and the 3rd parties that this data is shared with is opaque and not subject to accountability. We, the users of these technologies, are forced to be transparent but the companies profiting off our information are not required to be equally transparent.”

Another important aspect of the transparency movement of our century within the analytics science realm is what is known as “algorithmic transparency.” It refers to a basic need: the factors that influence the decisions made by algorithms should be transparent to the people who use, regulate, or are otherwise affected by them. We have seen a tremendous amount of research and effort in the last decade not only in making complex algorithms more transparent, but also in deciding upon appropriate laws to govern them. As the privacy law expert Marc Rotenberg argued during a Knowledge Café event organized by UNESCO, “at the core of modern privacy law is a single goal: to make transparent, the automated decisions that impact our lives.” Rotenberg reminded the audience that “at the intersection of law and technology–knowledge of the algorithm is a fundamental right, a human right.”

The omnipresent demand for transparent algorithms has made various researchers and organizations reluctant to develop or implement black-box algorithms that might perform extremely well but are not transparent to users and stakeholders. Consider, for example, the task of developing an algorithm that can decide which patients should be given (and which should be denied) access to a newly introduced cancer treatment technology. In collaboration with Massachusetts General Hospital (MGH), we developed such an algorithm for use in their Proton Therapy center [7]. By using protons rather than x-rays, Proton Therapy offers a superior technology for treating cancer compared to traditional radiation therapy. Two important advantages over traditional x-ray-based radiation are: (a) more radiation delivered to the malignant tumor, and (b) less radiation delivered to the healthy tissues surrounding the tumor. In addition, Proton Therapy typically causes fewer and less severe side effects such as low blood count, fatigue, and nausea. Due to these advantages, demand for Proton Therapy among cancer patients is currently extremely high. However, since the technology is relatively new, only a few facilities in the U.S. currently offer it (including a center at MGH), and capacity at each of these facilities is fairly limited. Thus, due to a high demand-to-capacity ratio, Proton Therapy centers often face the difficult task of picking a few patients they can accept for treatment and declining many others. Insurance coverage (or the lack of it) also makes the decision-making process more complex.

In designing an algorithm that can make such decisions at an institution like MGH, one needs to consider a variety of criteria. To begin with, the limited capacity should ideally be allocated to patients who can benefit most from the new treatment. Hence, the algorithm must produce high-accuracy predictions of whether, and by how much, the new treatment can benefit an individual patient. The algorithm should also predict whether the patient will make a last-minute cancelation if the treatment is offered to her (e.g., when the patient cannot wait any longer for this advanced treatment and seeks other treatment options), a common phenomenon known as a “no-show.” Fairness and ethical concerns also play a critical role. Is it ethical to deny treatment to a 98-year-old patient just because the algorithm predicts that the patient has an over 98% chance of dying within two years regardless of whether the treatment is performed? Most notably, however, the algorithm must be interpretable; when a decision is made to deny or accept a patient for this superior treatment, patients, insurers, providers, and other stakeholders have the right to clearly understand why and how that decision was made. Thus, in practice, a black-box algorithm that performs well is often not as useful as an algorithm that is interpretable but has lower performance. Interpretability, indeed, is central, as it allows certifying that the algorithm does not violate the “primum non nocere” principle.
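To make the tension concrete, below is a minimal, hypothetical sketch of an interpretable selection rule of the kind such a setting invites. Everything here (the Patient fields, the scoring rule, the numbers) is invented for illustration; it is not the joint selection-and-scheduling model developed in [7].

```python
from dataclasses import dataclass

@dataclass
class Patient:
    """Hypothetical patient record; fields are illustrative, not from [7]."""
    patient_id: str
    predicted_benefit: float   # predicted clinical benefit of treatment (0-1)
    no_show_prob: float        # predicted probability of a last-minute cancelation (0-1)

def select_patients(patients, capacity):
    """Toy interpretable rule: rank by expected realized benefit.

    Expected realized benefit = predicted benefit x probability the patient
    actually shows up. A transparent heuristic for exposition only; the
    actual model in [7] is far richer.
    """
    ranked = sorted(
        patients,
        key=lambda p: p.predicted_benefit * (1.0 - p.no_show_prob),
        reverse=True,
    )
    accepted = ranked[:capacity]
    # Each decision is explainable: report the score and its two factors.
    for p in accepted:
        score = p.predicted_benefit * (1.0 - p.no_show_prob)
        print(f"Accept {p.patient_id}: benefit={p.predicted_benefit:.2f}, "
              f"show-up prob={1.0 - p.no_show_prob:.2f}, score={score:.2f}")
    return accepted

# Example with made-up numbers: B has the highest raw benefit but a 50%
# no-show risk, so A and C are selected instead.
cohort = [
    Patient("A", predicted_benefit=0.80, no_show_prob=0.10),
    Patient("B", predicted_benefit=0.90, no_show_prob=0.50),
    Patient("C", predicted_benefit=0.60, no_show_prob=0.05),
]
select_patients(cohort, capacity=2)
```

Because the score is simply the product of two predicted quantities, every accept or deny decision can be explained to a patient or insurer in one sentence, which is exactly the property a black-box model lacks.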

It is important, however, to note that in many applications there are often win-win solutions: there is no need to sacrifice the transparency and interpretability of analytics methods to get the best performance. In applications related to the criminal justice system, for example, it has been shown that black-box algorithms for predicting future arrests may not be more accurate than very simple predictive models that use only age and criminal history (see, e.g., [8]). The widespread demand for transparent algorithms has also resulted in new competitions among analytics professionals, with the goal of highlighting the need for algorithms that follow understandable decision-making rules. In 2018, for example, we saw the Explainable Machine Learning Challenge, a prestigious competition organized collaboratively by organizations such as Google and the Fair Isaac Corporation (FICO), together with academic institutions including Berkeley, Oxford, Imperial, UC Irvine, and MIT.
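As a toy illustration of how transparent such a model can be, here is a hand-written rule list in the spirit of the certifiably optimal rule lists studied in [8]. The thresholds below are invented for exposition, not learned from any dataset:

```python
def predict_rearrest(age: int, prior_offenses: int) -> bool:
    """Toy two-feature rule list; the cutoffs are hypothetical, chosen only
    to show that every prediction can be traced to a single readable rule."""
    if prior_offenses > 3:
        return True            # many priors -> predict re-arrest
    if age < 23 and prior_offenses > 0:
        return True            # young with any prior -> predict re-arrest
    return False               # otherwise -> predict no re-arrest
```

A stakeholder can audit this entire model in seconds, whereas auditing a black-box model with thousands of parameters for a “primum non nocere” violation is far harder.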

To advance the science of analytics and guarantee that it achieves its important mission of having a positive public impact, we need to ensure that, first and foremost, we do not violate the “primum non nocere” principle. This is an important step in any science in which the goal is to discover “veritas,” and public impact analytics science is no exception. This view becomes even more important when we note that both the root of the need for transparency and its use originally come from science, not from the means or forces of politics or economics. As the philosopher Byung-Chul Han argues in his book The Transparency Society (2012), “the omnipresent demand for transparency, which has reached the point of fetishism and totalization, goes back to a paradigm shift that cannot be restricted to the realm of politics and economics.”

References

[1] Saghafian, S. and W.J. Hopp. The Role of Quality Transparency in Healthcare: Challenges and Potential Solutions. National Academy of Medicine, 2019.

[2] Saghafian, S. and W.J. Hopp. Can Public Reporting Cure Healthcare? The Role of Quality Transparency in Improving Patient-Provider Alignment. Operations Research, 2020, 68 (1), 71-92.

[3] Saghafian, S. Curing the Healthcare System: Does Transparency Help? INFORMS OR/MS Today, 2020, 47(1).

[4] Saghafian, S. Why Healthcare Transparency Is Complicated and How It Can Be Fixed. The Hill, Op-Ed, 2020, Feb. 19.

[5] Au-Yeung, A. Why Investors Remain Bullish on Facebook in Day Two of Zuckerberg’s Congressional Hearings. Forbes, 2018, April 11.

[6] Vallor, S. Technology and the Virtues: A Philosophical Guide to a Future Worth Wanting. Oxford: Oxford University Press, 2016.

[7] Saghafian, S., Trichakis, N., Zhu, R., and Shih, H.A. Joint Patient Selection and Scheduling under No-Shows: Theory and Application in Proton Therapy. Working Paper, Harvard University, 2021.

[8] Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., and Rudin, C. Learning Certifiably Optimal Rule Lists for Categorical Data. Journal of Machine Learning Research, 2018, 18, 1-78.