The Analytics Science Behind ChatGPT: Human, Algorithm, or a Human-Algorithm Centaur?

Blog Series: PUBLIC IMPACT ANALYTICS SCIENCE (PIAS)

Note: a more detailed version is here.


Figure: Image of Chiron the Centaur in Daniel Le Clerc, Histoire de la médecine … (Amsterdam, 1723), p. 30.

Developing analytics science methods that combine the power of artificial and human intelligence has brought the concept of centaurs from myth to reality. In Greek mythology, centaurs are half-human, half-horse creatures (see the figure above). In modern analytics science, the term refers to systems that enable superior decision-making by combining the power of humans and trained algorithms. One of the main users in the U.S. has been the Defense Department, which has been working with tech companies to combine the power of algorithms with the capabilities of humans [1]. The concept has attracted the attention of the U.S. military, both in research programs at the Defense Advanced Research Projects Agency and in the Pentagon’s third-offset strategy for military advantage [2]. Robert O. Work, for example, who served as deputy secretary of defense under Presidents Barack Obama and Donald Trump, advocated for centaur weapons systems, which would retain human control rather than relying purely on AI and could combine the power of AI with the capabilities of humans [3].

The concept of centaurs is not new, but it gained spotlight attention within the analytics science domain because of its success in applications like playing free-style chess. Specifically, prominent advocates of free-style chess like Garry Kasparov have repeatedly argued that a human paired with algorithms can do better than even the single strongest computer program in chess [4]. As the chess legend put it:

“Weak human plus machine plus better process was superior to a strong computer alone and, more remarkably, superior to a strong human plus machine plus inferior process.” [5]

Beyond free-style chess, the centaur model is being widely used in a variety of applications of analytics science. In clinical decision-making related to rehabilitation assessment, for example, algorithms provide therapists with detailed analyses of a patient’s status, and the collaboration between therapist and algorithm has been shown to improve rehabilitation assessment practices [6].

Research in my own lab at Harvard, conducted in collaboration with the Mayo Clinic, showed very promising results for a centaur model that we developed to enhance decision-making and reduce readmission risk for a large number of patients who underwent transplantation. We found that combining human experts’ intuition with the power of a strong machine learning algorithm through a human-algorithm centaur model can outperform both the best algorithm and the human experts [7].

Other examples of using the centaur model to create public impact include systems for spotting anomalies and preventing cyber-attacks, improving design components in manufacturing systems, and helping officers balance their workloads so they can better ensure public safety [2]. And the potential for centaurs is endless, so it is reasonable to expect that most data-driven organizations will take advantage of them in the near future. A department of human services, for example, can use algorithms to predict which child welfare cases are likely to lead to child fatalities and raise a red flag for high-risk cases. Such cases are then reviewed by human experts, and the results are shared with frontline staff, who then might choose remedies designed to lower risk and improve outcomes [8].

When and Why Should We Use Human Intuition?

Humans often face difficult decision-making situations, and their intuition is not always helpful. When facing critical life-changing decisions such as quitting a job or ending a relationship, we tend to be happier with the outcomes later on when a coin toss tells us to make a change than when it tells us to maintain the status quo [9].

Nonetheless, human intuition is often very powerful, especially when we want to make quick decisions. Put differently, while intuition often misfires when we are dealing with complex problems that require careful analytics (e.g., finding ways to reduce the incidence of diabetes among organ transplant patients [10, 11], deciding on cell formation and layout design for a cellular manufacturing system [12], or finding the most effective ways of saving lives in emergency rooms [13, 14]), it can be very useful when using data, models, and careful analytics is not an option. Malcolm Gladwell’s popular book “Blink: The Power of Thinking Without Thinking” provides various examples of this, including when police officers need to quickly decide whether to shoot a suspect [15]. Good intuitive decision-making also helps firefighters when they face a burning building [16].

In addition, while relying on intuition in handling complex problems can be misleading, combining intuition with the most useful analytics approaches can often be better than relying on analytics alone. To better understand this, it is useful to see how our own system of thinking works. Daniel Kahneman, a contemporary psychologist known for his groundbreaking work on the psychology of judgment and decision-making as well as behavioral economics, and winner of the 2002 Nobel Prize in Economics, highlighted in his book “Thinking, Fast and Slow” that our brain has two modes of thinking: System 1 and System 2. System 1 is fast and instinctive, while System 2 is slower, more deliberative, and more logical. What is perhaps more interesting is that these systems greatly complement each other. Our body somehow knows that we need both systems to be able to make good decisions in different situations.

Similar to how Systems 1 and 2 complement each other, intuition and analytics can help each other as well. And this is where human-machine collaborations can play a vital role. We humans can use our intuition in many ways while developing analytics methods and taking advantage of computers to run them. For starters, intuition often allows us to develop better models, or, considering George Box’s aphorism “All models are wrong, but some models are useful,” more “useful” ones. Intuition also allows us to verify the results obtained from models and to make sure that the assumptions made in a model are not problematic. Analytics scientists often use this simple technique as a feedback loop: when the results obtained from a model are not sensible and can be traced to a wrong assumption in the model, they modify the model to obtain a better one. And if they are working with a cloud of models to address the curse of ambiguity [17, 18], they can replace the models that might be causing the preposterous results. Preposterous results can also stem from abnormalities and/or outliers in the data that need to be removed before feeding the data to models, and intuition is often very helpful here as well.

Realizing that intuition alone can be misleading in understanding and analyzing complex systems, and that human-machine collaborations are needed to harness the full power of both advanced analytics and mighty intuition, has led to important ways of supplementing experts’ decision-making in the form of decision support systems. An AI-based system, for example, can provide important insights into complex decisions for a decision-maker, a policymaker, or a public leader, who might interact with the AI-based system to analyze a wide range of solutions before settling on one. Examples of possibilities for these types of collaborative efforts are ample in both the public and private sectors.

ChatGPT: How Does it Work? Is it a Centaur?

ChatGPT benefits from human intuition in a very specific way. Let us first see what ChatGPT is, and then learn how it works by drawing on powerful human input.

ChatGPT (GPT stands for Generative Pre-trained Transformer) is among a class of analytics science models known as Large Language Models (LLMs). These models offer many benefits, including answering questions, churning out astonishingly convincing prose, translating between languages, and even producing code [19]. Thus, many companies, including Google, Meta, and Microsoft, as well as various research labs, including OpenAI, have been working on them. In 2020, for example, OpenAI launched a large language model called GPT-3 that attracted widespread attention:

“Its ability to mimic human-written text with uncanny realism seemed to many like a milestone on the road to true machine intelligence.” [20]

For example, when prompted with a passage written by a human (the opening paragraph below), GPT-3 generated text that reads as if it were written by a human:

 “In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. They also were found to have perfectly coiffed hair, and wore what appeared to be Dior makeup.

“We were shocked to discover the unicorns,” said anthropologist Daniel St. Maurice. “They were like nothing we had ever seen before. We had heard legends of the unicorns, but never thought they actually existed.” [20]

GPT-3 had about 175 billion machine learning parameters, and some estimates suggested that it consumed about 936 MWh to train, the equivalent of roughly 30,000 American households’ power usage in a day. Some recent improvements have focused on making GPT-3 more efficient by reducing these numbers. Other improvements over GPT-3, such as the work of researchers on Google’s Brain team, have also enabled tasks that involve semi-reasoning. Their method, termed “chain-of-thought prompting,” enables language models of sufficient scale (e.g., models with 100 billion parameters) to solve semi-complex reasoning problems that are not solvable with standard prompting methods [21].
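
To make this concrete, a chain-of-thought prompt simply includes worked exemplars whose answers spell out intermediate reasoning steps, so the model imitates that step-by-step style before giving its final answer. The sketch below is only illustrative; the example questions and wording are hypothetical and not drawn from the cited work.

```python
# Illustrative sketch of chain-of-thought prompting (example wording is hypothetical).

# A standard few-shot prompt shows only question/answer pairs:
standard_prompt = (
    "Q: A farmer has 3 pens with 4 sheep each. How many sheep in total?\n"
    "A: 12\n"
    "Q: A library has 5 shelves with 8 books each. How many books in total?\n"
    "A:"
)

# A chain-of-thought prompt spells out the intermediate reasoning in the exemplar,
# nudging the model to generate similar reasoning before its final answer:
cot_prompt = (
    "Q: A farmer has 3 pens with 4 sheep each. How many sheep in total?\n"
    "A: Each pen holds 4 sheep and there are 3 pens, so 3 * 4 = 12. The answer is 12.\n"
    "Q: A library has 5 shelves with 8 books each. How many books in total?\n"
    "A:"
)

print(cot_prompt)
```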

In 2022, large language models saw yet another major advancement: ChatGPT, a chatbot developed by OpenAI on top of GPT-3.5 with the ability to provide conversational responses that can appear surprisingly human. Like other language models, the main idea behind ChatGPT is simple: predict the next word in a sentence or phrase based on the context of the previous words, using a model trained on a very large number of instances.
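
To see what “predicting the next word from context” means in its simplest form, here is a toy sketch, assuming nothing more than a bigram count model over a tiny made-up corpus. ChatGPT instead uses a transformer network with billions of parameters over subword tokens, but the prediction objective is similar in spirit.

```python
import random
from collections import Counter, defaultdict

# Toy next-word predictor: a bigram count model over a tiny made-up corpus.
# (Illustrative only; ChatGPT uses a transformer over subword tokens.)
corpus = "the patient was admitted and the patient was discharged".split()

# Count how often each word follows each context word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Sample the next word in proportion to how often it followed `word`."""
    followers = counts[word]
    words, freqs = zip(*followers.items())
    return random.choices(words, weights=freqs, k=1)[0]

print(predict_next("the"))      # e.g. "patient"
print(predict_next("patient"))  # "was"
```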

Reinforcement Learning with Human Feedback (RLHF) is used during ChatGPT’s training to incorporate human feedback so that the model produces responses that are satisfactory to humans. Reinforcement Learning (RL) requires assigning rewards, and one way to do so is to ask a human to assign them. The main ideas behind RL can be traced back to the work of Harvard psychologist Burrhus Frederic Skinner. Skinner published a seminal work in 1938 entitled “The Behavior of Organisms,” in which he pursued the idea that animal behavior can, in essence, be described by a simple set of associations between an action and what the animal receives as the subsequent reward or punishment. In the training phase of ChatGPT, a similar idea is used: a human “labeler” assigns rewards to various outputs the model generates by ranking them from best to worst. Thus, ChatGPT is trained by incorporating human intuition, specifically in terms of preferences over verbal combinations, and learning from it.
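
One common way to formalize this reward-modeling step is a pairwise preference loss: learn a scalar score for each response such that responses the labeler ranked higher receive higher scores. The sketch below is a minimal illustration under that assumption, with made-up feature vectors standing in for model responses; it is not OpenAI’s actual implementation.

```python
import numpy as np

# Minimal sketch of learning a reward model from human rankings
# (assumed pairwise, Bradley-Terry-style loss; all data here are made up).
rng = np.random.default_rng(0)

# Each row is a feature vector for one candidate response.
preferred = rng.normal(size=(100, 5)) + 0.5   # responses the labeler ranked higher
rejected = rng.normal(size=(100, 5))          # responses the labeler ranked lower

w = np.zeros(5)   # linear reward model: reward(x) = w @ x
lr = 0.1

for _ in range(200):
    # Probability that the preferred response wins under the current reward model.
    margin = preferred @ w - rejected @ w
    p_win = 1.0 / (1.0 + np.exp(-margin))
    # Gradient ascent on the log-likelihood of the observed human preferences.
    grad = ((1.0 - p_win)[:, None] * (preferred - rejected)).mean(axis=0)
    w += lr * grad

print("average reward gap:", float((preferred @ w - rejected @ w).mean()))
```

Once trained, such a reward model scores new responses, and the language model is then fine-tuned with reinforcement learning to produce outputs that the reward model, and by proxy the human labelers, rate highly.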

It is important to note, however, that large language models are by no means close to human-level intelligence. They cannot reason or think, even in solving simple problems that are intuitive to humans, partly because verbal communication and the ability to think are not the same. Notable scholars such as Noam Chomsky, who has studied the mental representations and rules that underlie our perceptual and cognitive skills, have argued that we should dig deeper into an organism's genetic endowment and its maturation. More broadly, Chomsky has argued against the focus of modern AI on statistical learning techniques, stating that they are unlikely to yield general principles about the nature of intelligent beings or cognition [22].

Nonetheless, the progress made in large language models, including GPT, has been substantial. A lyric in Leonard Cohen’s “Anthem” reminds us that “there is a crack in everything, that's how the light gets in.” In analytics science, it is our duty to see the light that gets in through a model’s crack, be aware of it, and inform others about it.

References

  1. New York Times. Pentagon Turns to Silicon Valley for Edge in Artificial Intelligence. May 2016. 
  2. PARC. Half-Human, Half-Computer? Meet the Modern Centaur. https://www.parc.com/blog/half-human-half-computer-meet-the-modern-centaur/
  3. New York Times. A Case for Cooperation Between Machines and Humans. May 2016. 
  4. Garry Kasparov on AI, Chess, and the Future of Creativity. Mercatus Center blog at Medium/Conversations with Tyler. 10 May 2017.
  5. Kasparov G (2010). The chess master and the computer. The New York Review of Books 57(2):16–19.
  6. Lee, M. H., Siewiorek, D. P., Smailagic, A., Bernardino, A., and Bermúdez i Badia, S. (2021). A Human-AI Collaborative Approach for Clinical Decision Making on Rehabilitation Assessment. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-14.
  7. Orfanoudaki, A., Saghafian, S., Song, K., Cook, C. B., Chakkera, H. A. (2022). Algorithm, Human, or the Centaur: How to Enhance Clinical Care? Working Paper, Harvard University.
  8. Deloitte (2020). How New Human-Machine Collaborations Could Make Government Organizations More Efficient. Harvard Business Review.
  9. Levitt, S. D. (2021). Heads or tails: The impact of a coin toss on major life decisions and subsequent happiness. The Review of Economic Studies, 88(1), 378-405.
  10. Saghafian, S. (2021). Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach. arXiv preprint arXiv:2112.04571.
  11. Munshi, V. N., Saghafian, S., Cook, C. B., Werner, K. T., Chakkera, H. A. (2020). Comparison of post-transplantation diabetes mellitus incidence and risk factors between kidney and liver transplantation patients. PloS One, 15(1), e0226873.
  12. Saghafian, S., Jokar, M. R. A. (2009). Integrative Cell Formation and Layout Design in Cellular Manufacturing Systems. Journal of Industrial and Systems Engineering, 3(2), 97-115.
  13. Saghafian, S., Kilinc, D., Traub, S. J. (2022). Dynamic Assignment of Patients to Primary and Secondary Inpatient Units: Is Patience a Virtue? Cambridge Handbook on Productivity, Efficiency and Effectiveness in Healthcare (forthcoming).
  14. Traub, S. J., Bartley, A. C., Smith, V. D., Didehban, R., Lipinski, C. A., Saghafian, S. (2016). Physician in triage versus rotational patient assignment. The Journal of Emergency Medicine, 50(5), 784-790.
  15. Gladwell, M. (2006). Blink: The Power of Thinking Without Thinking. Penguin Books, London.
  16. Klein, G. (1999). Sources of Power: How People Make Decisions. MIT Press, Cambridge, MA.
  17. Saghafian, S. (2018). Ambiguous partially observable Markov decision processes: Structural results and applications. Journal of Economic Theory, 178, 1-35.
  18. Saghafian, S., Tomlin, B. (2016). The newsvendor under demand ambiguity: Combining data with moment and tail information. Operations Research, 64(1), 167-185.
  19. How language-generating AIs could transform science. Interview by R. Van Noorden, Nature, 605, 5 May 2022, p. 21.
  20. Heaven, W.D. (2021). Why GPT-3 is the best and worst of AI right now. MIT Technology Review, Feb. 24.
  21. Wei, J. and Zhou, D. (2022). Language Models Perform Reasoning via Chain of Thought. Google AI Blog.
  22. Noam Chomsky on Where Artificial Intelligence Went Wrong. The Atlantic, Nov. 1, 2012.