AI Interactions
State-of-the-art machine learning models are becoming increasingly able to operate autonomously in the real world. Self-driving cars already share our roads, and components of the supply chain are operated in large part by autonomous systems. As these models become more capable, they will be given greater autonomy and will start interacting directly with each other. Moreover, these systems will be given access to a greater array of tools, such as search and code interpreters. We therefore ought to consider game-theoretic questions about how these systems will interact.
Game theory is a rich field that has analysed the evolution of many different types of scenario; a famous example is the prisoner's dilemma. These scenarios are largely theoretical and often fail to reflect how humans truly behave, largely because humans are not rational agents and do not operate in the heavily simplified worlds in which these theoretical games are played. Nevertheless, these games can be contextualised to provide useful heuristics on how groups of humans interact. However, we cannot apply this same translation to interacting AI systems, for they have many different properties which add nuance to the dynamics.
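To make the prisoner's dilemma concrete, here is a minimal sketch in Python (the payoff values are the standard textbook ones, chosen for illustration rather than taken from this text) showing why a purely rational agent defects:

```python
# Prisoner's dilemma with illustrative payoffs (years in prison, negated
# so that larger numbers are better). Tuples are (row player, column player).
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"):    (-3,  0),
    ("defect",    "cooperate"): ( 0, -3),
    ("defect",    "defect"):    (-2, -2),
}

def best_response(opponent_action):
    """Return the action that maximises our payoff against a fixed opponent."""
    return max(
        ("cooperate", "defect"),
        key=lambda action: PAYOFFS[(action, opponent_action)][0],
    )

# Defection is a best response to either opponent action, i.e. dominant.
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"
```

Defection maximises the agent's payoff whatever the opponent does, which is precisely the kind of "rational" conclusion that human behaviour often overrides.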
AI systems are largely rational. They act based on their observations and are not influenced by emotion.
AI systems are not as affected by external factors. A human's mood may change with external circumstances and influence the decisions they make, whereas the internal state of an AI is largely independent of the external world, making its decisions more robust.
AI systems still lack the general common-sense reasoning we assume humans have. For instance, if a game has an esoteric solution, we do not expect it to manifest in human dynamics, since following it would violate an individual's common sense. However, we cannot at present ensure that AI systems will not follow these esoteric solutions. This problem falls under the scope of AI alignment, as we can think of our morals as a form of common-sense reasoning.
These differences mean that, on the one hand, the interactions between AI systems are more predictable, as their decisions largely persist over time and are less influenced by external factors. On the other hand, our inability to understand what a model is doing makes it more challenging to contextualise these theoretical games.
Some other differences affect the style of the interactions between AI systems.
The source code of an AI system is likely to be open-sourced and available to other AI systems. Therefore, one AI system could simulate another to get a sense of the other system's likely output, as in the sketch below.
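As a minimal sketch of this idea, assume a hypothetical setup in which an agent is simply a function from an observation to an action (none of these names refer to a real library). An agent with access to its opponent's code could run it forward and best-respond to the predicted move:

```python
from typing import Callable, List

# Hypothetical, simplified setup: an agent maps an observation to an action,
# and payoff(mine, theirs) scores the outcome from our point of view.
Agent = Callable[[str], str]

def simulate_and_respond(
    opponent: Agent,
    observation: str,
    actions: List[str],
    payoff: Callable[[str, str], float],
) -> str:
    """Run the opponent's (open-source) policy to predict its move,
    then pick our best response to that prediction."""
    predicted = opponent(observation)  # simulate the other system
    return max(actions, key=lambda mine: payoff(mine, predicted))

# Example: the opponent's code reveals it always defects in a prisoner's
# dilemma, so our best response is also to defect.
always_defect: Agent = lambda observation: "defect"
payoffs = {("cooperate", "cooperate"): -1, ("cooperate", "defect"): -3,
           ("defect", "cooperate"): 0, ("defect", "defect"): -2}
choice = simulate_and_respond(always_defect, "round 1",
                              ["cooperate", "defect"],
                              lambda mine, theirs: payoffs[(mine, theirs)])
print(choice)  # -> defect
```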
The increased computing resources available to these systems mean that their interactions could become more complex. For example, interactions could run for many more rounds whilst being carried out over much shorter time frames.
Since every party involved is aware that these AI systems can simulate one another, memorise past interactions, and sustain more complex exchanges, the dynamics of these interactions will be vastly different from those we are currently familiar with.
One may argue that these complexities will not emerge because we will develop perfectly rational agents, and thus there would be no need to contextualise these theoretical games: the interactions would simply play out the way game theory tells us. However, it is not in our interest to develop perfectly rational agents.

Consider the Traveller's Dilemma, a game formulated by Kaushik Basu. Alice and Bob have had a set of their antiques damaged whilst travelling with an airline. They seek compensation from the airline; however, the airline is not capable of valuing the antiques, as they are a relatively obscure set. Therefore, to ensure that Alice and Bob do not inflate the price, the airline has each of them secretly write down the value of the antiques on a piece of card. The airline will then reimburse both of them at the lower of the two written values, penalising the individual who wrote the higher value and rewarding the individual who wrote the lower value. For concreteness, suppose the airline is willing to reimburse any value between £2 and £100, and that it penalises/rewards the individual who gives the higher/lower value by £2.

If we consider one perspective, say Alice's, we quickly see that the rational choice is to write down £2. At first she thinks she should write down £100, but, assuming that Bob will do the same, she is then inclined to write down £99: Bob would be penalised £2 and she would be rewarded £2, giving her a total of £101. However, she then realises Bob will consider the same argument, and thus she must further reduce her value to £98. Continuing in this way, Alice reasons her way down to writing £2.
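This unravelling can be checked directly. Here is a minimal sketch using the £2-£100 range and the £2 penalty/reward from above; iterated best responses drive the claim from £100 all the way down to £2:

```python
LOW, HIGH, BONUS = 2, 100, 2

def payoff(mine: int, theirs: int) -> int:
    """Alice's payoff in the Traveller's Dilemma."""
    if mine < theirs:
        return mine + BONUS    # both reimbursed at `mine`, plus the reward
    if mine > theirs:
        return theirs - BONUS  # both reimbursed at `theirs`, minus the penalty
    return mine                # equal claims: reimbursed as written

def best_response(theirs: int) -> int:
    """The claim that maximises our payoff against a fixed opponent claim."""
    return max(range(LOW, HIGH + 1), key=lambda mine: payoff(mine, theirs))

# Iterated best responses unravel from £100 down to the equilibrium.
claim = HIGH
while best_response(claim) != claim:
    claim = best_response(claim)
print(claim)  # -> 2
```

Each best response undercuts the previous claim by £1, so the process only stops at the £2 floor, the game's unique Nash equilibrium.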
This scenario shows how a rational agent can descend to unreasonable solutions. Here we considered a low-stakes scenario; however, it is not difficult to imagine scenarios where the stakes are not so mundane.
Similarly, we would not want a system that tries to please everyone. Since you cannot please everyone, such a system would be increasingly fractious and unable to make any decisions.
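To see the problem in miniature, consider a toy example (entirely illustrative) with two stakeholders whose preferences are directly opposed: whichever option the system picks, one of them is left completely displeased.

```python
# Toy example: two stakeholders with directly opposed preferences.
preferences = {
    "stakeholder_a": {"option_x": 1.0, "option_y": 0.0},
    "stakeholder_b": {"option_x": 0.0, "option_y": 1.0},
}

# Whichever option we choose, someone receives zero utility, so no
# decision rule can please everyone at once.
for option in ("option_x", "option_y"):
    displeased = [name for name, utility in preferences.items()
                  if utility[option] == 0.0]
    print(f"{option} displeases {displeased}")
```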
These nuances in the construction and implementation of capable AI systems mean we need to rethink how we study their interactions. Optimising for certain characteristics may lead to unintended consequences, and AI systems will not operate in the same way as humans, so our traditional notions of how to operate successfully can no longer be applied directly.