Artificial General Intelligence
Definitions of AGI
Intelligence is an abstract phenomenon. Generally, we envision it as the ability to reason and plan effectively through novel scenarios. Jean Piaget succinctly described intelligence as "what you use when you do not know what to do". Note that these descriptions of intelligence are not anthropocentric, meaning that a priori it is not clear whether humans are intelligent systems. From our perspective it seems as though we are; however, it could be that through our infant years we amass so many experiences that, for the rest of our lives, we can operate in our environment by replaying those experiences with different values to fit the context. Then, because we cannot remember our infancy, we have the perception of encountering novel tasks, when unconsciously we are just replaying past experiences. Therefore, knowing whether a system is intelligent is fundamentally linked to identifying novelty.
From our definition of intelligence, we observe that it is not a binary property; it is probably safe to assume that humans sit somewhere on a spectrum of intelligence. Because our definitions are imprecise, it is difficult to test for intelligence. For instance, the IQ test attempts to measure intelligence through a series of comprehension and pattern-recognition tasks. However, one can revise for the test and thereby reduce the novelty of the scenarios it presents, making it a flawed test of intelligence. Moreover, it excludes less canonical forms of novelty, such as those involved in emotional intelligence.
With our non-anthropocentric notion of intelligence, we can ponder the intelligence of artificial systems, such as machine learning models. More importantly, we can consider artificial systems with a greater intelligence than our own. It is not clear whether developing such a system is possible, nor whether it would take the form of current machine learning techniques. It is certainly the case that current state-of-the-art machine learning models exhibit a strong set of abilities that significantly impact society. Whether those abilities constitute a generally intelligent system is a great source of current debate. Regardless of whether these systems are intelligent according to our definition, it is still important to understand the consequences they will have on our economy, society and world order. To ponder these questions, we will slightly abuse our use of the word intelligence and refer to an artificial general intelligence as a system that can automate the tasks that humans competently perform. We proceed with the understanding that the word intelligence is used loosely, that current machine learning models are not an artificial general intelligence, and that it is not clear current machine learning techniques will lead to one.
As an aside, we have made no connection between intelligence and consciousness. Whether consciousness is a necessary condition for intelligence is another source of debate. An interesting approach to testing for consciousness, suggested in Lex Fridman's podcast with Roman Yampolskiy, is to supply models with optical illusions and observe their reactions. Models that demonstrate a reaction to the illusions may have a subjective visual experience. Of course, this does not demonstrate that the system is conscious, but it may be an indication that something peculiar is happening within the internals of the system. For the test to be robust, we must ensure that the supplied optical illusions elicit experiences not present in the training data.
Progress to AGI
A lot of effort is being made to achieve AGI, as it would revolutionise industries such as healthcare and education. The potential benefits of these systems are so profound that many argue it would be morally wrong not to develop them. However, many are also aware of the potential risks and harms that such a system would bring. The current approach to achieving AGI is to scale up large language models: training models with larger architectures on more data and more compute. This is motivated by current scaling laws, which indicate a reliable improvement in performance as these quantities grow. Therefore, assuming we have an idea of the performance required for AGI, and assuming these scaling laws hold, we can arrive at rough predictions of when AGI will arrive. Indeed, this is precisely the analysis conducted by Leopold Aschenbrenner, who also explores the consequences AGI will have on society.
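To make the scaling-law reasoning concrete, here is a minimal sketch of the kind of extrapolation involved. It assumes loss follows a simple power law in training compute; the compute budgets, losses and the "capability threshold" are placeholder numbers of my own, not figures from Aschenbrenner's essays.

```python
import numpy as np

# Assumed (made-up) observations: loss at a few training compute budgets.
compute = np.array([1e20, 1e21, 1e22, 1e23])   # training FLOP (placeholder)
loss    = np.array([2.60, 2.30, 2.05, 1.85])   # evaluation loss (placeholder)

# Fit a power law, loss = a * C**b, by a straight-line fit in log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# Extrapolate to a larger cluster, and invert for a hypothetical target loss.
projected_loss = a * 1e26 ** b
target_loss = 1.5                               # assumed capability threshold
required_compute = (target_loss / a) ** (1.0 / b)

print(f"fitted exponent: {b:.3f}")
print(f"projected loss at 1e26 FLOP: {projected_loss:.2f}")
print(f"compute needed to reach loss {target_loss}: {required_compute:.2e} FLOP")
```

The prediction is only as good as the two assumptions it encodes: that the power law continues to hold at larger scales, and that the chosen target loss actually corresponds to the capabilities we care about.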
I do agree that the current progress of this technology warrants attention, as there are real risks. For instance, because these models are scalable and reproducible, they can contribute immensely to information warfare. Consequently, any entity possessing such technology will have a considerable advantage over its adversaries. Furthermore, it may be difficult for a single entity to fully control the impact of its application of the technology, leading to unintended consequences that can affect vast numbers of people.
At one stage, Leopold uses the token thought rate to demonstrate how capabilities may scale. However, I do not think this metric captures a model's capacity to think. When I am thinking, I believe there is far more unconscious processing going on than just the tokens I am reciting in my head. Hence, I do not think that simply giving models more tokens to predict will yield the increases in capability that Leopold predicts. There have been countless instances in recorded history where breakthroughs were made when people were not consciously thinking about a problem; take Albert Einstein, who made many of his discoveries whilst working at a patent office. Therefore, overcoming an LLM's inability to focus more intently on harder problems will require more than simply letting it spend more tokens thinking about the particular problem.
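As a rough illustration of the framing I am pushing back on, the naive calculation behind a token-rate comparison looks something like the following; both rates are placeholder assumptions of mine, not Leopold's figures.

```python
# Back-of-envelope "token thought rate" comparison (placeholder numbers).
human_tokens_per_minute = 100     # assumed pace of a human internal monologue
model_tokens_per_minute = 6_000   # assumed generation throughput of one model instance

naive_speedup = model_tokens_per_minute / human_tokens_per_minute
print(f"naive serial 'thinking' speedup: {naive_speedup:.0f}x")
# The objection above is that this ratio only counts recited tokens and says
# nothing about the unconscious processing behind human thought.
```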
Leopold's prediction for the future is based on the current scaling laws. In particular, he predicts that AGI will arrive as we scale up the training clusters. However, this is predicated on the assumption that energy and data supplies will grow enough to facilitate such scaling. Although Leopold addresses these issues, I believe the increases he predicts will not come as swiftly as he imagines: bureaucratic friction and limited public focus will make developing such infrastructure a slow process.
Throughout the essays, the capabilities of current systems are evaluated against benchmarks which I think are inherently flawed and not a good metric for determining whether these models can replace human workers. Most benchmarks have the model answer questions on a range of topics. Given the sheer scale of the models, these questions ultimately boil down to memorisation and are not indicative of whether the model can reason. Some models perform well on these benchmarks, beating humans with PhDs, and from this Leopold concludes that they have an intelligence equal to that of individuals with PhDs. However, obtaining a PhD involves a lot more than answering the questions in the benchmarks. Indeed, a paper from Google DeepMind shows that LLMs struggle with trivial counting and searching problems, despite performing well on standard benchmarks.
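To illustrate the kind of trivial probe that exposes this gap, here is a minimal sketch of a counting task of my own construction (it is not the DeepMind paper's setup): a model is asked to count letter occurrences in a random string, and its reply is checked against the exact count.

```python
import random
import string

# Generate a trivial counting probe: how often does one letter appear
# in a random lowercase string? (Illustrative construction, not a benchmark.)
def make_counting_task(length: int = 200, seed: int = 0) -> tuple[str, int]:
    rng = random.Random(seed)
    text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    target = rng.choice(string.ascii_lowercase)
    prompt = f"How many times does the letter '{target}' appear in: {text}"
    return prompt, text.count(target)

def score(model_reply: str, ground_truth: int) -> bool:
    # Crude scoring: the reply must contain the correct count as a standalone word.
    return str(ground_truth) in model_reply.split()

prompt, truth = make_counting_task()
print(prompt)
print("ground truth:", truth)
# A model's reply to `prompt` would then be checked with `score(reply, truth)`.
```

Unlike a memorisable exam question, each sampled string is new, so the task can only be solved by actually performing the count.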
Will We Get AGI?
Using the definition that an AGI is a system that can automate most human tasks, it will be clear once we have achieved AGI. However, using our more abstract definition of intelligence, it is unclear when we will achieve AGI, as in practice it is difficult to ensure novelty and to test for ability. Many benchmarks try to assess levels of intelligence; however, at the scale of current models, the benchmarks are likely part of the training data. The ARC benchmark is designed to be robust against memorisation and to provide novel scenarios to reason through. Currently, the best models achieve around 35% on this benchmark, whereas humans are assumed to achieve around 85%, indicating that present LLMs lack this form of intelligence.
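For context on how that percentage is computed, an ARC task counts as solved only when the predicted output grid matches the hidden solution exactly, cell for cell, which is part of what makes memorisation ineffective. Below is a minimal sketch of that exact-match scoring; the toy grids are made up and are not real ARC tasks.

```python
# Minimal sketch of ARC-style exact-match scoring: an attempt is correct only
# if every cell of the predicted output grid matches the hidden solution.
Grid = list[list[int]]

def arc_correct(predicted: Grid, solution: Grid) -> bool:
    return predicted == solution

def score_model(predictions: dict[str, Grid], solutions: dict[str, Grid]) -> float:
    """Fraction of tasks solved exactly."""
    solved = sum(arc_correct(predictions[task], solutions[task]) for task in solutions)
    return solved / len(solutions)

# Made-up toy grids for illustration only.
solutions   = {"toy-1": [[1, 0], [0, 1]], "toy-2": [[2, 2], [2, 2]]}
predictions = {"toy-1": [[1, 0], [0, 1]], "toy-2": [[2, 2], [0, 2]]}
print(score_model(predictions, solutions))  # 0.5: one of the two toy tasks solved
```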
In summary, I think the scaling of LLMs will lead to impactful technologies. In the past, scale has provided these systems with emergent capabilities; for instance, induction heads emerged as transformers were scaled from one to two layers. I am sceptical that scaling systems in their current form will lead to intelligent systems in the broad sense of the word, and I am sceptical that scaling will happen as dramatically as Leopold suggests. Nevertheless, these systems will be knowledgeable and will contribute to many human tasks, so we ought to be cautious about their deployment. To achieve full AGI, I anticipate that current machine learning architectures will have to be augmented. The rate of