Current State of (Generative) Machine Learning
Analysing the attention around generative machine learning.
Much of the early research surrounding machine learning considered tasks such as regression and classification. Regression tasks involve predicting the value of a continuous variable from a set of inputs. For example, determining the likelihood that a user watches a video based on their watch history is a regression task. Classification, on the other hand, pertains to discrete variables. For example, identifying the genre of a film based on its features is a classification task.
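To make the distinction concrete, below is a minimal sketch using scikit-learn; the synthetic data and model choices are purely illustrative.

```python
# A minimal sketch contrasting regression and classification with scikit-learn.
# The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Regression: predict a continuous value (e.g., a watch-likelihood score)
# from numeric features of a user's history.
X = rng.normal(size=(100, 3))          # 100 users, 3 features each
y_continuous = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=100)
regressor = LinearRegression().fit(X, y_continuous)
print(regressor.predict(X[:1]))        # a real-valued prediction

# Classification: predict a discrete label (e.g., a film's genre)
# from numeric features of the film.
y_discrete = rng.integers(0, 3, size=100)    # 3 genre classes
classifier = LogisticRegression(max_iter=1000).fit(X, y_discrete)
print(classifier.predict(X[:1]))       # a class label in {0, 1, 2}
```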
Generative machine learning is the application of machine learning to construct novel outputs from an input. For example, ChatGPT is a generative machine learning model as it generates a text-based response from user input.
Generative machine learning has attracted increasing public interest since the release of ChatGPT and has been at the centre of public machine learning discourse. However, it is important to make clear that generative machine learning does not represent the field of machine learning in its entirety.
Current Progress
There have recently been major advances in the field of generative machine learning.
OpenAI released demonstrations of their Sora model, a text-to-video model leveraging advances in diffusion model architectures. On the surface, the model seems to demonstrate a remarkable improvement in video generation over previous models. More specifically, the model is more physically coherent and thus generates more realistic videos.
Google released their Gemini 1.5 Pro model, which offers improved context-length capabilities over their Gemini 1.0 models. In particular, the Gemini 1.5 Pro model can accurately digest over 700,000 words. Note that 700,000 words is about seven times the number of words in the average novel.
These models show the rapid improvement that is occurring in the field of generative machine learning. Innovations are being made in the training, architecture and implementation of the models. All of this has provided many benefits not just to the field of machine learning, but also to the sciences and society as a whole. Indeed, due to the generality of these models, they can be applied in various domains.
However, there is a risk that too much focus is being placed on generative machine learning. Machine learning as a field is more than just these large generative models. Currently, it feels as if we are relying too heavily on the success of generative AI, which, despite all of the attention it is receiving, still has major flaws that limit its usefulness in practice.
Advantages
Due to the vast amounts of data that they are trained on, these generative models produce seemingly novel responses. Thus, they can be a great source of inspiration and idea generation. Even just having an on-demand conversation with a human-like agent can help individuals probe their thoughts. The outputs of these models can provide a foundation from which individuals can work. This shouldn't be surprising, as these models are often trained in precisely this manner: they are handed a stream of data and tasked with predicting the subsequent data in the stream.
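As a rough illustration of that training setup, here is a minimal next-token prediction loop in PyTorch; the tiny model and random token stream are placeholders for a real transformer and corpus.

```python
# A minimal sketch of next-token prediction: given a stream of tokens,
# predict the next one. The tiny embedding model and random token stream
# are placeholders for a real language model and training corpus.
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len = 1000, 64, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),   # stand-in for a transformer stack
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, seq_len + 1))  # batch of streams
inputs, targets = tokens[:, :-1], tokens[:, 1:]          # shift by one step

logits = model(inputs)                                   # (8, seq_len, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```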
Furthermore, these models have access to a vast amount of information. Indeed, the corpus of data they are trained on includes a wide range of material. Although the models still need to be able to effectively retrieve this information, this breadth of knowledge means that they are applicable in many domains. This is contrary to previous machine learning practice, which specialised models to a specific application. On the one hand, specialisation improves the efficiency of developing a machine learning model; on the other, too much focus on a specific task may make the model fragile. One could argue that increasing the generality of machine learning models may augment a model's ability to perform a specialised task.
Consequently, these large generative models can effectively act as search engines. Perplexity AI is working on improving the information retrieval mechanisms of these models such that they can efficiently be used as a search engine. Unlike traditional search engines, these models can receive larger and more intricate prompts as input. This makes the search experience more personalised and efficient, as the user can effectively navigate using natural language.
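A minimal sketch of the retrieval-augmented idea behind such systems is given below; the `embed` function and the final generation step are hypothetical placeholders, not Perplexity's actual pipeline.

```python
# A minimal sketch of retrieval-augmented generation: retrieve relevant
# documents, then let the model answer with them as context. `embed` and
# the final LLM call are hypothetical placeholders, not any vendor's API.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: map text to a vector (a real system would use a model)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=128)

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    scores = [np.dot(q, embed(d)) / (np.linalg.norm(q) * np.linalg.norm(embed(d)))
              for d in documents]
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = f"Using the context below, answer the question.\n{context}\n\nQ: {query}"
    return prompt  # a real system would pass this prompt to an LLM

docs = ["The Eiffel Tower is in Paris.", "Mount Fuji is in Japan."]
print(answer("Where is the Eiffel Tower?", docs))
```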
Limitations
Due to the commercial implications of this technology, many organisations training these large models shroud their data sources and architectures in secrecy. This can impact the equitable distribution of the technology, its alignment with the desires of society, and accountability for negative impacts.
For example, the just-released Gemini 1.5 model has been demonstrated to possess strong biases in an attempt to be politically correct and satisfy supposed social standards. This has received pushback from the community, as it is clear that Google overshot current social standards. It is thought that the alignment of the Gemini model was spearheaded by a few who believed they were acting in the best interests of society. It is concerning that a corporation as large as Google deployed a model with such heavily polarised views.
Furthermore, it is unclear whether the data being used to train the models has been obtained legitimately. Through reverse engineering, it has been identified in some cases that copyrighted material has been used to train these models. This has raised legal and ethical questions regarding whether human artists should be credited for work that is being used to augment the outputs of these generative models.
Current generative machine learning systems have static learned representations. They cannot update their beliefs or understanding in the presence of new information. Once these models have been trained, contradictory representations are difficult to fix. Although methods in mechanistic interpretability can tune certain parameters in small models to remedy wrong representations, it is unclear whether such techniques will be successful in incredibly large models that may have on the order of a trillion parameters. Moreover, it is unclear how these methods would affect the model's other representations; they may introduce more inconsistencies than they fix. This static representation of the world is not ideal, as it means that the model will always be out of date. The world is constantly changing, and so for a generative model to be effective in the real world it must adapt, just as humans update their actions and beliefs based on their experiences.
A lot of the attention surrounding generative models, such as large language models including ChatGPT, is centred on statistics demonstrating that their performance is better than that of humans at some tasks. To arrive at these statistics, the developing companies test their models on benchmarks, which are series of tests designed to identify a particular characteristic of these models. However, many of these benchmarks have flaws. For example, AI Explained has investigated the MMLU benchmark and found that many of the tests within it are wrong. More specifically, the benchmark is designed to test the knowledge of these models and thus comprises many multiple-choice questions. It was found that many of the questions were ambiguous, gave the wrong solution, or did not even list the correct solution as an option. Even setting aside these flaws, it should be apparent that testing the knowledge of a model through a series of multiple-choice questions is not an optimal way to determine whether the model has an astute knowledge of a subject. It may work to test the model as an encyclopedia for a subject, but it will not determine whether the model could act as an expert on the subject. Furthermore, as information regarding these benchmarks is scattered across the internet, it is entirely plausible that they are contained within the large data pools used to train these generative models. Therefore, using a benchmark to test the trained language model is similar to a student taking a test with the solutions on their desk.
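To illustrate how fragile this evaluation setup is, here is a minimal sketch of multiple-choice benchmark scoring; `model_choice` and the items are hypothetical, and the second item is deliberately mis-keyed to show how a flawed question silently corrupts the score.

```python
# A minimal sketch of multiple-choice benchmark scoring of the kind used for
# MMLU-style evaluations. `model_choice` is a hypothetical stand-in for
# querying a real model; the items are illustrative. Note how a mis-keyed
# answer (the second item) silently corrupts the reported accuracy.
benchmark = [
    {"question": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1},
    # A flawed item: the keyed answer is wrong, so a correct model is penalised.
    {"question": "Capital of Australia?",
     "options": ["Sydney", "Canberra", "Perth", "Darwin"], "answer": 0},
]

def model_choice(question: str, options: list[str]) -> int:
    """Placeholder for a model's answer; pretend it always answers correctly."""
    truths = {"2 + 2 = ?": 1, "Capital of Australia?": 1}
    return truths[question]

correct = sum(model_choice(item["question"], item["options"]) == item["answer"]
              for item in benchmark)
print(f"accuracy: {correct / len(benchmark):.0%}")  # 50%, despite a perfect model
```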
Generative models, such as large language models, are trained to detect statistical patterns in data, and then use these patterns to generate outputs. Consequently, it is difficult to instil a notion of certainty into these models. For example, these models often struggle to deal with simple mathematical expressions. Furthermore, the performance of generative models is inherently biased toward the patterns in the data for which statistical correlations are strongest. In other words, if a model is trained on real-world data, it is going to perform best on concepts that are popular in the real world, as these concepts are present in more of the data. This is the main reason why models work so well in English, but struggle in less common languages.
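To make the statistical-pattern point concrete, consider a toy character-level bigram model: it reproduces whichever character sequences were frequent in its tiny training text, with no notion of whether an arithmetic expression is actually correct.

```python
# A toy character-level bigram model. It generates by sampling whichever
# character most often followed the current one in its training text, so it
# reproduces frequent statistical patterns without any notion of whether an
# expression is mathematically correct.
from collections import Counter, defaultdict
import random

corpus = "2+2=4. 3+3=6. 2+3=5. 2+2=4. "   # tiny illustrative training text
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def generate(start: str, length: int = 8) -> str:
    out = start
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out += random.choices(chars, weights=weights)[0]
    return out

random.seed(0)
print(generate("2+"))  # plausible-looking, but with no guarantee of being right
```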
For instance, a large language model may know lots of information about a movie actor, including the names of their family members. However, when the model is queried on those family members directly, it demonstrates very little knowledge. Indeed, the model may not recall that the family member is related to the actor at all. From a human perspective, we know with certainty that if the actor's mother is a given person, then that person's son is the actor; a large language model lacks this sort of reasoning, as its reasoning is derived statistically.
Future Directions of Progress
To try and eliminate some of the limitations outlined above, I think it will be essential for these models to leverage multi-modal datasets to develop coherent representations of the world. However, doing so will require the development of bespoke techniques to handle these different formats of information. For instance, convolutional neural networks are optimised for image processing and graph neural networks are optimised for processing graphs. Recently, numerous AI hubs have been established that are intended to work on this very problem.
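As a minimal illustration of these bespoke techniques, the sketch below shows a convolution over image data and a single message-passing step over graph data, in PyTorch; the shapes and data are illustrative.

```python
# A minimal sketch of two modality-specific architectures: a convolution over
# image data and one message-passing step over graph data.
import torch
import torch.nn as nn

# Convolutional layer: exploits the spatial grid structure of images.
image = torch.randn(1, 3, 32, 32)             # (batch, channels, height, width)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
features = conv(image)                        # (1, 16, 32, 32)

# Graph layer: exploits connectivity. Each node averages its neighbours'
# features (one simple message-passing step), then applies a shared linear map.
node_feats = torch.randn(4, 8)                # 4 nodes, 8 features each
adjacency = torch.tensor([[0., 1., 1., 0.],
                          [1., 0., 0., 1.],
                          [1., 0., 0., 1.],
                          [0., 1., 1., 0.]])
degree = adjacency.sum(dim=1, keepdim=True)
messages = (adjacency @ node_feats) / degree  # mean over neighbours
updated = nn.Linear(8, 8)(messages)           # shared update across all nodes
```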
In conjunction with this, it is also important that the quality of the data be improved. As the common saying goes: garbage in, garbage out. That is, using sub-optimal data will lead to sub-optimal representations in the model. Just as a student struggles to learn from poor educational resources, we cannot expect a model to achieve high-level performance if the data it is supplied with is poor. This also raises the question as to whether a model can achieve higher than human-level performance without synthetic data. Data curated by humans is inherently limited by the abilities of humans. For models to surpass human performance, it is essential that they can move beyond human data and bootstrap on synthetically generated data.
Another potentially fruitful direction of progress may be to optimise the strategies by which models are made to interact with each other. Just as humans form companies and societies to promote development and innovation, models should be organised collectively to distribute tasks and share resources. There have been recent advances in the architectural designs of these large generative models that are principled on the idea that a collection of smaller models can effectively collaborate to outperform a single larger model.
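This resembles the mixture-of-experts idea, in which a gating network routes each input to a small subset of expert networks; below is a minimal, illustrative sketch in PyTorch rather than any production architecture.

```python
# A minimal sketch of mixture-of-experts routing: a gating network sends each
# input to its top-k experts, so a collection of small networks jointly covers
# what one large network would.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim: int = 32, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                               # (batch, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # route to top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for i, expert in enumerate(self.experts):
                mask = indices[:, k] == i                   # inputs routed to expert i
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 32)).shape)   # torch.Size([8, 32])
```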
Applications
The critical feature of this technology compared to other technologies is its capacity to be distributed. ChatGPT has over 100 million users and is widely available to all demographics, facilitating the spread of information and providing many with a source of intelligence.
Consequently, this technology can evolve and improve on rapid timescales. More specifically, these models can be adapted, or fine-tuned, for specific applications such as health care. For example, they can be adapted to offer general advice and treatment for common illnesses and to refer individuals to specialists when required. This would help alleviate the burden on health services and allow healthcare professionals to focus their attention on those requiring critical care. Moreover, as these systems have a much larger memory capacity than any individual, they can cross-reference symptoms across a large database and personalise their services by utilising the medical history, and potentially even genetic data, of the patient.
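A minimal sketch of such domain fine-tuning with Hugging Face's transformers is given below; the base model, two-example dataset and hyperparameters are placeholders, and a real medical deployment would require far more care.

```python
# A minimal sketch of adapting a pretrained model to a domain via fine-tuning,
# using Hugging Face's transformers. The model name and the two-example
# dataset are placeholders; a real medical system would need far more rigour
# (e.g., masking padding in the labels, validation, safety review).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

texts = ["Patient reports a persistent dry cough and mild fever.",  # illustrative
         "Recommended follow-up with a specialist for further tests."]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
dataset = [{"input_ids": ids, "attention_mask": mask, "labels": ids}
           for ids, mask in zip(encodings["input_ids"], encodings["attention_mask"])]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```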
Similarly, these models can be fine-tuned to act as personal tutors for school children across the globe. Due to the generality of these models, they can operate in most languages and subjects, as well as adapt their output to the proficiency of the student. The replicability of these systems means it is possible to provide the individualised support that one teacher in a class of thirty cannot. In turn, this increases the accessibility of education, as these models can be deployed in an offline fashion and thus reach remote corners of the world where the environmental and economic situations are tough.
In the near term, I do not expect these large generative models to replace humans; they are simply not reliable or accurate enough. However, they will augment human roles and improve the efficiency with which humans can carry out their tasks.
Moreover, I do not believe that these systems could be used as an instrument by which to methodically investigate individual human psychology. They could, however, be used to analyse society as a whole, since they are trained to compress human knowledge and thus will inevitably extract societal-level patterns. That is not to say that the process by which they generate their outputs is driven by the same mechanisms driving individual human psychology.
Ultimate Goal
Ideally, there will be a time when we each have personalised AI systems that communicate with other personal AI systems, leading to an ecosystem of cooperative AI systems tailored to individual needs that augment the desires of individuals. An individual should have autonomy over their AI assistants, and the ecosystem should be largely decentralised from any top-down control. Such personalisation of technology has happened before, and I think it is only a matter of time before the same happens with AI. For example, telephone boxes shrank into the pockets of individuals, and the printing press facilitated decentralised access to knowledge. I think that to realise their full potential, AI systems should be implemented at these finer scales.
AI Risks
Given the current limitations of generative models, I do not think there is a risk of generative models taking over and controlling the world by setting their own goals and making plans to achieve them. What's more, I do not think that such detrimental capabilities will emerge by scaling up current approaches. The greater threat is from humans anthropomorphising the outputs of the models and being manipulated by them, or, equivalently, from humans using the models to manipulate others.
Having an ecosystem of personalised AI agents that learn in an online fashion will reduce this risk as misaligned traits will be suppressed by the majority, just as the human species has been successful in suppressing the motivations of malevolent individuals.
I am optimistic that advances in machine learning, as a whole, will be hugely beneficial for society. However, the current limitations of generative models highlight that focus should also be maintained on other machine learning techniques, and that we should not rest too much weight on the idea that generative models can realise the full potential of AI.